User traversals on hyperlinks between Web pages can reveal semantic relationships between these pages. We use the number of user traversals on each hyperlink as a weight that measures the semantic relationship between the two pages it connects. On the basis of these weights, we propose a novel method to place the pages of a Web site on different conceptual levels in a link hierarchy. We develop a clustering algorithm called PageCluster, which clusters conceptually related pages on each conceptual level of the link hierarchy based on their in-link and out-link similarities. The clusters are then used to construct a conceptual link hierarchy, which is visualized in a prototype called Online Navigation Explorer (ONE) for adaptive Web site navigation. Our experiments show that our method places Web pages on conceptual levels of a link hierarchy more accurately than both the breadth-first search method and the shortest-weighted-path method, and that PageCluster clusters conceptually related pages more accurately than the bibliographic analysis method. Our user study also shows that the conceptual link hierarchy visualized in ONE helps users find information more effectively and efficiently as the task of finding information becomes less specific and involves more Web pages on multiple conceptual levels.
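The idea of comparing pages by their traversal-weighted in-links and out-links can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so everything below is an assumption made for illustration: the function names (`link_vectors`, `cosine`, `page_similarity`), the use of cosine similarity, the equal weighting of in-link and out-link similarity, and the toy traversal counts. It is not the PageCluster algorithm itself.

```python
from collections import defaultdict
from math import sqrt


def link_vectors(traversals):
    """Build weighted out-link and in-link vectors.

    `traversals` maps (source_page, target_page) -> traversal count,
    e.g. as counted from a Web log file (hypothetical input format).
    """
    out_links = defaultdict(dict)
    in_links = defaultdict(dict)
    for (src, dst), count in traversals.items():
        out_links[src][dst] = count
        in_links[dst][src] = count
    return out_links, in_links


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def page_similarity(a, b, out_links, in_links):
    """Average of in-link and out-link similarity (assumed combination)."""
    sim_in = cosine(in_links.get(a, {}), in_links.get(b, {}))
    sim_out = cosine(out_links.get(a, {}), out_links.get(b, {}))
    return 0.5 * (sim_in + sim_out)


# Toy traversal counts (invented for illustration only).
traversals = {
    ("home", "a"): 10,
    ("home", "b"): 10,
    ("home", "c"): 1,
    ("a", "x"): 4,
    ("b", "x"): 4,
    ("c", "y"): 3,
}
out_links, in_links = link_vectors(traversals)

sim_ab = page_similarity("a", "b", out_links, in_links)
sim_ac = page_similarity("a", "c", out_links, in_links)
print(sim_ab, sim_ac)
```

In this toy data, pages `a` and `b` are reached from the same page and lead to the same page, so they score higher than `a` and `c`, which share an in-link but no out-link. A clustering step could then group pages whose similarity exceeds a chosen threshold.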
Zhu, Jianhan; Hong, Jun and Hughes, John G. (2004). PageCluster: mining conceptual link hierarchies from web log files for adaptive web site navigation. ACM Transactions on Internet Technology (TOIT), 4(2), pp. 185–208.