Node ranking in a graph is an important technique underlying many real-world applications. Take web search for example. This task is essential for returning important (or highly ranked) websites that are relevant to a query. Because the entire Internet with all of its webpages and hyperlinks forms a graph, a technique that ranks nodes in a graph becomes essential for this application. Such a ranking technique would also be useful in identifying influential persons or opinion leaders in a large social network (e.g., the Facebook network).
Normally, a node-ranking task is solved in an unsupervised manner (i.e., without training examples) because it is difficult to ask humans to produce labels given a large amount of data. PageRank is likely the most famous unsupervised ranking approach that produces reasonably good results. The approach assumes that the ranking of a node is accumulated by the rankings of the nodes pointing to it (its predecessors). Furthermore, if a node points to many other nodes (many successors), its ranking should also be evenly distributed. Therefore, a node’s rank becomes the sum of the partial rankings of its predecessors, where the partial ranking is the ranking of a node divided by the number of its successors.
In the real world, however, graphs are usually incomplete, meaning that many connections can be missing. For example, not all of our real-world friendship connections are observed on FB. This deficit can cause problems for algorithms such as PageRank because they assume a complete graph exists. The figure below shows that the accuracy (i.e., AUC) of PageRank decreases as the drop rate of links increases. This phenomenon is the motivation behind our idea of performing link discovery along with an unsupervised ranking model. The goal of link discovery is to predict the missing links in a graph.
Figure 1. The accuracy (i.e., AUC) of PageRank decreases as the drop rate of links increases.
In developing AttriRank [1], we proposed an effective approach to solving the unsupervised ranking task by recovering missing links in a graph. In addition to the original graph obtained by PageRank, another artificial graph is created based on the similarity of attribute information between each pair of nodes. The similarity of node attributes is captured by the weight of the link between the pair of nodes. As a result, AttriRank can be viewed as simultaneously performing a random walk on two graphs. The first graph is the original graph obtained by PageRank, while the second graph is the artificial fully connected graph constructed by similarity of additional information.
Our experiments show that AttriRank significantly outperforms the state-of-the-art unsupervised ranking model.
Reference
1. Hsu, C., Lai, Y., Chen, W., Feng, M., and Lin, S. (2017). Unsupervised Ranking using Graph Structures and Node Attributes. WSDM 2017 Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 771-779.
Shou-De Lin
Professor, Department of Computer Science and Information Engineering