Monday, June 2, 2008

[Reading] Lecture 15 - The PageRank Citation Ranking: Bringing Order to the Web

This paper provides a method to evaluate the importance of a webpage according to its backlinks. Using backlinks as an indication of quality is not a new idea. In academia, people look at the number of citations, which can be viewed as the backlinks of a paper, to evaluate its importance. The quality of citations is under control (you must be able to publish a paper before you can cite), so it is reasonable to believe that a paper with only a few citations won't be too important. However, the web is different. Building a website is easy, and so is putting a forelink on a webpage. Being linked from Yahoo is more valuable than being linked from a personal webpage. Therefore, we should not look only at the number of backlinks. Instead, we should also consider the quality of the pages that provide them.

Once you understand the spirit of the method, it is easy to see what it tries to do during the propagation: in each iteration, an important webpage contributes more than an ordinary one. People in that era really liked to use propagation...
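
To make the propagation concrete, here is a minimal sketch in Python of the simplified ranking, where every page splits its current rank evenly among its forelinks in each iteration. The function name `propagate`, the toy graph, and the fixed iteration count are my own assumptions for illustration, not something taken from the paper, and the sketch assumes every page has at least one forelink.

```python
def propagate(links, iterations=50):
    """Simplified rank propagation (no source of rank).

    links: dict mapping each page to the list of pages it links to.
    Assumes every page has at least one forelink.
    """
    pages = list(links)
    # Start from a uniform rank distribution.
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: 0.0 for p in pages}
        for page, targets in links.items():
            # A page divides its rank evenly among its forelinks,
            # so a highly ranked page passes on more per link.
            share = rank[page] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: C has three backlinks, D has none.
toy_links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}
ranks = propagate(toy_links)
```

In this toy graph, C ends up ranked above B because it collects rank from more (and better ranked) pages, while D, which nobody links to, ends up with no rank at all.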

Propagation approaches can run into problems. For instance, a rank sink occurs if the link graph contains a local loop that is linked to from outside but has no links leading out of it: rank flows into the loop and accumulates there but never leaves. The authors solve this problem by providing a source of rank, which ensures that each page has at least a little rank to contribute instead of zero. From a random walk point of view, you can think of the source of rank as a random surfer who occasionally gets bored and jumps to a random webpage instead of following a link on the current page.
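
The rank sink and its fix can be sketched in a few lines of Python. Note that this uses the commonly seen damping-factor form with a uniform source of rank, which is a simplification of the paper's formulation; the name `pagerank`, the `damping` parameter, the value 0.85, and the three-page graph are assumptions of mine for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Rank propagation with a uniform source of rank.

    links: dict mapping each page to the list of pages it links to.
    damping: probability that the random surfer follows a link;
             with probability 1 - damping it jumps to a random page.
    Assumes every page has at least one forelink.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives its share of the source of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, targets in links.items():
            share = damping * rank[page] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical rank sink: B and C link only to each other, A links into the loop.
sink_links = {"A": ["B"], "B": ["C"], "C": ["B"]}

no_source = pagerank(sink_links, damping=1.0)     # pure propagation
with_source = pagerank(sink_links, damping=0.85)  # with random jumps
```

With `damping=1.0` there is no source of rank, so all of A's rank drains into the B-C loop and A is left with zero. With the source of rank switched on, A keeps a small but nonzero rank.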

Indeed, PageRank provides a good way to evaluate importance according to backlinks. However, I think it still ignores forelinks. A webpage that provides high-quality forelinks won't benefit much from its contribution (although it might benefit a little if there is a loop that includes the webpage and the pages it links to).

Reference:
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, "The PageRank Citation Ranking: Bringing Order to the Web", Number SIDL-WP-1999-0120, November 1999.
