How does PageRank begin?
-
I am not asking a question about the history of PageRank. PageRank for any website is calculated by taking into account the links which point to it and the 'PageRank of those links'. But how was the PageRank calculated for those pages in the first place? What is the initial PageRank? How does the algorithm begin?
-
Answer:
Disclaimer: I work for Google on web search but the following is based on the publicly-available PageRank paper.

This is a good question. A concise, "in English please" answer is that the PageRank algorithm can be viewed as an iterative algorithm. The algorithm begins at step one with some initial PageRank assigned to all pages. The algorithm is then applied iteratively until it arrives at a steady state; that is, until a PageRank has been distributed to all pages and a subsequent iteration of the algorithm provides little or no further change in the distribution of PageRank.

The initial PageRank needs to be a function of the number of pages in the index; in the original PageRank paper it is 1/N for N pages in the index. This is the answer to your question: the PageRank of all pages is initially set to 1/N.

Example: Assume our index contains four web pages; call them A, B, C, and D. PageRank is a probability, so we'll assign them each an initial probability of 0.25: 100% probability divided by four, for four web pages in our index. Suppose each of B, C, and D linked to A. Let's focus only on page A. We'd start with,

PR(A) = \frac{1}{N} = 0.25

Then, on the second iteration of the algorithm, we'd have,

PR(A) = PR(B) + PR(C) + PR(D)

Each of B, C, and D would transfer their PageRank to A, yielding,

PR(A) = 0.25 + 0.25 + 0.25 = 0.75

Now let's consider a less trivial world. Assume page B linked to pages A and C. Page D linked to all three pages (A, B, and C). C isn't linking to A. Then, on the second iteration, we'd have:

PR(A) = \frac{PR(B)}{2} + \frac{PR(D)}{3}

That is, B transfers half its existing PageRank to A, since it is linking to two pages (A and C). D transfers one-third its existing PageRank to A, since it is linking to three pages (A, B, and C). Thus,

PR(A) = 0.125 + \frac{1}{12} = 0.20833...
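The two worked examples above can be checked with a few lines of Python. This is only a sketch of a single update for one page, not the full algorithm; the function name rank_after_one_step and the dictionary layout are made up for illustration, and only the outbound links stated in the example are listed.

```python
def rank_after_one_step(page, links, ranks):
    """Sum PR(u) / L(u) over every page u that links to `page`."""
    return sum(
        ranks[src] / len(targets)
        for src, targets in links.items()
        if page in targets
    )

# Every page starts at 1/N with N = 4.
initial = {p: 0.25 for p in "ABCD"}

# First world: B, C, and D each link only to A.
simple_links = {"B": ["A"], "C": ["A"], "D": ["A"]}
pr_a_simple = rank_after_one_step("A", simple_links, initial)

# Second world: B links to A and C; D links to A, B, and C.
# C's outbound links are not given in the example, only that it
# does not link to A, so C is left out of the dictionary.
less_trivial_links = {"B": ["A", "C"], "D": ["A", "B", "C"]}
pr_a_less_trivial = rank_after_one_step("A", less_trivial_links, initial)

print(pr_a_simple)        # 0.75
print(pr_a_less_trivial)  # about 0.20833
```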
More generally, the PageRank of a page is equal to the sum of the PageRank of every page that links to it, each divided by the number of outbound links on that linking page. That is, if we define L(u) as the number of outbound links from u, then,

PR(A) = \frac{PR(B)}{L(B)} + \frac{PR(C)}{L(C)} + \frac{PR(D)}{L(D)}

Or, generalized, for any page v:

PR(v) = \sum_{u \in O_v} \frac{PR(u)}{L(u)}

Where O_v is the set of pages that link to v.

Summary: The PageRank for a given page begins at some initial value. In the public PageRank paper, this value is 1/N for N pages in the index. What is important is that the value is (a) equal for all pages and (b) scales with the size of the index. That value is then transferred to other pages through successive iterations of the PageRank algorithm until a steady state is reached.
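As a rough illustration of the summary, here is a minimal Python sketch of the simplified update PR(v) = \sum_{u \in O_v} PR(u)/L(u), repeated until the values stop changing. It makes several assumptions that are not in the answer: the three-page graph is invented, every page has at least one outbound link, and the names pagerank, tol, and max_iter are hypothetical. It also leaves out the refinements in the published algorithm.

```python
def pagerank(links, tol=1e-12, max_iter=1000):
    """Iterate the simplified PageRank update until a steady state.

    `links` maps each page to the list of pages it links to.
    Assumes every page has at least one outbound link.
    """
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}  # initial PageRank of 1/N

    for _ in range(max_iter):
        new_ranks = {p: 0.0 for p in pages}
        for src, targets in links.items():
            share = ranks[src] / len(targets)  # PR(u) / L(u)
            for dst in targets:
                new_ranks[dst] += share
        # Total change between two successive iterations.
        change = sum(abs(new_ranks[p] - ranks[p]) for p in pages)
        ranks = new_ranks
        if change < tol:
            break
    return ranks

# Invented example graph: A -> B, A -> C, B -> C, C -> A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
steady = pagerank(example_links)
print(steady)
```

For this particular graph the steady state works out to roughly 0.4 for A, 0.2 for B, and 0.4 for C, which can be confirmed by substituting those values back into the formula.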
Robert Love at Quora
Other answers
PageRank can start with ANY values. The algorithm always converges to the principal eigenvector of the PageRank matrix, and that does not depend on the initial values you give to each page.

Since another answer says PageRank has to start with equal values for each page, I'll show that this is not needed.

Let A be the PageRank matrix and x_0 an arbitrary starting vector. I'll show that iterating x_{k+1} = A x_k arrives at the same vector, up to scale, regardless of x_0.

Since A is a column-stochastic matrix (every column sums to 1), its largest eigenvalue is 1, and for the PageRank matrix the second eigenvalue is less than 1 in absolute value. Assuming A can be diagonalized, we can write x_0 as a linear combination of the eigenvectors of A:

x_0 = \alpha_1 v_1 + ... + \alpha_n v_n

Then we have:

A x_0 = \alpha_1 \lambda_1 v_1 + ... + \alpha_n \lambda_n v_n

and, after k iterations:

A^k x_0 = \alpha_1 \lambda_1^k v_1 + ... + \alpha_n \lambda_n^k v_n

We know that the first eigenvalue is 1 and that it is larger in absolute value than all the others, so as this process repeats, the first term dominates and after enough iterations we have:

A^k x_0 \approx \alpha_1 v_1

This requires \alpha_1 \neq 0, which holds for any starting vector whose entries sum to a nonzero value. Since v_1 is an eigenvector, \alpha_1 v_1 is also an eigenvector.

This is known as the "power method" for finding the eigenvalue of largest absolute value, and a corresponding eigenvector, of a matrix. In the PageRank case we know the largest eigenvalue is 1 because the matrix is column-stochastic.

Conclusion: PageRank can start with a random initialization of weights and will eventually converge to the eigenvector associated with the largest eigenvalue of the matrix, which is 1 and is unique. We know a stationary distribution exists because the largest eigenvalue is 1, therefore A v_1 = \lambda_1 v_1 = v_1. This shows that a vector v exists such that A v = v, which means the distribution is stationary.

Notice that the sum of the PageRank vector stays constant in each iteration. If your initial vector sums to 1, then the final vector can be treated as a probability distribution. If your initial vector sums to 48, you can just divide each PageRank by 48 to arrive at the probability.
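A small numerical check of this claim, written as a sketch in plain Python. The 3x3 column-stochastic matrix is an invented example (pages A, B, C with links A -> B, A -> C, B -> C, C -> A), and the helper name power_iterate and the step count are assumptions for illustration. One run starts from equal values, the other from an uneven vector that sums to 48.

```python
def power_iterate(matrix, x, steps=200):
    """Apply x <- A x repeatedly (the power method without rescaling)."""
    n = len(x)
    for _ in range(steps):
        x = [sum(matrix[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x

# Column j holds the shares that page j passes to each page i.
# Every column sums to 1, so the matrix is column-stochastic.
A = [
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0],
]

from_uniform = power_iterate(A, [1.0 / 3, 1.0 / 3, 1.0 / 3])
from_skewed = power_iterate(A, [40.0, 5.0, 3.0])  # entries sum to 48

# The sum is preserved, so dividing by 48 gives a probability vector.
skewed_as_probability = [v / 48.0 for v in from_skewed]

print(from_uniform)
print(skewed_as_probability)
```

Both printed vectors should agree to many decimal places, at roughly 0.4, 0.2, 0.4 for this matrix, even though the starting points were very different.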
Luis Argerich