Google PageRank Flaw

In the autumn of 2008, I interviewed for a front-end design and development position at an internet shopping portal company in Tokyo.

It was an excruciating process, to say the least. They first had me complete a set of assignments.

These included…

  1. Implementing the Google PageRank algorithm as described in Larry Page and Sergey Brin's paper "The PageRank Citation Ranking: Bringing Order to the Web"
  2. Writing an essay outlining my personal thoughts on PageRank
  3. Writing a web spider that could crawl the internet both depth-first and breadth-first
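For readers who haven't seen it, the core of the first assignment fits in a few lines. Below is a minimal Python sketch of PageRank by power iteration, not a reproduction of the paper's exact notation: it assumes the commonly cited damping-factor form R(p) = (1 - d)/N + d * sum of R(q)/L(q) over the pages q that link to p, with d = 0.85, and it assumes the usual convention of spreading the rank of pages with no outbound links evenly over all pages. The function name `pagerank` is illustrative. Note that every link counts as an equal share of a vote, no matter why it was made.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=100):
    """Return a dict of page -> rank via power iteration.

    links maps each page to the list of pages it links to. Every link is
    treated as an equal vote, which is the assumption under discussion.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    if not pages:
        return {}
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Pages with no outbound links spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        # Each page splits its (damped) rank equally among the pages it links to.
        for src, targets in links.items():
            if not targets:
                continue
            share = d * rank[src] / len(targets)
            for dst in targets:
                new[dst] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Usage: "a" and "b" both link to "c", so "c" ends up with the highest rank.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(max(ranks, key=ranks.get))  # c
```

Nothing in this computation asks why "a" and "b" link to "c"; a link is a vote by construction.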

I spent three weeks of my spare time getting it all just right and submitted it for consideration. Reasonably impressed, they invited me to what would become a five-hour interview.

Toward the end, my interviewers told me that my PageRank implementation was nearly perfect but that my essay was puzzling. (Only at this point did I learn that the employees of this company were what you could call Google Fanboys Extraordinaire.)

I was asked to clarify my essay's point that PageRank is flawed, and the discussion went along these lines…

Me: PageRank does not properly model a given page’s authoritativeness.

Interviewer: And, how, could, that, be?

Me: A link, in and of itself, is not a vote for a page’s authoritativeness.

Interviewer: Uh… of course it is! You read the PageRank paper we provided you, right? Let's see, it says here you implemented the algorithm nearly perfectly. This was your work, right? Explain to me again why you have a problem with PageRank.

Me: A link is merely a reference to another page, nothing more, nothing less.  It doesn’t capture enough information to call it a vote.

Interviewer: <Unconvinced, lets out a small chuckle>

Me: <Getting a bit impatient>  Alright, let me put these questions to you then.

Me: Is a link from Mothers Against Drunk Driving to an offensive site that condones drunk driving, made to draw attention to that site, a vote for its authoritativeness? Is a link from a blogger who opposes gun ownership to the NRA's website a vote for its authoritativeness? Is a link from a religious site opposed to abortion to an abortion clinic a vote for its authoritativeness?

Interviewers: <Exchanging looks with one another, waiting for somebody to cut this awkward silence>

I had laid out some of these arguments against treating links as votes in my essay, but I suppose they were glossed over. I had apparently destroyed the foundation on which a few otherwise intelligent people had built their beliefs. Needless to say, I didn't get the job, and that is for the better. I'm by no means a Google worshiper and most likely would not have fit in. I only wish I hadn't wasted three weeks to find this out about them.


Solving The Flaw

The only positive outcome of this interview process is perhaps this blog post. If I can hold up PageRank's flaw as a case study in how not to model software, some good might come of it.

To summarize: a link can be a vote, but it is not necessarily one. The world's most popular search engine has been modeled around this incorrect view of the problem.

It is worth pointing out that this may no longer be true, since Google is famous for frequently tweaking its algorithms. But unless the current algorithm is smart enough to guess the linker's intent, it still isn't modeled right. To do it right (and cheaply), you need the linker to tell you the intent of the link.

Authoritativeness is subjective, which makes this problem difficult to solve. But if I were tasked with at least improving on it, I might propose a new HTML attribute for the anchor tag that declares that intent. Search engines would take the attribute into account when judging the link. It might look like this…

<a href="" link-intent="authoritative">link text</a>

A missing intent would be taken to mean that the reason for the link is unknown, so the link would not count as a vote for the target page's authoritativeness. The only problem with this proposal is that web content developers would need a reasonable amount of time to get on board with it.
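Under this proposal, a search engine's link extraction would change only slightly. Here is a minimal Python sketch using the standard library's `html.parser`, assuming the hypothetical `link-intent` attribute above: a link is kept as a vote only when it declares `link-intent="authoritative"`, while a missing intent (or any other value, such as a hypothetical `critical`) is treated as unknown and dropped. The names `IntentLinkParser`, `vote_links`, and `build_vote_graph` are illustrative, not part of any real search engine.

```python
from html.parser import HTMLParser

# Hypothetical attribute name and value taken from the proposal above.
INTENT_ATTR = "link-intent"
VOTE_INTENT = "authoritative"


class IntentLinkParser(HTMLParser):
    """Collect (href, intent) pairs from <a> tags; intent is None if undeclared."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attrs is a list of pairs.
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if href:
            self.links.append((href, attrs.get(INTENT_ATTR)))


def vote_links(html):
    """Return only the hrefs whose declared intent marks them as votes."""
    parser = IntentLinkParser()
    parser.feed(html)
    parser.close()
    return [
        href
        for href, intent in parser.links
        # A missing intent means "unknown", so it is not counted as a vote.
        if (intent or "").strip().lower() == VOTE_INTENT
    ]


def build_vote_graph(pages):
    """Map each page URL (key) to the pages its HTML (value) explicitly vote-links to."""
    return {url: vote_links(html) for url, html in pages.items()}


# Usage: only the link that declares its intent survives as a vote.
html = ('<a href="/guide" link-intent="authoritative">guide</a> '
        '<a href="/rant">rant</a>')
print(vote_links(html))  # ['/guide']
```

Because the filtering happens before ranking, the ranking step itself could stay as it is; only what counts as an edge in the link graph changes.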

But following the whims of search engine algorithms has never been a big issue for people interested in maintaining their pages' SEO, so it's a rather small problem.


