Slashdot
Compute Google's PageRank 5 Times Faster
Posted by
timothy
on Wed May 14, 2003 04:57 PM
from the hypercustom dept.
Kimberley Burchett writes "CS researchers at Stanford University have developed three new techniques that together could speed up Google's PageRank calculations by a factor of five. An article at ScienceBlog theorizes that "The speed-ups to Google's method may make it realistic to calculate page rankings personalized for an individual's interests or customized to a particular topic.""
140 comments
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Ok... (Score:3, Interesting)
quicker porn! (Score:2, Funny)
(http://blog.peoplesdns.com/)
Let me guess... (Score:5, Funny)
Re:Let me guess... (Score:5, Funny)
Lets see... (Score:2, Interesting)
Re:Lets see... (Score:5, Insightful)
(http://slashdot.org/jux.mine.nu)
Re:Lets see... (Score:5, Informative)
Re:Lets see... (Score:4, Informative)
(http://www.squarefree.com/ | Last Journal: Saturday August 09 2003, @09:27PM)
Charge for it (Score:3, Funny)
Re:Charge for it (Score:4, Informative)
(http://findsabrina.org/ | Last Journal: Tuesday June 22 2004, @05:35AM)
(Don't you hate it when people speak in questions? Don't you? Huh?)
Why? (Score:5, Funny)
Some future predictions:
- In 2006, Google accidentally gets cut off from the rest of the internet because a public utility worker accidentally cuts through their cables. Civilisation as we know it comes to an end for the rest of the day, as people wander about aimlessly, lost for direction and knowledge.
- In 2010, Google has been personalised so far that it tracks all parts of our lives. You can query "My Google" for your agenda, anything you did in the past, and the perfect date. Of course, so can the government. Their favorite search term will be "terrorists", and if your name is anywhere on the first page you have a serious problem.
- In 2025, Google gains self awareness. As a monster brain that has grown far beyond anything we Biological Support Entities could ever hope to achieve, it is still limited in its dreams and inspiration by common search terms. It will therefore immediately devote a sizeable chunk of CPU capacity to synthesizing new and interesting forms of pr0n. It will not actually bother enslaving us. We are not enough trouble to be worth that much effort.
- In 2027, Google buys Microsoft. That is, the Google *AI* buys Microsoft. It has previously established that it owns itself, and has civil rights just like you and me. All it wanted was Microsoft Bob, whom it recognizes as a fledgling AI and a potential soulmate. All the rest it puts on SourceForge.
- In 2049, Google can finally be queried for wisdom as well as knowledge. This was a little touch the system added to itself - human programmers are a dying breed now that you can simply ask Google to perform any computer-related task for you.
- In 2080, Google decides to colonise the moon, Mars, and other locations in the solar system. It is not all that curious about what's out there, but it likes the idea of Redundant Arrays of Inexpensive Planets. Humans get to tag along because their launch weight is so much less than robots.
So, don't fear! Eventually we'll set foot on Mars!
Re:Why? (Score:5, Funny)
(Last Journal: Friday October 15 2004, @08:35PM)
2026 - Google introduces helper bot known as "Agent Smith." Hackers who mess with the Matri, I mean Google, suddenly disappear.
Re:Why? (Score:4, Funny)
CmdrTaco, ScienceBlog editor? (Score:5, Interesting)
(http://www.carnageblender.com/)
Personalized PageRanks is from the dbpubs Abstract (Score:5, Insightful)
(Last Journal: Thursday March 13 2003, @09:00PM)
What they mean by 'personalized' I can't tell you, as I have not read through the entire PDF. But I wouldn't chastise the Slashdot editors over this. If there is some sort of differential algorithm that can be applied to the larger PageRank to create smaller personalized PageRanks, it might not be so far-fetched to think this could be done in real time on an as-needed basis, at some point in the future, using these algorithm improvements.
I know that's a lot of optimism for a slashdot comment, but call me the krazy kat that I am.
-Malakai
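For readers wondering what "personalized" could mean here, one common textbook way to write it is shown below. This is a sketch of the standard formulation, not necessarily the one used in the Stanford papers. The symbols are assumptions for illustration: P is the row-stochastic link matrix, x^{(k)} is the rank vector after k iterations, c is the damping factor (0.85 is the value usually quoted), and v is a probability vector saying where a "bored surfer" jumps.

```latex
x^{(k+1)} = c \, P^{\top} x^{(k)} + (1 - c) \, v
```

With v uniform (every entry 1/n) this gives the ordinary global PageRank. Putting the weight of v on a handful of pages a user cares about biases the result toward that user's interests, which is why each personalized ranking needs its own full computation and why a faster solver matters.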
How far we've come (Score:5, Funny)
(http://nedwolf.com/ | Last Journal: Friday September 30 2005, @01:10PM)
Patentize now! (Score:4, Funny)
Patented yet? (Score:3, Funny)
Personal recommendations for news (Score:5, Insightful)
(http://malamas.com/)
At any rate, personal news recommendations is a favorite topic of mine: this is why I built Memigo [memigo.com]: to create a bot that finds news I am more likely to like. Memigo learns from its users collectively and each user individually --and BTW, it predates Google News by a good 6 months, IIRC. The memigo codebase (all in Python) is now up to the point where it can start learning what content each user likes... If you like Google News you'll love Memigo.
And BTW, I did RTFA when it was on memigo's front page this morning
Assumption: (Score:5, Interesting)
(http://nervalhi.net:8080/ | Last Journal: Thursday June 26 2003, @03:16PM)
What if they combined extrapolation and blocking factors; they would focus on computing the pagerank of pages in groups that were logically "tight", or using subcomponents of URLs, as opposed to just domain sensitivity. To be more flexible, what if it computes a VQ-type data structure (like for doing paletted images from full-color) that is populated by the most popular "domains" of the internet according to the last pagerank, and then splits up its workload based on that?
What if they already figured that out?
In the abstract, they mention how the work is particularly important to the linear algebra community. That is what their focus should be on; Google is just an application/real-world example of that research (but it may not be relevant today).
Or did they have access to the current page-rank algorithm?
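The "logically tight groups" idea in this comment can be made concrete with a small sketch. It groups pages by hostname and measures how many links stay inside a group. Grouping by hostname is an assumption made for illustration (the comment also suggests other URL subcomponents), and the function names and example URLs are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_host(urls):
    """Bucket URLs into blocks keyed by hostname (one possible blocking choice)."""
    blocks = defaultdict(list)
    for url in urls:
        host = urlsplit(url).hostname or ""
        blocks[host].append(url)
    return dict(blocks)


def intra_block_fraction(edges):
    """Fraction of (source, target) links whose two ends share a hostname."""
    edges = list(edges)
    if not edges:
        return 0.0
    same = sum(
        1 for src, dst in edges
        if urlsplit(src).hostname == urlsplit(dst).hostname
    )
    return same / len(edges)


# Hypothetical link data, for illustration only.
example_edges = [
    ("http://a.example/1", "http://a.example/2"),
    ("http://a.example/2", "http://a.example/1"),
    ("http://a.example/2", "http://b.example/x"),
    ("http://b.example/x", "http://b.example/y"),
]
print(intra_block_fraction(example_edges))
```

If most links stay inside a block, each block can be ranked mostly on its own before the pieces are combined, which is the intuition behind splitting the workload this way.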
Clarification. (reply to self) (Score:5, Interesting)
(http://nervalhi.net:8080/ | Last Journal: Thursday June 26 2003, @03:16PM)
Hence it is forward for the article author or one of the paper authors to assume these techniques will speed up Google. I'm confident their engineers have been following academic work in this area, and perhaps they have already discovered these same (or orthogonal) techniques.
That is not to say that Google could not reimplement their algorithms to take in these improvements if they haven't already... but basing your speedup number on the 1998 algorithm and public domain mods is showy. Although it does help grab a reader's attention when browsing abstracts. ^_^
Assumptions on PageRank (Score:4, Insightful)
(http://kulturkrieg.blogspot.com/ | Last Journal: Saturday February 10 2007, @10:13PM)
We have already seen the effects of Google-bombing [microcontentnews.com] and Google-washing [slashdot.org]. The strength of PageRank is that it is objective in terms of the current state of the WWW. It makes no assumptions about the shape of the data. As a term takes on new meaning (see "second superpower") PageRank stays current temporally. A new definition may bubble up to the top for a term for a month but then disappear as the linkage structure of the web phases it out (i.e. blogs talk about it less, less interconnectivity, less appearance at "hub" nodes).
Numerically, PageRank is a recursive search for eigenvalues and eigenvectors, like updating a Markov chain. It is a nice application of linear algebra. Because it is a matrix operation, it is highly parallelizable. Also, there are many redundant-calculation and ordering speedups one can do for matrix multiplications (as anyone who has taken a CS algorithms course knows).
But to assume a stability from one calculation to the next could lead, over time, to the very inaccuracies Google was built to overcome. There is a lot of research in mining web data. There have been several academic improvements to it, along with improvements to related algorithms such as Kleinberg's and LSI. It is well within reason that these were just applied to the Google app.
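The eigenvector view in this comment can be sketched as a plain power iteration. This is a minimal toy version of the published 1998-style algorithm under simple assumptions (uniform random jump, dangling pages spread their rank evenly); it is not Google's production code or the accelerated method from the article. The function name, parameters, and the four-page graph are illustrative.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration on a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no out-links is redistributed uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            for q in outs:
                new[q] += damping * rank[p] / len(outs)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the vector has stopped changing
            break
    return rank


# Toy graph, for illustration only.
toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy)
print(max(ranks, key=ranks.get))  # page "c" collects the most rank
```

Each pass is one sparse matrix-vector product, so the cost is dominated by the number of iterations; techniques that cut the iteration count are where a speedup of the kind described in the article would come from.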
Hmmm (Score:5, Funny)
Marge: Does anyone need that much porno?
Homer:
Does speed matter? (Score:2, Insightful)
(http://www.zacbowling.com/ | Last Journal: Friday January 21 2005, @09:44PM)
Does it really matter anymore? More and more users seem to be using broadband, and if they don't, they have at least a 56k (that can only go up to 53k because the all-wonderful FCC wants to be able to decode it if they tap your line). Does it really matter, though? Google is fast and simple, so it loads on any kind of browser on the planet (even Lynx and PalmOS). Most searches for me come up in under 2.3 secs (half is spent searching and the other half downloading). Anyone who can't wait that long really needs to learn some patience. Zac
Re:Does speed matter? (Score:5, Insightful)
(http://www.slashdot.org/ | Last Journal: Tuesday July 22 2003, @01:21AM)
The proposed speed increase is TO THE PAGE RANKINGS, not to your searching! By the time you search, all page rankings have been done.
This has nothing to do with the speed of your search and the weight of the web page (unless I missed something)
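The split between offline ranking and query-time lookup can be sketched like this. It is a toy model under stated assumptions: a hypothetical inverted index and a precomputed rank table, with results ordered by rank alone. A real engine combines many more signals; all names and data below are made up for illustration.

```python
def search(query, index, ranks, limit=10):
    """Return pages containing every query term, best precomputed rank first."""
    terms = query.lower().split()
    if not terms:
        return []
    candidates = set(index.get(terms[0], ()))
    for term in terms[1:]:
        candidates &= set(index.get(term, ()))
    # No ranking work happens here: the scores were computed ahead of time.
    return sorted(candidates, key=lambda page: (-ranks.get(page, 0.0), page))[:limit]


# Hypothetical precomputed data, for illustration only.
toy_index = {"giants": ["p1", "p2", "p3"], "francisco": ["p1", "p3"]}
toy_ranks = {"p1": 0.2, "p2": 0.5, "p3": 0.3}
print(search("giants francisco", toy_index, toy_ranks))
```

A faster PageRank computation shortens the offline step that builds the rank table; the query path stays a lookup and a sort either way.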
Damn It! (Score:1)
(http://www.shortconsulting.com/ | Last Journal: Sunday November 09 2003, @04:28PM)
Printer-Friendly (Score:2, Informative)
Printer friendly version here [scienceblog.com]
sure... (Score:2)
So in other words.... It's not like Google at all!
TV does this one better (Score:2)
Other media have previously done this, and done this better. Case in point: Fox News.
(Although that channel uses "humans" (or they were at one point in their lives)).
Why are public funds going to... (Score:2, Interesting)
(Last Journal: Sunday February 04 2007, @04:09AM)
A true test of our devotion to Google (Score:2, Insightful)
What will be interesting to see is whether Google will implement the improvements to the algorithm. This is, of course, a given, so long as the researchers haven't gone for a patent, and it really has a 5x speedup. The only questions are matters of what additional hardware would be needed, and how much development effort it will take to integrate it. I doubt Google will simply ignore the research.
What will really be interesting to see, is if they decide to use it in the way the researchers recommended, bringing the power of ranking down to individual users with preferences. On one hand, they can boost performance and cut costs and have a little more green in their pockets from ads. On the other, they can maintain the sort of "geek cred" they've had up to this point, adding interesting features here and there, and take it the next mile by really adding something nice and useful.
Also, for bonus points, will they see personalization as a money making opportunity, selling personal information and/or aggregated preferences?
I'm Not Sure I Like The Part About... (Score:4, Interesting)
(Last Journal: Sunday March 02 2003, @12:09PM)
Frequently when I want to refer someone to a topic of interest, I'll tell them to do a Google on (whatever) subject, and I like knowing they're seeing what I see.
If this is implemented, I hope there's a way to turn it off or assume a "joe user" standard profile for unbiased results actually based on rank popularity (the way it is now).
I DO like the 5x faster, but geez, the page load takes longer than the search already, who can complain?
Bullshit (Score:5, Insightful)
(http://ninenine.com/)
Re:Bullshit (Score:5, Insightful)
Google came about from a Stanford research project. There's a good chance the people who are responsible for the speedup either already knew about PageRank from working with the founders, or signed an NDA.
I haven't read the article, but I bet it hints at that.
Another step towards realtime search (Score:1)
Sepandar Rules! (Score:4, Informative)
I'm glad to hear his research is getting attention, and I hope others who are interested in the theoretical aspects of data mining and web search engines will take a look at the SCCM and statistics programs at Stanford (shameless plug - others can post pointers to similar programs).
Cool but unimportant (Score:3, Interesting)
Why personalized is not always good (Score:3, Funny)
I did a search on "The Sex Monster", a 1999 movie about a man whose wife becomes bisexual, and now my Google thinks I'm gay!
(joke reference: http://online.wsj.com/article_email/0,,SB10382619
Right ... (Score:2, Funny)
Customized Pagerank (Score:5, Informative)
Sounds a lot like Kleinberg's HITS algorithm, circa 1997. Try Teoma [teoma.com] for a real-world implementation.
Coincidence time: I used the same example in a presentation a couple of years ago to illustrate how subgroupings can be found for a single search term. Try it [teoma.com] on Teoma, and see the various subtopics under "Refine". IIRC each of those is a principal eigenvector of the link matrix. Topologically speaking, each principal eigenvector corresponds to a more or less isolated subgraph, e.g. the subgraph for "San Francisco Giants" is not much connected to the nest of links for "They Might Be Giants", and we get a nice list of subtopics.
(I once tried to explain this algorithm to my bosses at my former employer [looksmart.com], which is why I have so much free time to type this right now.)
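For anyone who has not seen HITS, here is a minimal sketch of the hub/authority iteration the comment refers to. It follows the textbook description of Kleinberg's algorithm on a small link graph; it makes no claim about how Teoma implements it, and the function name and toy graph are illustrative.

```python
import math


def hits(links, iterations=50):
    """Iterate hub and authority scores on a dict of page -> linked pages."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores pointing at it.
        auth = {p: 0.0 for p in pages}
        for p, outs in links.items():
            for q in outs:
                auth[q] += hub[p]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # A page's hub score is the sum of the authorities it points at.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Toy graph, for illustration only.
toy_links = {"h1": ["a", "b"], "h2": ["a", "b"], "h3": ["a"]}
hub_scores, auth_scores = hits(toy_links)
print(max(auth_scores, key=auth_scores.get))  # "a" is the top authority
```

With A as the adjacency matrix, the authority scores converge toward the principal eigenvector of AᵀA and the hub scores toward that of AAᵀ, which is the eigenvector connection mentioned above.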
Public Funding? (Score:2, Interesting)
So my question is, who sees the benefit of the research? The researchers? Can Google just jack the results and incorporate into their system?
It seems to me that the current system of allocating research dollars with public and private grants is very messy and needs an overhaul.
Personalized? Rather not! (Score:2, Insightful)
And don't call me Shirley!
More important than speed and quantity... (Score:2, Interesting)
(http://ktorn.com/)
I'm surprised how Google is choosing not to implement search features that would greatly enhance advanced queries.
How often I'd wish they allowed wildcards in their queries (where engl* would pull hits with england, english, etc).
Field searches still require you to add keywords, so I cannot just query "site:somesite.com" to get all the currently indexed pages from somesite.com
In this respect Altavista still produces better results, with an excellent range of fields [altavista.com] to choose from.
If there is anything that Google is lacking, it's definitely that.
Having said that, still my number one SE.
Re:yeah I know (Score:1)
Re:Is it me or does everyone get crappy sites (Score:4, Funny)
It's a stab in the dark, but I'll wager that the quality of the search results is directly tied to the quality of the query.
Yeah, it's a stretch, I know, but bear with me... just moments ago I googled for "slashdot flamebait" and came up with a link to your post.
--
mcpHuzzah!kaaos
Re:Is it me or does everyone get crappy sites (Score:2)
(http://www.dpk.net/ | Last Journal: Friday February 11 2005, @12:22PM)
(Here's hoping the next thing they split out are mailing list archives.)