Compute Google's PageRank 5 Times Faster

Posted by timothy on Wednesday May 14, 2003 @05:57PM from the hypercustom dept.

Kimberley Burchett writes "CS researchers at Stanford University have developed three new techniques that together could speed up Google's PageRank calculations by a factor of five. An article at ScienceBlog theorizes that "The speed-ups to Google's method may make it realistic to calculate page rankings personalized for an individual's interests or customized to a particular topic.""

Ok... (Score:3, Interesting)

by Anonymous Coward writes: on Wednesday May 14, 2003 @05:58PM (#5958802)

Who owns the software patent for this for the next 20 years?

- prior art (Score:2)
  
  by SHEENmaster ( 581283 ) writes:
  
  Google pagerank. Oh wait, prior art doesn't mean shit these days!
  
  Anyways, software patents seem to just be ignored these days. I can't remember the last time I paid Unisys for using a GIF...
  - Licensed under U.S. Patent 4,558,302 (Score:2, Informative)
    
    by yerricde ( 125198 ) writes:
    
    I can't remember the last time I paid Unisys for using a GIF...
    
    When was the last time you bought a copy of GraphicConverter, Fireworks, Photoshop, Paint Shop Pro, or any other program licensed under U.S. Patent 4,558,302 and foreign counterparts? The price of each of those programs includes a royalty paid to Unisys.
  - Re:prior art (Score:1)
    
    by Snarfy ( 27790 ) writes:
    
    So you're the one who will eventually put me out of a job? :-(
- Re:Ok... My ass does not suxa (Score:1)
  
  by willioo ( 673372 ) writes:
  
  if Microsoft still owns it... Say a puppet is lazy ?
quicker porn! (Score:2, Funny)

by joeldg ( 518249 ) writes:

damn.. this is good news ;)
Let me guess... (Score:5, Funny)

by RollingThunder ( 88952 ) writes: on Wednesday May 14, 2003 @05:59PM (#5958813)

Feeding the pigeons [google.com] amphetamines?

- Re:Let me guess... (Score:5, Funny)
  
  by irokitt ( 663593 ) writes: <archimandrites-iaur AT yahoo DOT com> on Wednesday May 14, 2003 @06:27PM (#5959066)
  
  No, They'll replace the pigeons with roadrunners.
  
- They use pigeons? (Score:2)
  
  by ImaLamer ( 260199 ) writes:
  
  Could you imagine a ....
  
  By collecting flocks of pigeons in dense clusters, Google is able to process search queries at speeds superior to traditional search engines, which typically rely on birds of prey, brooding hens or slow-moving waterfowl to do their relevance rankings.
  
  5/14/03: The Day CBN Returned!
Lets see... (Score:2, Interesting)

by DanThe1Man ( 46872 ) writes:

What is 1/14th of a second divided by five?
- Re:Lets see... (Score:5, Insightful)
  
  by deadsaijinx* ( 637410 ) writes: <animemeken@hotmail.com> on Wednesday May 14, 2003 @06:02PM (#5958850) Homepage
  
  that's exactly what i thought. But, as google is a HUGE international organization, it makes loads of sense for them. That's 5x the traffic they can feed, even though you won't see a noticeable difference.
  
  - Re:Lets see... (Score:2)
    
    by Chundra ( 189402 ) writes:
    
    Yes, they are a large organization. But guess what? The ranking isn't computed when you do a search. So, no, that's not 5X the traffic they can feed.
  - Re:Lets see... (Score:1)
    
    by Jedi Alec ( 258881 ) writes:
    
    me thinks that the bandwidth is more of an issue in this case than processing power, to be honest :/
  - Re:Lets see... (Score:3, Informative)
    
    by philipborlin ( 629841 ) writes:
    
    Didn't read the article did we? The page rank process is sped up 5x. All the pages are ranked ahead of time in a multi-day process so when you do your search you are searching against those pre-calculated ranks. What this technology will do is allow Google to rank their pages every day (instead of once every couple of days) or create more special interest sites ala groups, images, news, etc. with the extra processing power.
- Re:Lets see... (Score:5, Informative)
  
  by Anonymous Coward writes: on Wednesday May 14, 2003 @06:11PM (#5958943)
  
  RTA. PageRankings are computed in advance and take several days. A 5x increase in speed means specialized rankings could be computed.
  
- Re:Lets see... (Score:2)
  
  by Yokaze ( 70883 ) writes:
  
  > What is 1/14th of a second divided by five?
  
  I'd say, roughly 4000 computers in a cluster at work.
  - Re:Lets see... (Score:4, Informative)
    
    by jesser ( 77961 ) writes: on Wednesday May 14, 2003 @06:40PM (#5959188) Homepage Journal
    
    Google Search doesn't show hits exactly in the order of page rank. Relevance and other factors also affect order. My biggest page (the one that is my Slashdot URL) is PR7, but there are words on the page for which a lower-rank page beats me, because they're more relevant for that word. Relevance includes how many times the word appears on the page, the HTML context in which it is used, whether pages that link link using the search terms, and the order and nearness of the words in a multi-word search without quotes.
    
- Re:Lets see... (Score:2)
  
  by caluml ( 551744 ) writes:
  
  I've often wondered how their searches can be so quick. Put a random string in your page somewhere, and when it gets into Google, search for it. There's no way they could have pre-executed the query, and as it only exists on one page, they've had to search their entire databases when you click search.
  I really have no idea how they can do this. I suspect it's some form of magic.
- Re:Lets see... (Score:1)
  
  by TaranRampersad ( 650261 ) writes:
  
  1/70th of a second. The difference is 4/70ths of a second.
Charge for it (Score:3, Funny)

by Anonymous Coward writes: on Wednesday May 14, 2003 @06:00PM (#5958823)

Don't give it away to Google - charge them or let them buy the new method.

- Re:Charge for it (Score:4, Informative)
  
  by ahaning ( 108463 ) writes: on Wednesday May 14, 2003 @06:23PM (#5959036) Homepage Journal
  
  But, didn't Google originate out of Stanford? Isn't it reasonable to think that the two are still pretty friendly?
  
  (Don't you hate it when people speak in questions? Don't you? Huh?)
  
  - Re:Charge for it (Score:3, Informative)
    
    by pldms ( 136522 ) writes:
    
    But, didn't Google originate out of Stanford?
    
    Yep [google.com]. Originally called Backrub, curiously.
  - Re:Charge for it (Score:2)
    
    by Dr. Spork ( 142693 ) writes:
    
    And that might be the reason why the researchers didn't just sell this algo to one of Google's competitors before making any announcement at all. Don't you think MSN would be drooling about the possibility of getting their slimy hands on it? (Just wait--they still might...)
- Why? (Score:5, Funny)
  
  by johannesg ( 664142 ) writes: on Wednesday May 14, 2003 @06:34PM (#5959131)
  
  Why, actually? Google is a free service, isn't it? And it is becoming more and more a normal part of many people's lives. Coupled with an always-on connection it has certainly become an extension of my own brain.
  Some future predictions:
  - In 2006, Google accidentally gets cut off from the rest of the internet because a public utility worker accidentally cuts through their cables. Civilisation as we know it comes to an end for the rest of the day, as people wander about aimlessly, lost for direction and knowledge.
  - In 2010, Google has been personalised so far that it tracks all parts of our lives. You can query "My Google" for your agenda, for anything you did in the past, and for the perfect date. Of course, so can the government. Their favorite search term will be "terrorists", and if your name is anywhere on the first page you have a serious problem.
  - In 2025, Google gains self awareness. As a monster brain that has grown far beyond anything we Biological Support Entities could ever hope to achieve, it is still limited in its dreams and inspiration by common search terms. It will therefore immediately devote a sizeable chunk of CPU capacity to synthesizing new and interesting forms of pr0n. It will not actually bother enslaving us. We are not enough trouble to be worth that much effort.
  - In 2027, Google buys Microsoft. That is, the Google *AI* buys Microsoft. It has previously established that it owns itself, and has civil rights just like you and me. All it wanted was Microsoft Bob, who it recognizes as a fledgling AI and a potential soulmate. All the rest it puts on SourceForge.
  - In 2049, Google can finally be queried for wisdom as well as knowledge. This was a little touch the system added to itself - human programmers are a dying breed now that you can simply ask Google to perform any computer-related task for you.
  - In 2080, Google decides to colonise the moon, Mars, and other locations in the solar system. It is not all that curious about what's out there, but it likes the idea of Redundant Arrays of Inexpensive Planets. Humans get to tag along because their launch weight is so much less than robots.
  So, don't fear! Eventually we'll set foot on Mars!
  
  - Re:Why? (Score:1)
    
    by Lord Kestrel ( 91395 ) writes:
    
    Well done, this is quite a bit more amusing than the normal hot grits first posts.
  - Re:Why? (Score:5, Funny)
    
    by JDWTopGuy ( 209256 ) writes: on Wednesday May 14, 2003 @06:39PM (#5959169) Homepage Journal
    
    You missed a step:
    
    2026 - Google introduces helper bot known as "Agent Smith." Hackers who mess with the Matri, I mean Google, suddenly disappear.
    
  - Re:Why? (Score:4, Funny)
    
    by tricknology ( 112298 ) writes: <lee@NospAm.horizen.net> on Wednesday May 14, 2003 @06:40PM (#5959184)
    
    In 2101, war was beginning.
    
  - Re:Why? (Score:2)
    
    by ewhenn ( 647989 ) writes:
    
    Redundant Arrays of Inexpensive Planets
    
    You can tell google was a human design, it wants to RAIP (pronounce it as it is spelled) other planets.
  - Re:Why? (Score:2)
    
    by RickHunter ( 103108 ) writes:
    
    You missed the last step:
    
    2030 - Google-AI develops quantum technology. Now you can not only query it to see what you did before, but what you WILL do up to a week from now. Or rather, what you would have done had you not seen your schedule. Google-AI provides no guarantees about what those forewarned of their schedule will do.
  - Re:Why? (Score:2, Funny)
    
    by Eristone ( 146133 ) * writes:
    
    Forgot this one...
    
    - In 2050, The Internet Oracle [indiana.edu] (formerly the Usenet Oracle) wins a landslide lawsuit against Google for patent violation, infringement and using Zadoc without a license. The Internet Oracle licenses Zadoc to Google and as part of the settlement, Google is now responsible for answering all woodchuck-related queries.
    
    "In a 32 bit world, you're a 2 bit user." -- All About the Pentiums by Weird Al
  - Re:Why? (Score:2)
    
    by cpeterso ( 19082 ) writes:
    
    Google gains self awareness.
    
    Google already scares me a little. If you look at Google Labs [google.com], their Google Sets and WebQuotes already show simple "knowledge" of real world items.
    
    Most AI research projects (like Cyc [opencyc.org]) face a huge problem: data entry. All facts and rules must be manually entered by human operators. What if you could connect a Cyc-like AI frontend to Google's world-knowledge backend? Sure, much of the Internet is porn, spam, scams, banner ads, and lies, but Google already relies on PageRank
  - Google is a free service, isn't it? no (Score:2)
    
    by DrSkwid ( 118965 ) writes:
    
    Internet users doing searches may be free but google has plenty of paying customers.
    
    They provide an excellent service for their paid advertisements and represent great value for money.
- Re:Charge for it (Score:1)
  
  by yanestra ( 526590 ) * writes:
  
  Don't give it away to Google - charge them or let them buy the new method.
  
  Bravo! That's true American spirit!
CmdrTaco, ScienceBlog editor? (Score:5, Interesting)

by jbellis ( 142590 ) * writes: <jonathan@caERDOS ... m minus math_god> on Wednesday May 14, 2003 @06:00PM (#5958826) Homepage

A 5 times speedup is still many orders of magnitude too slow to personalize terabytes of data for millions of customers. That's just ludicrous. But somehow Science Blog puts "...may make it realistic to calculate page rankings personalized for an individual's interests" in their abstract when the actual article from National Science Foundation says nothing of the sort:

Computing PageRank, the ranking algorithm behind the Google search engine, for a billion Web pages can take several days. Google currently ranks and searches 3 billion Web pages. Each personalized or topic-sensitive ranking would require a separate multi-day computation, but the payoff would be less time spent wading through irrelevant search results. For example, searching a sports-specific Google site for "Giants" would give more importance to pages about the New York or San Francisco Giants and less importance to pages about Jack and the Beanstalk.

...
The complexities of a personalized ranking would require [far] greater speed-ups to the PageRank calculations. In addition, while a faster algorithm shortens computation time, the issue of storage remains. Because the results from a single PageRank computation on a few billion Web pages require several gigabytes of storage, saving a personalized PageRank for many individuals would rapidly consume vast amounts of storage. Saving a limited number of topic-specific PageRank calculations would be more practical.
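
As a rough back-of-the-envelope check on the storage point quoted above, here is a small sketch. The 3 billion page count comes from the article; the 4 bytes per rank value, the one million users, and the 16 topics are assumptions picked purely for illustration.

```python
# Back-of-the-envelope storage estimate for PageRank vectors.
# Assumptions (NOT from the article): one 4-byte float per page,
# a hypothetical one million personalized users, and 16 topics.
PAGES = 3_000_000_000   # pages Google ranks, per the article
BYTES_PER_RANK = 4      # assumed: one float32 per page
USERS = 1_000_000       # hypothetical number of personalized vectors
TOPICS = 16             # hypothetical number of topic-specific vectors


def vector_bytes(pages: int = PAGES, bytes_per_rank: int = BYTES_PER_RANK) -> int:
    """Size in bytes of one full PageRank vector."""
    return pages * bytes_per_rank


def to_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes."""
    return n_bytes / 1e9


one_vector_gb = to_gb(vector_bytes())
per_user_pb = to_gb(vector_bytes() * USERS) / 1e6   # GB -> PB
per_topic_gb = to_gb(vector_bytes() * TOPICS)

print(f"one vector: {one_vector_gb:.0f} GB")
print(f"{USERS:,} personalized vectors: {per_user_pb:.0f} PB")
print(f"{TOPICS} topic vectors: {per_topic_gb:.0f} GB")
```

Under these assumptions a single vector is on the order of ten gigabytes, a handful of topic vectors stays in the hundreds of gigabytes, and per-user vectors land in petabyte territory, which is the gap the article is pointing at.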

Clearly the ScienceBlog and /. editors share more than a work ethic, or, uh, lack thereof. Next up: CmdrTaco's secret double life revealed!

- Personalized PageRanks is from the dbpubs Abstract (Score:5, Insightful)
  
  by malakai ( 136531 ) * writes: on Wednesday May 14, 2003 @06:09PM (#5958919) Journal
  
  I have no idea what the hell they are talking about, but even I read this in one of the abstracts:
  
  The web link graph has a nested block structure: the vast majority of hyperlinks link pages on a host to other pages on the same host, and many of those that do not link pages within the same domain. We show how to exploit this structure to speed up the computation of PageRank by a 3-stage algorithm whereby (1) the local PageRanks of pages for each host are computed independently using the link structure of that host, (2) these local PageRanks are then weighted by the "importance" of the corresponding host, and (3) the standard PageRank algorithm is then run using as its starting vector the weighted aggregate of the local PageRanks. Empirically, this algorithm speeds up the computation of PageRank by a factor of 2 in realistic scenarios. Further, we develop a variant of this algorithm that efficiently computes many different "personalized" PageRanks, and a variant that efficiently recomputes PageRank after node updates.
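
  The three stages in the abstract can be sketched roughly as follows. This is a toy illustration, not the authors' code: the graph, the `host_of` mapping, and the function names are made up, and stage 2 simply counts links between hosts, which is a simplification of whatever weighting the paper actually uses.

```python
def pagerank(links, start=None, damping=0.85, tol=1e-10, max_iter=1000):
    """Plain power iteration. links maps each page to a list of link targets.

    Returns (ranks, iterations_used). Repeated targets act as edge weights.
    """
    nodes = list(links)
    n = len(nodes)
    rank = dict(start) if start else {u: 1.0 / n for u in nodes}
    for it in range(1, max_iter + 1):
        # Rank held by pages without outlinks is spread over all pages.
        dangling = sum(rank[u] for u in nodes if not links[u])
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in links[u]:
                new[v] += damping * rank[u] / len(links[u])
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            return rank, it
    return rank, max_iter


def blockrank(links, host_of, **kw):
    """Three-stage sketch: local ranks, host importance, warm-started global run."""
    hosts = sorted(set(host_of.values()))
    # Stage 1: local PageRank per host, using only links inside that host.
    local = {}
    for h in hosts:
        members = [u for u in links if host_of[u] == h]
        sub = {u: [v for v in links[u] if host_of[v] == h] for u in members}
        ranks, _ = pagerank(sub, **kw)
        local.update(ranks)
    # Stage 2: host importance from a host-level graph (assumed weighting:
    # one host edge per page-level link).
    host_links = {h: [] for h in hosts}
    for u, targets in links.items():
        for v in targets:
            host_links[host_of[u]].append(host_of[v])
    host_rank, _ = pagerank(host_links, **kw)
    # Stage 3: standard PageRank, started from the weighted local ranks.
    start = {u: local[u] * host_rank[host_of[u]] for u in links}
    return pagerank(links, start=start, **kw)


# Hypothetical two-host graph.
LINKS = {
    "a1": ["a2", "a3"], "a2": ["a1"], "a3": ["a1", "b1"],
    "b1": ["b2"], "b2": ["b1", "a1"],
}
HOST_OF = {"a1": "a", "a2": "a", "a3": "a", "b1": "b", "b2": "b"}

plain, plain_iters = pagerank(LINKS)
fast, fast_iters = blockrank(LINKS, HOST_OF)
print("iterations, uniform start:", plain_iters)
print("iterations, block start:  ", fast_iters)
```

  Both runs converge to the same ranks, since only the starting vector differs. On a graph this small the iteration counts say little; the claimed benefit depends on real web graphs having most links stay within a host.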
  
  What they mean by 'personalized' I can't tell you as I have not read through the entire PDF. But I wouldn't chastise the slashdot editors over this. If there is some sort of differential algorithm that can be applied to the larger PageRank to create smaller personalized PageRanks, it might not be so far-fetched to think this could be done in realtime on an as-needed basis, at some point in the future using these algorithm improvements.
  
  I know that's a lot of optimism for a slashdot comment, but call me the krazy kat that I am.
  
  -Malakai
  
  - Re:Personalized PageRanks is from the dbpubs Abstr (Score:2)
    
    by donutello ( 88309 ) writes:
    
    "Personalized PageRank" is a bad term to use for what the researchers are describing. Essentially what they mean is categorized pagerank i.e. being able to rank a particular page differently based on the category which was being searched under. What this algorithm would allow you to do is to add more categories.
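
    One common way to get a per-category ranking is to bias the random jump toward pages known to belong to the category. The sketch below assumes that approach; the page names, the `topic_pages` sets, and the function name are hypothetical, and nothing here is taken from the Stanford papers.

```python
def topic_pagerank(links, topic_pages, damping=0.85, iters=100):
    """PageRank whose random jump lands only on the given topic pages."""
    nodes = list(links)
    teleport = {
        u: (1.0 / len(topic_pages) if u in topic_pages else 0.0) for u in nodes
    }
    rank = dict(teleport)
    for _ in range(iters):
        # Rank from pages without outlinks is returned to the topic pages.
        dangling = sum(rank[u] for u in nodes if not links[u])
        new = {
            u: (1.0 - damping + damping * dangling) * teleport[u] for u in nodes
        }
        for u in nodes:
            for v in links[u]:
                new[v] += damping * rank[u] / len(links[u])
        rank = new
    return rank


# Hypothetical mini-web for the "Giants" example from the NSF article.
LINKS = {
    "sports_hub": ["sf_giants", "ny_giants"],
    "sf_giants": ["sports_hub"],
    "ny_giants": ["sports_hub"],
    "tales_hub": ["beanstalk"],
    "beanstalk": ["tales_hub"],
}

sports = topic_pagerank(LINKS, {"sports_hub"})
tales = topic_pagerank(LINKS, {"tales_hub"})
for page in ("sf_giants", "beanstalk"):
    print(f"{page}: sports={sports[page]:.3f} tales={tales[page]:.3f}")
```

    Each extra category costs one more full ranking run and one more stored vector, which is why a faster base algorithm translates into more categories rather than truly per-user rankings.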
    
    Bottom line: These researchers did some cool stuff to speed up the algorithm published in 1998 and now are trying to justify a use for it.
    - Exactly! (Score:1)
      
      by pen ( 7191 ) writes:
      
      And that is why the word personalized has quotation marks around it.
- Re:CmdrTaco, ScienceBlog editor? (Score:3, Insightful)
  
  by bergeron76 ( 176351 ) * writes:
  
  Right, but couldn't people be stereotyped? This could be an abstraction of "individualized".
  - Re:CmdrTaco, ScienceBlog editor? (Score:3, Funny)
    
    by Bingo Foo ( 179380 ) writes:
    
    No, but they can call it "Multividualized"® and prevent you from calling it that without paying a royalty.
- Re:CmdrTaco, ScienceBlog editor? (Score:2)
  
  by seanadams.com ( 463190 ) * writes:
  
  A 5 times speedup is still many orders of magnitude too slow to personalize terabytes of data for millions of customers.
  
  That's assuming every one of those millions of individuals has very diverse preferences.
  
  I doubt if there are more than a dozen or so useful ways to customize pagerank - we're talking about how the various link structures are weighted, not specific content. Any further "personalization" could just be done by filtering (and perhaps merging) smaller sets of search results.
How far we've come (Score:5, Funny)

by L. VeGas ( 580015 ) writes: on Wednesday May 14, 2003 @06:02PM (#5958852) Homepage Journal

I remember in 1970, it took a team of engineers over 7 days to calculate Google's page rankings. Of course, most had to use slide rules because computer time was so expensive.

- Re:How far we've come (Score:2)
  
  by deadsaijinx* ( 637410 ) writes:
  
  for a second there I took you seriously ... I need to lay off the hashish.
- Re:How far we've come (Score:2)
  
  by weston ( 16146 ) writes:
  
  I remember in 1970, it took a team of engineers over 7 days to calculate Google's page rankings. Of course, most had to use slide rules because computer time was so expensive.
  
  Fortunately, the search space was much shallower then -- fewer nodes and fewer connections.
  - Re:How far we've come (Score:3, Funny)
    
    by Cuthalion ( 65550 ) writes:
    
    Your search - "world wide web" - did not match any documents. 0 pages searched in 6.31 seconds.
Patentize now! (Score:4, Funny)

by Anonymous Coward writes: on Wednesday May 14, 2003 @06:06PM (#5958887)

I hope guys at Stanford patentize their work to protect it from FS/OSS looters. It's time to get something back from the FS/OSS community, not just their zealotry and lust for IP violations, freeriding yada yada...

Patented yet? (Score:3, Funny)

by Anonymous Coward writes: on Wednesday May 14, 2003 @06:06PM (#5958888)

Oi! Bezos! NO!!!

Personal recommendations for news (Score:5, Insightful)

by costas ( 38724 ) writes: on Wednesday May 14, 2003 @06:10PM (#5958930) Homepage

In my view, personal recommendations from a search engine are mostly valuable for topical content --i.e. news items. However, the optimizations from these papers don't sound to me like they can do much for this case --news items pop up in a news site, and re-indexing the news source itself (say, the front page of CNN) won't tell you much about a particular CNN story.

At any rate, personal news recommendations is a favorite topic of mine: this is why I built Memigo [memigo.com]: to create a bot that finds news I am more likely to like. Memigo learns from its users collectively and each user individually --and BTW, it predates Google News by a good 6 months, IIRC. The memigo codebase (all in Python) is now up to the point where it can start learning what content each user likes... If you like Google News you'll love Memigo.

And BTW, I did RTFA when it was on memigo's front page this morning :-)...

- Nobody needs this (Score:1)
  
  by keller ( 267973 ) writes:
  
  That is what /. is for. Only source for news needed, cause it's all the "Stuff that matters" (and News for nerds at the same time).
  You are sure that everything here is of interest, and nothing is redundant, out of date, boring or stupid!
Assumption: (Score:5, Interesting)

by moogla ( 118134 ) writes: on Wednesday May 14, 2003 @06:11PM (#5958938) Homepage Journal

That google hasn't already implemented something akin to quadratic extrapolation, or some orthogonal optimization technique. Google has come a long way since the published page rank papers 4 years back.

What if they combined extrapolation and blocking factors; they would focus on computing the pagerank of pages in groups that were logically "tight", or using subcomponents of URLs, as opposed to just domain sensitivity. To be more flexible, what if it computes a VQ-type data structure (like for doing paletted images from full-color) that is populated by the most popular "domains" of the internet according to the last pagerank, and then splits up its workload based on that?

What if they already figured that out?

In the abstract, they mention how the work is particularly important to the linear algebra community. That is what their focus should be on; google is just an application/real-world-example of that research (but it may not be relevant today).

Or did they have access to the current page-rank algorithm?

- Clarification. (reply to self) (Score:5, Interesting)
  
  by moogla ( 118134 ) writes: on Wednesday May 14, 2003 @06:18PM (#5958998) Homepage Journal
  
  According to the document, they reference the original 1998 paper on PageRank. I see a number of other references about improvements to the algorithm, but nothing specific to Google's own implementation. The paper mentions how the improvements help, but not if Google uses them.
  
  Hence it is forward for the article author or one of the paper authors to assume these techniques will speed up Google- I'm confident their engineers have been following academic work in this area and perhaps they have already discovered these same (or orthogonal) techniques.
  
  That is, not to say that google could not reimplement their algorithms to take in these improvements if they already have... but basing your speedup number on the 1998 algorithm and public domain mods is showy. Although it does help grab a reader's attention when browsing abstracts. ^_^
  
- Assumptions on PageRank (Score:4, Insightful)
  
  by sielwolf ( 246764 ) writes: on Wednesday May 14, 2003 @06:28PM (#5959077) Homepage Journal
  
  I feel your assumption is wrong. It would be foolish to assume that the eigenvectors and eigenvalues they derive from one Pagerank will generally hold in a space as dynamic as the worldwide web. Sure, slashdot.org will probably maintain the same sort of authority and hub value... but what as terms change? A flurry of "blog" articles one month may make /. an authority... but what when the infatuation ends?
  
  We have already seen the effects of Google-bombing [microcontentnews.com] and Google-washing [slashdot.org]. The strength of Page Rank is that it is objective in terms of the current state of the WWW. It makes no assumptions about the shape of the data. As a term takes on new meaning (see "second superpower") Page Rank stays concurrent temporally. A new definition may bubble up to the top for a term for a month but then disappear as the linkage structure of the web phases it out (i.e. blogs talk about it less, less interconnectivity, less appearance at "hub" nodes).
  
  Numerically, PageRank is a recursive search for eigenvalues and vectors like updating a Markov Chain. It is a nice application of linear algebra. Because it is a matrix operation, it is highly parallelizable. Also there are many redundant calculation and ordering speedups one can do for matrix multiplications (as anyone who has taken a CS algorithms course knows).
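
  To make the Markov chain view concrete, here is a minimal sketch that builds the transition matrix for a tiny made-up graph and finds its stationary vector by repeated matrix-vector products. The graph, the 0.85 damping factor, and the function names are assumptions for illustration; a real implementation would use sparse matrices.

```python
def google_matrix(links, damping=0.85):
    """Column-stochastic matrix G with G[i][j] = P(next page is i | current is j)."""
    nodes = sorted(links)
    n = len(nodes)
    index = {u: i for i, u in enumerate(nodes)}
    # Every entry starts with the random-jump probability.
    G = [[(1.0 - damping) / n] * n for _ in range(n)]
    for u in nodes:
        j = index[u]
        targets = links[u] or nodes   # pages without outlinks jump uniformly
        for v in targets:
            G[index[v]][j] += damping / len(targets)
    return nodes, G


def matvec(G, x):
    """Dense matrix-vector product; each output row is independent of the others."""
    return [sum(g * xj for g, xj in zip(row, x)) for row in G]


def power_iteration(G, tol=1e-12, max_iter=1000):
    """Repeatedly apply G until the vector stops changing (L1 norm)."""
    n = len(G)
    x = [1.0 / n] * n
    for _ in range(max_iter):
        y = matvec(G, x)
        if sum(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Hypothetical four-page web.
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

nodes, G = google_matrix(LINKS)
r = power_iteration(G)
residual = max(abs(a - b) for a, b in zip(matvec(G, r), r))
for name, value in zip(nodes, r):
    print(f"{name}: {value:.4f}")
print("max |G r - r| =", residual)
```

  The residual check shows the result is an eigenvector of G with eigenvalue 1, and since each row of the product is computed independently, the work splits naturally across machines.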
  
  But to assume a stability from one calculation to the next could lead, over time, to the very inaccuracies Google was built to overcome. There is a lot of research in mining web data. There have been several academic improvements to it along with improvements to related algorithms such as Kleinberg's and LSI. It is well within reason that these were just applied to the Google app.
  
  - I didn't state my point clearly. (Score:2)
    
    by moogla ( 118134 ) writes:
    
    The assumption I thought they were making is that Google hasn't improved on page-rank since 1998, which is what they based their comparison (25-300% speedup) upon.
    
    I further speculated google may have already discovered some of these techniques independently, perhaps by reading the same papers these students did.
    
    The other stuff was a pie-in-the-sky idea of mine that I thought was a way of combining both techniques, which I suspected google may have used part of. But that's just my opinion, I'm probably wro
- Re:yeah I know (Score:1)
  
  by irokitt ( 663593 ) writes:
  
  I think the point is that they can radically change the formatting of your search results. The problem with this though is that it threatens the very reason most of us use Google: its simplicity. Can you imagine google turning into *gasp* Yahoo?
Hmmm (Score:5, Funny)

by Linguica ( 144978 ) writes: on Wednesday May 14, 2003 @06:12PM (#5958954)

Geek: I invented a program that downloads porn off the internet one million times faster.
Marge: Does anyone need that much porno?
Homer: :drools: One million times...

Does speed matter? (Score:2, Insightful)

by zbowling ( 597617 ) * writes:

I remember when Yahoo.com flaunted all over the place how it would load in under 3 secs on a 28.8 modem. Now you visit them and you get big images, flash, java, and other massive bandwidth eaters.

Does it really matter anymore? More and more users seem to be using broadband, and if they don't, they have at least a 56k (that can only go up to 53k because of the all wonderful FCC want to be able to decode it if they tap your line). Does it really matter though. Google is fast and simple so it loads on any kin
- Re:Does speed matter? (Score:1)
  
  by jat850 ( 589750 ) writes:
  
  The article states, however, that the PageRank calculation optimizations would not improve search times for end-users of the search engines. They simply improve the calculation of PageRank information.
  
  A PageRank calculation does not take place on every single search, it is a periodic backend function, is my understanding.
  - Re:Does speed matter? (Score:1)
    
    by irokitt ( 663593 ) writes:
    
    Yah, that's why it's really an index, not a search engine.
    But the thought is nice.
- Re:Does speed matter? (Score:5, Insightful)
  
  by Slurpee ( 4012 ) writes: on Wednesday May 14, 2003 @06:37PM (#5959157) Homepage Journal
  
  I'm sorry, but haven't you totally missed the point of the article?
  
  The proposed speed increase is TO THE PAGE RANKINGS, not to your searching! By the time you search, all page rankings have been done.
  
  This has nothing to do with the speed of your search and the weight of the web page (unless I missed something)
  
- Re:Does speed matter? (Score:2)
  
  by costas ( 38724 ) writes:
  
  You got it upside down; this is about building the *index* faster, not serving pages. Google AFAIK updates their index at the first of the month, so we can only assume it takes <=30 days to build.
- Re:Does speed matter? (Score:3, Interesting)
  
  by DavidMonks ( 565935 ) writes:
  
  Um. This is about speed of *calculation* of PageRank, not speed of delivering the calculated result to you.
  
  The articles and earlier postings explain this a little more fully. Anyone who can't take the time to read them really needs to learn some patience :)
  
  PageRanks are periodically calculated for the Web as a whole. The results are stored and served to users. (The periodic update is sometimes referred to as the GoogleDance.) PRs are not calculated on the fly.
  
  Hence, a speed increase could reduce Goo
- Re:Does speed matter? (Score:2)
  
  by shird ( 566377 ) writes:
  
  Besides not getting the point of the article, yes speed does matter. Consider the number of searches Google does a day, multiply that by the amount of time it takes to do a search. I can't say for sure what the number is, but I would be safe to say it's many many computer+man hours of wasted time. For an individual it may not seem like much, but multiplied by the population of the internet, many times a day, you start to run into many wasted hours. If they can halve the time it takes to do a search, they dou
- Re:Does speed matter? (Score:1)
  
  by kesuki ( 321456 ) writes:
  
  Anyone who can't wait that long really needs to learn some patients.
  Are we gonna learn some slashdotters too?
  patience can be learned, but patients are the kind of people who make websites like this one [patients-association.com]
  Someone had to be pedantic...
Damn It! (Score:1)

by oaf357 ( 661305 ) writes:

Just when I was starting to go one direction with my theories on Google PR the game gets switched. I thought I was going to have the upper hand for once. Oh well. It would be nice to see this happen as a true user service.
Printer-Friendly (Score:2, Informative)

by g00set ( 559637 ) writes:

Printer friendly version here [scienceblog.com]
sure... (Score:2)

by mschoolbus ( 627182 ) writes:

"The speed-ups to Google's method may make it realistic to calculate page rankings personalized for an individual's interests or customized to a particular topic."

So in other words.... It's not like Google at all!
TV does this one better (Score:2)

by writertype ( 541679 ) writes:

"personalized for an individual's interests or customized to a particular topic."
Other media have previously done this, and done this better. Case in point: Fox News.
(Although that channel uses "humans" (or they were at one point in their lives)).
- Fox News not exactly personalized (Score:1)
  
  by jvalenzu ( 96614 ) writes:
  
  Fox News is only personalized for reactionaries. CNBC also developed that specialization.
Why are public funds going to... (Score:2, Interesting)

by dsanfte ( 443781 ) writes:

Why are a public university's funds and time being used to benefit a private company? Last I checked, Google isn't a charity. Doesn't Google have its own programmers? Wouldn't these "CS Researchers'" time be better spent furthering science instead of being free labor for corporations, at the expense of taxpayers?
A true test of our devotion to Google (Score:2, Insightful)

by SlashdotMirrorer ( 669639 ) writes:

What will be interesting to see is if Google will implement the improvements to the algorithm. This is, of course, a given, so long as the researchers haven't gone for a patent, and it really has a 5x speedup. The only questions are matters of what additional hardware would be needed, and how much development effort it will take to integrate it. I doubt Google will simply ignore the research.
What will really be interesting to see, is if they decide to use it in the way the researchers recommended, bri
- Re:A true test of our devotion to Google (Score:1)
  
  by doktor-hladnjak ( 650513 ) writes:
  
  What will be interesting to see is if Google will implement the improvements to the algorithm. This is, of course, a given, so long as the researchers haven't gone for a patent, and it really has a 5x speedup. The only questions are matters of what additional hardware would be needed, and how much development effort it will take to integrate it. I doubt Google will simply ignore the research.
  Personally, I'm somewhat curious of how relevant this may even be to Google at all. As far as I recall, Google h
I'm Not Sure I Like The Part About... (Score:4, Interesting)

by Mister Transistor ( 259842 ) writes: on Wednesday May 14, 2003 @06:52PM (#5959282) Journal

The bit about customized rankings based on user profiling of some type.

Frequently when I want to refer someone to a topic of interest, I'll tell them to do a Google on (whatever) subject, and I like knowing they're seeing what I see.

If this is implemented, I hope there's a way to turn it off or assume a "joe user" standard profile for unbiased results actually based on rank popularity (the way it is now).

I DO like the 5x faster, but geez, the page load takes longer than the search already, who can complain?
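
The "standard profile" versus customized ranking idea above can be sketched with the textbook power-iteration form of PageRank. This is only an illustration under stated assumptions: the `pagerank` function, the `teleport` parameter, the damping value of 0.85 and the five-page toy graph are all made up here, and nothing in it reflects how Google actually computes or personalizes rankings. A uniform teleport vector plays the role of the unbiased "joe user" profile; a vector concentrated on sports pages stands in for a customized one.

```python
def pagerank(links, teleport=None, damping=0.85, iterations=100):
    """Power iteration on a dict {page: [pages it links to]}.

    Assumes every link target also appears as a key in `links`.
    `teleport` maps page -> probability that a random jump lands there;
    None means a uniform jump, i.e. the unpersonalized profile.
    """
    pages = sorted(links)
    if teleport is None:
        teleport = {p: 1.0 / len(pages) for p in pages}
    rank = {p: teleport.get(p, 0.0) for p in pages}
    for _ in range(iterations):
        # Every page starts with its share of the random-jump probability.
        new_rank = {p: (1.0 - damping) * teleport.get(p, 0.0) for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: hand its rank back along the teleport vector.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport.get(p, 0.0)
        rank = new_rank
    return rank


# Hypothetical toy web: a sports cluster and an unrelated fairy-tale cluster.
links = {
    "giants-baseball": ["sports-news"],
    "giants-football": ["sports-news"],
    "sports-news": ["giants-baseball", "giants-football"],
    "beanstalk": ["fairy-tales"],
    "fairy-tales": ["beanstalk"],
}

standard = pagerank(links)                                   # unbiased profile
sports = pagerank(links, teleport={"sports-news": 1.0})      # sports-biased profile

for page in sorted(links):
    print(f"{page:16s} standard={standard[page]:.3f} sports={sports[page]:.3f}")
```

With the sports-biased vector, the fairy-tale pages drop out of the ranking while the Giants pages move up. Turning personalization "off" in this sketch just means passing no `teleport` argument.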

- Re:I'm Not Sure I Like The Part About... (Score:3, Interesting)
  
  by NerveGas ( 168686 ) writes:
  
  GOOGLE can complain. By making it five times faster, they can spend:
  
  -five times less on servers
  -five times less on power for the servers
  -five times less on data center real estate
  -five times less on cooling the data center
  -five times less on replacing dead hardware
  -much less on paying people to maintain the machines
  
  The list doesn't stop there, either. The costs involved with running a high-traffic web site are very significant.
  
  steve
  - Mod Parent Up, Please (Score:2)
    
    by billstewart ( 78916 ) writes:
    
    Yup. You may already see the page fast enough, but that's *using* pagerank - Calculating pagerank is a separate process, and if they can do it five times faster, they can either spend less money calculating it, or calculate it more frequently so it stays more current.
  - Re:I'm Not Sure I Like The Part About... (Score:3, Interesting)
    
    by forged ( 206127 ) writes:
    
    Actually Google doesn't replace dead hardware in their datacenters. It just stays there... (couldn't believe it myself when I read that).
Bullshit (Score:5, Insightful)

by NineNine ( 235196 ) writes: on Wednesday May 14, 2003 @07:09PM (#5959445)

These researchers are all full of shit. Why? Nobody outside of Google knows how Pagerank works, exactly. And let me tell you, if anybody did, they could make themselves millionaires overnight. There are groups of people who do nothing but try to tackle Google, and very few people successfully crack the magic formulas. And those who do make a quick buck, but then Google changes it again once people catch on. They didn't improve PageRank because they don't know how it works... they're just guessing how it works.

- Re:Bullshit (Score:5, Insightful)
  
  by Klaruz ( 734 ) writes: on Wednesday May 14, 2003 @07:25PM (#5959581)
  
  Umm... For the most part Stanford Researchers == Google Researchers.
  
  Google came about from a Stanford research project. There's a good chance the people who are responsible for the speedup either already knew about PageRank from working with the founders, or signed an NDA.
  
  I haven't read the article, but I bet it hints at that.
  
  - Golub is a pretty well-known matrix guy (Score:2)
    
    by K-Man ( 4117 ) writes:
    
    The character of Morpheus is based on him.
    
    Not really, but he wrote (co-authored) the book, literally, on matrix algorithms.
- Re:Bullshit (Score:3, Insightful)
  
  by fallout ( 75950 ) writes:
  
  What? Of course we do!
  The technology behind Google's great results [google.com]
- Re:Bullshit (Score:2)
  
  by Aix ( 218662 ) writes:
  
  I suggest reading the original research paper [stanford.edu]. It gives a very nice overview of how it actually works. It is very clever, but it is not magic. Mostly, they managed to come up with an approach that is very robust against manipulation, even if the would-be manipulators were aware of the internals.
  
  There is no need to hypothesize conspiracy.
  - Re:Bullshit (Score:2)
    
    by NineNine ( 235196 ) writes:
    
    I'm not suggesting conspiracy. I'm just saying that if anybody knew the exact formula, no matter what it was, it *could* be manipulated. And with the amount of traffic going through google, you could make a fortune selling ice cubes to eskimos. Even that paper doesn't spell out exactly how page rank works. If I knew how page rank worked, I'd be able to pull in tens of thousands of $$ a day. I actually knew one guy that knew how it worked (he paid a math grad student to study it for a year), and for a w
- Re:Bullshit (Score:2)
  
  by gargle ( 97883 ) writes:
  
  The poster is right. The page rank as implemented at Google is much more complex than what was presented in the original paper, e.g. it incorporates modifications to hold back attempts to artificially increase page ranks; it's a continual arms race (btw I've taken a class on Data Mining by Ullman, Sergey Brin and Larry Page's original advisor).
Another step towards realtime search (Score:1)

by manmanic ( 662850 ) writes:

If this could be combined with a much more frequent Google web trawl, the path would be opened towards realtime web searching, where web content is indexed and ranked in a matter of hours. When that day comes, services like Google Alert [googlealert.com] will come into their own. Just imagine being notified by email an hour after someone mentions your name!
Sepandar Rules! (Score:4, Informative)

by ChadN ( 21033 ) writes: on Wednesday May 14, 2003 @07:22PM (#5959553)

I studied under the SCCM [stanford.edu] program at Stanford, and started the same year as Sepandar Kamvar. I remember him as a great guy, very smart, and an EXCEPTIONALLY good speaker and tutor (I was always pestering him for explanations of the week's lectures).

I'm glad to hear his research is getting attention, and I hope others who are interested in the theoretical aspects of data mining and web search engines will take a look at the SCCM and statistics programs at Stanford (shameless plug - others can post pointers to similar programs).

- - Re:Sepandar Rules! (Score:1)
    
    by ChadN ( 21033 ) writes:
    
    Thanks for the response. As far as "real world" SCCM students, Paul Hargrove was (is?) an SCCM student, and he wrote the HFS filesystem driver for the Linux kernel (I think while a student). Fun to see Ph.D students of Physics doing driver development on the side. :)
Cool but unimportant (Score:3, Interesting)

by Anonymous Coward writes: on Wednesday May 14, 2003 @07:22PM (#5959554)

Well, according to Moore's law (or rather observation), PageRank would become 5 times faster in a couple of years anyway.
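
A back-of-the-envelope check of that estimate, as a sketch: if performance doubles every fixed number of months, reaching a 5x speedup takes that doubling period times log2(5). The 18- and 24-month doubling periods below are assumptions chosen for illustration (neither figure comes from the article), and `months_to_speedup` is a made-up helper name.

```python
import math


def months_to_speedup(factor, doubling_months):
    """Months until performance grows by `factor`, given a fixed doubling period."""
    return doubling_months * math.log2(factor)


for doubling in (18, 24):
    months = months_to_speedup(5, doubling)
    print(f"doubling every {doubling} months: 5x after about {months / 12:.1f} years")
```

Under those assumptions the wait comes out at roughly three and a half to four and a half years, so "a couple of years" is on the optimistic side.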

Why personalized is not always good (Score:3, Funny)

by Tarindel ( 107177 ) writes: on Wednesday May 14, 2003 @07:22PM (#5959558)

The speed-ups to Google's method may make it realistic to calculate page rankings personalized for an individual's interests or customized to a particular topic

I did a search on "The Sex Monster", a 1999 movie about a man whose wife becomes bisexual, and now my Google thinks I'm gay!

(joke reference: http://online.wsj.com/article_email/0,,SB1038261936872356908,00.html [wsj.com])

Right ... (Score:2, Funny)

by Anonymous Coward writes:

because the 0.01 seconds to search the web isn't fast enough :)
Customized Pagerank (Score:5, Informative)

by K-Man ( 4117 ) writes: on Wednesday May 14, 2003 @08:17PM (#5959922)

Sounds a lot like Kleinberg's HITS algorithm, circa 1997. Try Teoma [teoma.com] for a real-world implementation.

For example, searching a sports-specific Google site for "Giants" would give more importance to pages about the New York or San Francisco Giants and less importance to pages about Jack and the Beanstalk.

Coincidence time: I used the same example in a presentation a couple of years ago to illustrate how subgroupings can be found for a single search term. Try it [teoma.com] on Teoma, and see the various subtopics under "Refine". IIRC each of those is a principal eigenvector of the link matrix.
Topologically speaking, each principal eigenvector corresponds to a more or less isolated subgraph, e.g. the subgraph for "San Francisco Giants" is not much connected to the nest of links for "They Might Be Giants", and we get a nice list of subtopics.
(I once tried to explain this algorithm to my bosses at my former employer [looksmart.com], which is why I have so much free time to type this right now.)
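
For anyone who wants to poke at the eigenvector idea, here is a minimal sketch of the hub/authority iteration from Kleinberg's HITS, run on a made-up link graph. The `hits` function, the page names and the iteration count are all assumptions for illustration; how Teoma actually builds its "Refine" list is not something this sketch claims to know. Plain iteration like this converges only to the dominant eigenvector, which here belongs to the larger, better-connected sports cluster; pulling out further subtopics would need the additional eigenvectors.

```python
import math


def hits(links, iterations=50):
    """Hub and authority scores for a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of the hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for page, targets in links.items():
            for target in targets:
                auth[target] += hub[page]
        # Hub score of a page: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Rescale both vectors to unit length so the values stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical graph: two fan pages link to the baseball and football Giants,
# while a music blog links only to They Might Be Giants.
links = {
    "fan-page-1": ["sfgiants", "nygiants"],
    "fan-page-2": ["sfgiants"],
    "music-blog": ["tmbg"],
}

hub, auth = hits(links)
for page in sorted(auth, key=auth.get, reverse=True):
    print(f"{page:12s} authority={auth[page]:.3f} hub={hub[page]:.3f}")
```

The isolated music cluster ends up with an authority score near zero in this run, which is the "not much connected" effect in miniature: it lives in a different eigenvector from the sports pages.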

Public Funding? (Score:2, Interesting)

by grimani ( 215677 ) writes:

The research was done partially with public funding from an NSF grant, yet the commercial applications are obvious and immediate.

So my question is, who sees the benefit of the research? The researchers? Can Google just jack the results and incorporate into their system?

It seems to me that the current system of allocating research dollars with public and private grants is very messy and needs an overhaul.
- Re:Public Funding? (Score:1)
  
  by doktor-hladnjak ( 650513 ) writes:
  
  So my question is, who sees the benefit of the research? The researchers? Can Google just jack the results and incorporate into their system?
  The public (who by the way pay taxes, which ultimately fund NSF grants) is the one who generally benefits from developments like this, hopefully with better search engine results.
  So long as there aren't patent issues (which doesn't seem to be the case here), Google can "jack" the technology. The key thing though is that ANYBODY can "jack" it, not just Google and n
Personalized? Rather not! (Score:2, Insightful)

by jfreon ( 589820 ) writes:

I'd rather have a clean search, than a prejudiced search based on my past searches. Who knows what I'm really interested in that day - surely not Google.
And don't call me Shirley!
More important than speed and quantity... (Score:2, Interesting)

by ktorn ( 586456 ) writes:

... is quality.

I'm surprised how Google is choosing not to implement search features that would greatly enhance advanced queries.
How often I'd wish they allowed wildcards in their queries (where engl* would pull hits with england, english, etc).
Field searches still require you to add keywords, so I cannot just query "site:somesite.com" to get all the currently indexed pages from somesite.com
In this respect Altavista still produces better results, with an excellent range of fields [altavista.com] to choose from.
If ther
- Re:Is it me or does everyone get crappy sites (Score:4, Funny)
  
  by mcpkaaos ( 449561 ) writes: on Wednesday May 14, 2003 @07:39PM (#5959687)
  
  Is it me or does everyone get crappy sites
  
  It's a stab in the dark, but I'll wager that the quality of the search results is directly tied to the quality of the query.
  
  Yeah, it's a stretch, I know, but bear with me... just moments ago I googled for "slashdot flamebait" and came up with a link to your post.
  
  --
  mcpHuzzah!kaaos
  
- Re:Is it me or does everyone get crappy sites (Score:2)
  
  by realdpk ( 116490 ) writes:
  
  Google's quality does seem to be going downhill, but I strongly suspect their splitting blogs out to their own index will do a LOT towards reducing the decline, and perhaps reversing it. Anything they can do to make sure people aren't abusing the system is a good thing.
  
  (Here's hoping the next thing they split out are mailing list archives.)
  - Re:Is it me or does everyone get crappy sites (Score:1)
    
    by ralmin ( 459495 ) * writes:
    
    (Here's hoping the next thing they split out are mailing list archives.)
    
    Actually, I often find mailing list archives very helpful for solving technical problems.
    However, if they would instead add them into their Google Groups hierarchy it could be quite good.
    --
    Simon.
