Forgot your password?
typodupeerror
Google Databases Technology

Google Caffeine Drops MapReduce, Adds "Colossus" 65

Posted by samzenpus
from the time-to-upgrade dept.
An anonymous reader writes "With its new Caffeine search indexing system, Google has moved away from its MapReduce distributed number crunching platform in favor of a setup that mirrors database programming. The index is stored in Google's BigTable distributed database, and Caffeine allows for incremental changes to the database itself. The system also uses an update to the Google File System codenamed 'Colossus.'"
This discussion has been archived. No new comments can be posted.

Google Caffeine Drops MapReduce, Adds "Colossus"

Comments Filter:
  • Re:Well (Score:3, Informative)

    by kurokame (1764228) on Sunday September 12, 2010 @02:37AM (#33550896)

    No, the old system was transactional as well. The problem was that it was transactional across a very large number of operations being run in parallel, and any failure could cause the entire transaction to fail. The new system is incremental rather than monolithic. While it may not be quite as fast across a large number of transactions, it doesn't risk major processing losses either. Such failures are very unlikely, but the Google index has grown large enough that it is probably running into unlikely problems all the time.

    MapReduce is also staged, and the first stage must complete before the second can start. At Google's scales, this adds up to quite a lot of wasted power.

    Processing a batch of data with Colossus is probably slower than using MapReduce under ideal circumstances. But failures don't incur a major penalty under Colossus, and MapReduce ties up CPU cycles with waits which aren't wasted under Colossus. Even if Colossus is slower under ideal circumstances, it's more reliable and more efficient in practice.

  • by kurokame (1764228) on Sunday September 12, 2010 @02:46AM (#33550928)

    No, that's not it.

    MapReduce is a sequence of batch operations, and generally, Lipkovits explains, you can't start your next phase of operations until you finish the first. It suffers from "stragglers," he says. If you want to build a system that's based on series of map-reduces, there's a certain probability that something will go wrong, and this gets larger as you increase the number of operations. "You can't do anything that takes a relatively short amount of time," Lipkovitz says, "so we got rid of it."

    "[The new framework is] completely incremental," he says. When a new page is crawled, Google can update its index with the necessarily changes rather than rebuilding the whole thing.

    There are still cases where Caffeine uses batch processing, and MapReduce is still the basis for myriad other Google services. But prior the arrival of Caffeine, the indexing system was Google's largest MapReduce application, so use of the platform has been significantly, well, reduced.

    They're not still using MapReduce for the index. It's still supported in the framework for secondary computations where appropriate, and it's still used in some other Google services, but it's been straight-up replaced for the index. Colossus is not a new improved version of MapReduce, it's a completely different approach to maintaining the index.

  • by kurokame (1764228) on Sunday September 12, 2010 @02:47AM (#33550936)

    Sorry, Colossus is the file system. Caffeine is the new computational framework.

    I made the same error in several posts now...but Slashdot doesn't support editing. Oh well! Everyone reads the entire thread, right?

  • by Anonymous Coward on Sunday September 12, 2010 @04:53AM (#33551316)

    Colossus is also the name of the computers Bletchley Park used to crack the German Lorenz cipher.
    http://en.wikipedia.org/wiki/Colossus_computer [wikipedia.org]

  • Re:Well (Score:4, Informative)

    by TheRaven64 (641858) on Sunday September 12, 2010 @06:52AM (#33551674) Journal

    Yes and no. With MapReduce, they were hitting Amdahl's Law. The speed limit of any concurrent system is defined by the speed of the slowest serial component. This is why IBM still makes money selling very fast POWER CPUs, when you can get the same speed on paper from a couple of much cheaper chips.

    The old algorithm (massive oversimplifications follow) worked by indexing a small part of the web on each node, building a small index, and then combining them all in the last step. Think of a concurrent mergesort or quicksort - the design was (very broadly) similar.

    The problem with this was that the final step was the one that updated the index. If one of the nodes failed and needed restarting, or was slow due to the CPU fan failing and the processor down-clocking itself, the entire job was delayed. The final step was largely serial (although it was actually done as a series of hierarchical merges) so this also suffered from scalability problems.

    The new approach runs the partial indexing steps independently. Rather than having a separate step to merge them all, each one is responsible for merging itself into the database. This means that if indexing slashdot.org takes longer than expected then this just delays updates for slashdot.org, it doesn't delay the entire index update.

    The jab at Microsoft in the El Reg article is particularly funny, because Google is now moving from a programming model created at MIT's AI labs to one very similar to the model created at Microsoft Research's Cambridge lab, in collaboration with Glasgow University.

  • Re:I have no idea (Score:3, Informative)

    by bitflip (49188) on Sunday September 12, 2010 @08:46AM (#33552032)
  • by maraist (68387) * <michael.maraistN ... .com ['n0s' in g> on Sunday September 12, 2010 @06:31PM (#33556138) Homepage
    BigTable scales pretty well (go read it's white-papers) - though perhaps not as efficiently as map-reduce for something as simple as text to keyword statistics (otherwise why wouldn't they have used it all along).

    I'll caveat this whole post with - this is all based on my reading of the BigTable white-paper a year ago, but having played with Cassandra, Hadoop, etc occasionally since then. Feel free to call me out on any obvious errors. I've also looked at a lot of DB internals (Sybase, Mysql MyISAM/INNODB and postgresql).

    What I think you're thinking is that in a traditional RDBMS (which they hint at), you have a single logical machine that holds your data.. That's not entirely true, because even with mysql, you can shard the F*K out of it. Consider putting a mysql server on every possible combination of the first two letters of a google-search. Then take high density combinations (like those beginning with s) and split it out 3, 4 or 5 ways.

    There are drastic differences to how data is stored, but that's not strictly important - because there are column-oriented table stores in mysql and other RDBMS systems. But the key problem of sharding is what's focused on Mysql-NDB-Cluster (which is a primitive key-value store) and other distributed-DB technologies that best traditional DBs at scalability.

    BUT, the fundamental problem that page-searches are dealing with is that I want a keyword to map to a page-view-list (along with meta-data such as first-paragraph / icon / etc) that is POPULATED from statistical analysis of ALL page-centric data. Meaning you have two [shardable] primary keys. One is a keyword and One is a web-page-URL. But the web-page table has essentially foreign keys into potentially THOUSANDS of keyword records and visa-versa. Thus a single web-page update would require thousands of locks.

    In map-reduce, we avoid the problem. We start off with page-text, mapped to keywords with some initial meta-data about the parent-page. In the reduce phase, we consolidate (via a merge-sort) into just the keywords, grouping the web pages into ever more complete lists of pages (ranked by their original meta-data - which includes co-keywords). In the end, you have a maximally compact index file, which you can replicate to the world using traditional BigTable (or even big-iron if you really wanted).

    The problem of course, was that you can't complete the reduce phase until all web pages are fully downloaded and scanned.. ALL web pages. Of course, you do an hourly job which takes only high-valued web-pages and merges with the previous master list. So you have essentially static pre-processed data which is over-written by a subset of fresh data.. But you still have slowest-web-page syndrome. Ok, so solve this problem by ignoring web-load requests that don't complete in time - they'll be used in the next update round.. Well, you still have the issue of massive web-pages that take a long time to process. Ok, so we'll have a cut-off for them too.. Mapping nodes which take too long, don't get included this round (you're merging against you last valid value - so if there isn't a newer version, the old one will naturally keep). But the merge-sort itself is still MASSIVELY slow. You can't get 2-second turn-around on high-importance web-sites. You're still building a COMPLETE index every time.

    So now, with a 'specialized' GFS2 and specialized BigTable, either or both with new fangled 'triggers', we have the tools (presumably) to do real-time updates. A Page load updates its DB table meta-data. It see's it went up in ranking, so it triggers a call to modify the associated keyword's table (a thousand of them). Those keywords have some sort of batch-delay (of say 2 seconds) so that it minimizes the number of pushes to production read-servers.. So now we have an event queue processor on the keyword table. This is a batch processor, BUT, we don't necessarily have to drain the queue before pushing to production. We only accept as many requests as we can fit into a 2 second time-slice. Presumably

The unfacts, did we have them, are too imprecisely few to warrant our certitude.

Working...