Forgot your password?
typodupeerror
Google Businesses The Internet Databases Programming Software IT Technology

Google Sorts 1 Petabyte In 6 Hours 166

Posted by Soulskill
from the sort-of-fast dept.
krewemaynard writes "Google has announced that they were able to sort one petabyte of data in 6 hours and 2 minutes across 4,000 computers. According to the Google Blog, '... to put this amount in perspective, it is 12 times the amount of archived web data in the US Library of Congress as of May 2008. In comparison, consider that the aggregate size of data processed by all instances of MapReduce at Google was on average 20PB per day in January 2008.' The technology making this possible is MapReduce 'a programming model and an associated implementation for processing and generating large data sets.' We discussed it a few months ago. Google has also posted a video from their Technology RoundTable discussing MapReduce."
This discussion has been archived. No new comments can be posted.

Google Sorts 1 Petabyte In 6 Hours

Comments Filter:
  • by Anonymous Coward on Sunday November 23, 2008 @12:54PM (#25865203)

    for knowing how important the Library of Congress metric is to us nerds!

  • by Zarhan (415465) on Sunday November 23, 2008 @12:59PM (#25865255)

    Yay! We finally have unit conversion from 1 LoC to bytes! So...20 PB = 6LoC, means that 1 LoC = 3,333... PB :)

    • Re: (Score:3, Informative)

      Don't you mean 1PB = 12LoC?
    • Re:Unit conversion (Score:4, Informative)

      by Neon Aardvark (967388) on Sunday November 23, 2008 @01:22PM (#25865429) Homepage

      No, 1 PB = 12 LoC, so 1 LoC = 0.0833... PB

      Also, I'd like to make some kind of swimming pool reference.

      • Re: (Score:2, Interesting)

        by Anonymous Coward

        Assuming it was written in binary in a font that allows for 1 digit per 2mm, the length of the data would be 183251938 m, or 1145324 times the perimeter of an olympic-sized swimming pool.

      • by owlnation (858981)

        No, 1 PB = 12 LoC, so 1 LoC = 0.0833... PB. Also, I'd like to make some kind of swimming pool reference.

        Yes, but how much is that in football fields?

        • Re: (Score:1, Interesting)

          by Anonymous Coward

          Yes, but how much is that in football fields?

          You silly sod, you can't measure something in football fields! There's internationalization to take into account!

          Canadian football fields are 100x59m, American football fields are 109x49m, and the rest of the world doesn't even play the same game on a football field. And THEIR sport has a standard range, anywhere from 90-120m by 45-90m (Thank you wikipedia [wikipedia.org]).

          You've now introduced variable-variables! We can't get an absolute number!

          • by Yvan256 (722131)

            Can we get an absolute variable instead?

          • Re: (Score:2, Informative)

            by ewanm89 (1052822)
            Well, American's don't even play *foot*ball with there feet.
            • Re: (Score:2, Funny)

              This is an excellent point. No American football player has used his feet since the NFL adopted hoverchairs into the rules in 1974.
              • This is an excellent point. No American football player has used his feet since the NFL adopted hoverchairs into the rules in 1974.

                If that is enough foot for you, I really want to see the American handball team.

            • by dwpro (520418)

              Yes, yards are close enough for us, but by God, we're still using English measurements.

      • Re: (Score:3, Informative)

        by Zarhan (415465)

        Oh darn. Clearly I was converting pound-congresses to kilos first.

      • by Gazzonyx (982402)
        Just to avoid confusion, we're using base 2 units, correct? Where a KB = 1024 bytes and a MB is 1024 KB, etc. At the PB level, the difference between a KB being 1000 bytes adds up.
    • by neoform (551705)

      What format are they using for the books when doing this calculation as to the size of the LoC?

      Raw Text?

      PDF?

      JPEG? ....

    • by UltraAyla (828879)
      I like your thinking, but would like to modify it (I realize it was a joke). Considering the rate at which LoC archives data, we should put some datestamps on it so that, including the other correction, 1PB = 12 081123LoC. Just a thought
  • That's Easy (Score:5, Interesting)

    by Lord Byron II (671689) on Sunday November 23, 2008 @01:04PM (#25865299)
    Consider a data set of two numbers, each .5 petabyte big. It should only take a few minutes to sort them and there's even a 50% chance the data is already sorted.
    • Re:That's Easy (Score:5, Insightful)

      by Blakey Rat (99501) on Sunday November 23, 2008 @01:08PM (#25865335)

      I came here to post the same thing. If they sorted a petabyte of Floats, that might be pretty impressive. But if they're sorting 5-terabyte video files, their software really sucks.

      Not enough info to judge the importance of this.

      • Re:That's Easy (Score:5, Informative)

        by farker haiku (883529) on Sunday November 23, 2008 @01:16PM (#25865387) Journal

        I think this is the data set. I could be wrong though. The article (yeah yeah) says that

        In our sorting experiments we have followed the rules of a standard terabyte (TB) sort benchmark.

        Which lead me to this page [hp.com] that describes the data (and it's available for download).

        • In our sorting experiments we have followed the rules of a standard terabyte (TB) sort benchmark.

          Which lead me to this page that describes the data (and it's available for download).

          For the record, you can download a file that will generate the data for it. Because otherwise, well, posting a link to a 1TB file on Slashdot might melt the entire Internet.

      • I dunno, it depends on what criteria they are using to sort video files. If by file name, then yeah, not so impressive, but if their sorting based on a measure of relevance of the contained contents, my jaw would drop and my eyes would pop out.
    • Re:That's Easy (Score:5, Informative)

      by Anonymous Coward on Sunday November 23, 2008 @01:16PM (#25865389)

      From TFA: they sorted "10 trillion 100-byte records"

    • by sakdoctor (1087155) on Sunday November 23, 2008 @01:18PM (#25865407) Homepage

      And yet google don't even convert petabytes to libraries of congress in the google calculator.
      Or perhaps I got the syntax wrong.

    • by JamesP (688957)

      Chances are now they are going to ask potential employees being interviewed there how to do it using half the time and one tenth of the machines...

  • by Animats (122034) on Sunday November 23, 2008 @01:12PM (#25865371) Homepage

    Sorts have been parallelized and distributed for decades. It would be interesting to benchmark Google's approach against SyncSort [syncsort.com]. SyncSort is parallel and distributed, and has been heavily optimized for exactly such jobs. Using map/reduce will work, but there are better approaches to sorting.

    • by perlchild (582235)

      And Google is trying to make money off mapreduce(as an api of sorts), so now you're surprised they're using their massive resonance over the market, especially geeks, in order to heighten awareness of their product?

      On the other hand, what they're trying to prove is mapreduce's worth, as a workload divider(how to break-up 20PB for sorting), not necessarily how optimal it is in the current situation. They have a better test/sample of mapreduce, but it's a trade secret to them(how it's used to index the pages

    • Re: (Score:3, Interesting)

      Parallel/distributed sorting doesn't eliminate the need for map/reduce, it just helps spread the problem set across machines.

      Here's the thing though...its the distributing of the problem set and the combining of the results that is the hard part - not map/reduce.

      Map and reduce are simple functional programming paradigms. With map, you apply a function to a list - which could be either atomic values or other functions. With reduce, you take a single function(like add or multiply, for instance) and use that t

    • I suspect maybe you don't quite understand how MapReduce works. Take a look at the references section of the MapReduce paper [google.com]; the paper's authors are well aware of research in the parallel sorting field. In particular their reference 1 [google.com] is most relevant.

    • by ShakaUVM (157947) on Monday November 24, 2008 @08:22AM (#25871655) Homepage Journal

      >>Using map/reduce will work, but there are better approaches to sorting.

      It kinda bugs me that Google trademarked (or, at least, what they named their software) after a programming modality that has been in parallel processing for ages. In fact, MPI has a mapreduce() function that, well, does a map/reduce operation. I.e., farms out instances of a function to a cluster, then gathers the data back in, summates it, and presents the results to someone.

      It kind of bugs me (in their Youtube video linked in TFA, at least) that they make it seem that this model is their brilliant idea, when all they've done is write the job control layer under it. There's other job control layers that control spawning new processes, fault tolerance, etc., and have been for many, many years. Maybe it's nicer than other packages, in the same way that Google Maps is nicer than other map packages, but I think most people like it just because they don't realize how uninspired it is.

      It'd be like them coming out with Google QuickSort(beta) next.

    • Using map/reduce will work, but there are better approaches to sorting.

      I think we can safely assume that the hordes of egghead computer scientists are already exploring the alternate approaches.

      Perhaps SyncSort has better theoretical performance, but Map/Reduce yields better results in Google's real-world scenarios? I don't know, it's all way above my head.

  • Finally... (Score:5, Funny)

    by aztektum (170569) on Sunday November 23, 2008 @01:17PM (#25865403)

    I will be able to catalog my pr0n in my lifetime:

    Blondes, Brunettes, Red heads, Beastial^H^H^H^H^H "Other"

    • tagging (Score:5, Interesting)

      by Hao Wu (652581) on Sunday November 23, 2008 @01:34PM (#25865509) Homepage

      I will be able to catalog my pr0n in my lifetime:

      It's not enough to sort by blond, black, gay, scat, etc. Some categories are a combination that don't belong in a hierarchy.

      That is where tagging comes in. Sorting can be done on-the-fly, with no one category intrinsically more important.

      • Re:tagging (Score:5, Funny)

        by gardyloo (512791) on Sunday November 23, 2008 @01:53PM (#25865671)

        pr0n for Geeks, volume 18: Sorting On-the-Fly

      • by AbRASiON (589899) *

        I swear I shouldn't admit this but good lord you're right - I _REALLY_ wanted WinFS to come out and I wanted it to be good.
        A database driven filesystem would be so goddamned useful, it would change the way we work with computers but noooo Microsoft screwed up (furthermore WinFS was a hack, on top of NTFS anyhow I heard it was SQL server or something, sounds messy)

        Porn is a fantastic example, I realise it's kind of immature but I mean it would be genuinely useful.
        Set tags like:
        threesome
        oral
        brunette
        money shot

    • Re: (Score:2, Funny)

      by Pugwash69 (1134259)
      How do you catalogue the topics? I mean "Clown" and "Monkey" are so different, but something with both elements could be difficult to sort.
    • by Fumus (1258966)
      For the love of puppies. Learn to spell "bestiality". Half the population can't spell it right :/
  • by DaveLatham (88263) on Sunday November 23, 2008 @01:23PM (#25865443)
    It looks like Google saw Yahoo crowing about winning the 1 TB sort contest using Hadoop [yahoo.net] and decided to one up them!

    Let's see if Yahoo responds!
    • by Anpheus (908711)

      Hadoop uses MapReduce :) From their site:

      Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS) (see figure below.) MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.

    • by iwein (561027)
      With a larger dataset outscaling efficiency becomes more important than sorting efficiency. Sorting 1PB is different than sorting 1TB.

      Since were relating to human proportions today, I'll compare your comparison to comparing running 100m to running a marathon. Apply story telling skills and score.

    • by jollyplex (865406) on Sunday November 23, 2008 @07:49PM (#25868305)
      Exactly. It's unclear if their better time was a software engineering or algorithmic feat, though. Hadoop was able to finish sorting the 1 TB benchmark dataset in 209 s; TFA states Google pulled the same event off in 68 s. The Yahoo blog post you linked to says their compute nodes each sported 4 SATA HDDs. Note TFA mentions Google's 1 PB dataset sort used 48,000 HDDs split between 4,000 machines, or 12 HDDs to a machine. If Google used the same machines to perform their 1 TB sort, then they had 3 times as many HDDs on each compute node, and could probably pull data from storage 3 times as fast. 209 s / 68 s ~ 3.1 -- coincidence, or not? =)
  • Sort? Sort what? (Score:1, Insightful)

    by mlwmohawk (801821)

    One quadrillion bytes, or 1 million gigabytes.

    How big are the fields being sorted. Is it an exchange sort or a reference sort?

    It is probably very impressive, but without a LOT of details, it is hard to know.

    • Re:Sort? Sort what? (Score:5, Informative)

      by nedlohs (1335013) on Sunday November 23, 2008 @01:34PM (#25865507)

      I realize, slashdot..., but maybe you could glance at the article which states:

      10 trillion 100-byte records

      • by mlwmohawk (801821)

        10 trillion records across 4,000 computers comes to 2.5 billion records per computer.

        It took 6 hours for a computer to sort 2.5 billion records? 250G?

        Yawn.

        • Re: (Score:3, Insightful)

          by nedlohs (1335013)

          You do have to merge them all back together at the end...

          But I'm sure you can do better tonight.

          • Re: (Score:1, Flamebait)

            by mlwmohawk (801821)

            You do have to merge them all back together at the end...

            Technically speaking, that's not true. In fact, you wouldn't want too.

            Assuming some sort of search paradigm, you'd keep the records on their 4000 separate servers, each server doing its on search functionality, and *only* merge the results of the searches as needed and cache them in the web layer.

            • by mlwmohawk (801821)

              How did someone see this as flamebait?

        • Re: (Score:3, Insightful)

          by chaim79 (898507)
          right, so it's 250gb sorted in 6 hours... now where does the sorting and integration of the 4000 250gb blocks of sorted data come in? :)
          • by mlwmohawk (801821)

            right, so it's 250gb sorted in 6 hours... now where does the sorting and integration of the 4000 250gb blocks of sorted data come in?

            You wouldn't merge it in to one set, you'd keep it all on their own servers and only merge the results as needed.

            • by chaim79 (898507)

              if you sort 4000 blocks of random data into an actual order, but don't combine the data in any serious way, what you have is tons of overlap in all these seperate blocks of data. Just talking about 1-20 on 4 servers:

              • Server 1: 1, 5,6,9,13
              • Server 2: 3,11,12,17,19
              • Server 3: 2,4,7,15,20
              • Server 4: 8,10,14,16,18

              That data may be sorted but it's a mess, and doing this type of sort for a competition is nothing more then getting fast servers and sticking them in the same room and have them all sort random blocks of da

    • by neoform (551705)

      Odds are they're using the mythical "google algorithm", so they're probably going to keep what they're doing quiet.

    • by Dpaladin (890625) on Sunday November 23, 2008 @02:12PM (#25865771)
      Sorting a petabyte sounds pretty impressive, but I don't think it was a whole yotta work.
  • by Anonymous Coward

    Finaly... A system with enough power to run vista efficiently.

  • by g0dsp33d (849253) on Sunday November 23, 2008 @01:41PM (#25865573)
    Not a big deal, that's just the data they have on you.
  • As memory gets cheaper and I can store more locally, what I really need to know is whether it is unique or new to me. I can read Frits P0st a million times and never get tired of it. There was a very good article on slashdot the other day and it got over 2000 comments, some of which were very insightful and useful. I need a way to know for myself what is new to me. I would be nice if the browser interacted more with Google to help me with that. I just looked, and RTFM is indexed 4.5 million times which of c
  • 0s and 1s (Score:2, Funny)

    by johno.ie (102073)

    That's a lot of computing power to use just to get 4,000,000,000,000 0s and 4,000,000,000,000 1s.

  • ...fancy doing my mp3 collection?

  • by TinBromide (921574) on Sunday November 23, 2008 @02:03PM (#25865715)
    First of all, this isn't a straight up "Libraries of Congress" (better known and mentioned in prior posts as a LoC). Its the web archiving arm of the LoC. I call for the coining of a new term, WASoLoC (Web Archival System of Library of Congress) which can be defined as X * Y^Z = 1 WASoLoC where X is some medium that people can relate to (books, web pages, documents, tacos, water, etc), Y is a volume (Libaries, Internets, Encyclopedias, end to end from A to B, swimming pools, etc) and Z is some number that marketing drones come up with because it makes them happy in their pants.

    Honestly, How am i supposed to know what "..the amount of archived web data in the US Library of Congress as of May 2008." Looks like!? I've been to the library of congress, i've seen it, its a metric shit-ton of books (1 shit-ton = Shit * assloads^fricking lots), but i have no clue what the LoC is archiving, what rate they're going at it, and what the volume is of it.
  • That must have taken a lot of monkeys.
  • by stimpleton (732392) on Sunday November 23, 2008 @02:12PM (#25865769)
    Good.

    They clearly have the ability to respond to emergencies. And this puts it out there that they can...

    eg;
    1) Foot n mouth out break in cattle
    2) A supliment to census data
    3) Finding information of dissidents/traitors(bloggers)
  • 20,111 Servers ?? (Score:1, Interesting)

    by johnflan (1413981)
    With a little bit of excel, if takes 4,000 servers 362 minutes to calculate a 1PB job It takes 1440 (24 hours) on 20,111.11 server to sort 20pb (if it was just plain sorting they were doing). And just on a side note, from their number one of their servers can compute 741.5 MB per minute!
  • by wjh31 (1372867)
    i make this about 48GB/s, my hard drive manages about 20MB/s, even my mid-range ram manages only ~6.4GB/s, and top end ram will reach only ~13GB/s (according to wiki) so even ignoring the ability to process that much data in that time, the ability to simply move that much data is quite impressive (at time of print, may not hold one year down the line)
    • They probably didn't hold the source data on a single machine in first place (or did seagate break the Petabyte barrier, yet?).

      48GB/s broken down over 4000 servers boils down to "only" 12 Mbyte/s.
      So indeed, impressive aggregate performance, but the individual nodes were "only" performing at (roughly) the throughput of Gigabit Ethernet.

  • That's a lot of data...
  • I'm surprised (Score:1, Redundant)

    by TheSpoom (715771) *
  • Just like I measure my distance to work (452.75 football fields) i measure the data on my computer by libraries of congress?
  • by Duncan3 (10537) on Sunday November 23, 2008 @04:32PM (#25866967) Homepage

    Today from Google, the god of all things and doer of all things good in the universe, many millions of dollars in computer equipment were able to sort lots of things, in about the amount of time you would think it would take for millions of dollars of equipment to sort things.

    In other news, a woodchuck was found chucking wood as fast as a woodchuck could chuck wood.

    Congrats Google, you have a HUGE data set, and an even bigger wallet.

  • by Bitmanhome (254112) <bitmanNO@SPAMpobox.com> on Sunday November 23, 2008 @04:45PM (#25867089)

    If you feel the urge to play with MapReduce (or reade the paper), you don't need a fancy Linux distro [apache.org] to do it. MapReduce is simply the map() and reduce() functions, exactly as implemented in Python. Granted, Google implementation can work with absurdly large data sets, but for small data sets, Python is all you need.

    • Re: (Score:3, Informative)

      by boyter (964910)

      True, but not quite the point. The map and reduce functions as you say are implemented in python (and a great many other languages), but what makes MapReduce special is that you replace the Map function with one which distributes it out to other computers. Because any map function can be implemented in parallel you get a speed boost for however many machines you have (dependant on network speeds etc....).

      So yeah, you can do it in Python but you arent going to be breaking any records untill you implement you

      • But its the distributing part that is special, not the map/reduce part.

        You're basically just dividing up a huge list and sending each part to a different machine. Tacked on to each list are the map and reduce functions themselves so each machine knows what to do with the list.

        Its the parallelization of the problem that is the hard part. Map does not mean the mapping of the problem to thousands of machines - it means the mapping of a function to a list, and that is not a terribly difficult problem.

        • by TeknoHog (164938)
          IMHO it's noteworthy that the language contains a keyword for parallel operations. So you can start using map() right now, and when the implementation and hardware improves, your existing code will scale up. (I've experienced a similar development with matrix operations in Fortran.)
    • Re: (Score:3, Informative)

      Exactly. There is nothing special to map and reduce.

      Here's an example. Map and reduce are functional programming tools that work with lists. So we'll start with a simple list.

      1 2 3 4 5

      Now we'll take a function - x^2, and map it to the list. The list now becomes:

      1 4 9 16 25.

      Now, we'll apply a reduce function to our list to combine it to a single value. I'll use "+" to keep it simple. We end up with:

      55

      And that is pretty much all there is to map and reduce.

      • Re: (Score:3, Informative)

        by adpowers (153922)

        Almost, but not quite. MapReduce has a slightly different format than just map() and reduce(). Here is the signature of map and reduce from a theoretical functional language:

        map(): A* -> B*
        reduce(): B* -> C

        Whereas in MapReduce:

        map: (K, V)* -> (K1, V1)*
        reduce: (K1, (V1)*)* -> (K2, V2)*

        I think that is mostly accurate. Read more accurate/detailed report in MapReduce revisited [cs.vu.nl][PDF].

  • It really only took Two Hours - the rest of the time was used stuffing in paid ads.

  • It's always bugged me that they've been heralding MapReduce, something any functional programmer has known for the past 50 years, as something revolutionary and new. The worst of it is how all the self-styled geeks, who by rights ought to be familiar with the concept, have been lapping it all up.
    • Re:MapReduce (Score:5, Informative)

      by adpowers (153922) on Monday November 24, 2008 @01:43AM (#25870255)

      The individual functions map and reduce are quite standard. The innovation here is the systems work they've done to make it work on such a large scale. All the programmer needs to worry about is implementing the two functions, they don't have to worry about distributing the work, ensuring fault tolerance, or anything else for that matter. That is the innovation.

      They mention in the article that if you try and sort a petabyte you WILL get hard disk and computer failures. Hell, you can only read a terabyte hard disk a few times before you encounter unrecoverable errors. The system for executing those maps and reduces is what is important here. The important parts are in the design details, such as dealing with stragglers. If you have 4000 identical machines, you won't necessarily get equal performance. If a few of those machines have a bit flipped and started without disk cache, they might see a huge decrease in read/write performance. The system needs to recognize this and schedule the work differently. That can make a huge difference in execution time. If you graph the percentile complete of a MR job, you'll often see that it quickly reaches 95% and then plateaus. The last 5% may take 20% of the time, and good scheduling is required to bring this time down.

      But like I said, the innovation isn't in the idea of using a Map and Reduce function, it is the system that executes the work.

When all else fails, read the instructions.

Working...