Microsoft Research Introduces Record-Beating MinuteSort Tech

Posted by timothy
from the smart-people-with-cool-ideas dept.
mikejuk writes "A team from Microsoft Research has taken the lead in the MinuteSort data sorting test using a specially devised technology: Flat Datacenter Storage. The figures are impressive: 1401 gigabytes in 60 seconds, using 1033 disks across 250 machines. Not only is this three times as much as the previous record, it also uses only one sixth of the hardware resources, according to a blog post about the test from Microsoft. One thing that's interesting about the success is the technology used. While solutions such as Hadoop and MapReduce are traditionally used for working with large data sets, Microsoft Research created its own technology called 'Flat Datacenter Storage,' or FDS for short. This isn't just academic research, of course. The team from Microsoft Research has already been working with the Bing team to help Bing accelerate its search results, and there are plans to use it in other Microsoft technologies."

  • Their support for research and innovation is top-notch. They are pretty much the only one of the large companies that fund this kind of research and they fund it with billions. Their work does lots of good for the world. Good job guys.
    • Re: (Score:3, Insightful)

      "They are pretty much the only one of the large companies that fund this kind of research"
      Bullshit alert.

      "Their work does lots of good for the world."
      For the world? Or for Microsoft?

      Citations needed.

      • by MikeyC01 (231948) on Wednesday May 23, 2012 @01:12PM (#40090397) Homepage

        From the Wiki (http://en.wikipedia.org/wiki/Microsoft_Research#Laboratories [wikipedia.org]), all of the following have come from MS Research

        C#
        Comic Chat (IRC Client)
        F#
        Sideshow (Became Desktop Gadgets)
        Surface (TouchLight)
        SenseCam
        ClearType
        Group Shot
        Allegiance (Game)
        Songsmith

        I'd say C#, F#, and ClearType are pretty big contributions

        • Re: (Score:1, Interesting)

          by John Sugs (2646157)
          Quora also has a discussion about some of these -> http://www.quora.com/Microsoft-Research/What-products-have-come-out-of-Microsoft-Research [quora.com]

          And these are just the specific, high-profile products that have come out of Microsoft Research. You have to remember that they work on many smaller things that will then be integrated into other Microsoft products, or do work 'just for science' (which is pretty amazing from Microsoft).
        • I don't doubt that Microsoft Research has made important contributions (even though from the list you posted only C# is something I can put my finger on). Obviously, it's the "They are pretty much the only one" part that is complete nonsense.

        • by Lord Lode (1290856) on Wednesday May 23, 2012 @01:41PM (#40090791)

          For one moment there, I read "Comic Sans" instead of "Comic Chat".

        • by Yvan256 (722131) on Wednesday May 23, 2012 @01:44PM (#40090829) Homepage Journal

          ClearType invented nothing apart from the name itself.

          Sub-pixel rendering was used two decades ago by Apple [grc.com].

          "Back in 1976, my design of the Apple II's high resolution graphics system utilized a characteristic of the NTSC color video signal (called the 'color subcarrier') that creates a left to right horizontal distribution of available colors. By coincidence, this is exactly analogous to the R-G-B distribution of colored sub-pixels used by modern LCD display panels. So more than twenty years ago, Apple II graphics programmers were using this 'sub-pixel' technology to effectively increase the horizontal resolution of their Apple II displays." - Steve Wozniak

          • by Anonymous Coward on Wednesday May 23, 2012 @04:03PM (#40092761)

            You immediately lose credibility by citing Steve Gibson.

            The type of subpixel rendering done on old Apple IIs essentially treats the color display as a monochrome display of triple the resolution. This is clever and useful, but causes color fringing.

            ClearType takes the concept substantially further by applying perceptual modelling to determine how the subpixels can be used. It's similar to MP3 audio, in that the process adds artifacts, but some artifacts will be invisible (or inaudible in MP3's case) to a human. The trick is minimizing the visible artifacts.

            For example, if you have a one pixel wide line, it is always safe to shift it one third of a pixel to the left. RGB becomes BRG, which still appears the same.

            However, if you have a one third pixel width line, you cannot just use one third of the subpixels. A "white" vertical line would be all red, all green, or all blue, depending on which subpixel it fell on. ClearType would render it using all three subpixels but in the correct color.

            There's quite a bit more to it - sometimes you can use a single subpixel depending on what neighbors it, and/or you can adjust adjacent subpixels to mask fringing artifacts.

            So yes, sub-pixel rendering isn't a wholly new concept, but saying ClearType isn't novel is willfully ignorant.
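
The thirds-of-a-pixel argument in this comment can be sketched in a few lines. This is a toy model, not ClearType's actual algorithm: it assumes an RGB-striped panel, treats one row as a flat list of subpixel intensities, and uses a plain [1/3, 1/3, 1/3] box filter as a stand-in for the perceptual filtering described above. All function names are hypothetical.

```python
from fractions import Fraction

def naive_subpixels(start, width, n_pixels):
    """Light `width` subpixels starting at subpixel index `start` (white on black)."""
    row = [Fraction(0)] * (3 * n_pixels)
    for i in range(start, start + width):
        row[i] = Fraction(1)
    return row

def box_filter(row):
    """Spread each subpixel's energy equally over itself and its two neighbours."""
    out = [Fraction(0)] * len(row)
    for i, value in enumerate(row):
        for j in (i - 1, i, i + 1):
            if 0 <= j < len(row):
                out[j] += value / 3
    return out

def channel_totals(row):
    """Total (red, green, blue) energy in the row; subpixels are ordered R, G, B."""
    return tuple(sum(row[c::3]) for c in range(3))

# A one-pixel-wide line shifted a third of a pixel left lights B, R, G:
# every channel still gets the same energy, so it still reads as white.
shifted_line = channel_totals(naive_subpixels(2, 3, 4))

# A one-third-pixel line that lands on a green subpixel is pure green...
thin_naive = channel_totals(naive_subpixels(4, 1, 4))

# ...unless its energy is spread over all three colours, which gives a
# dimmer but colour-balanced line.
thin_filtered = channel_totals(box_filter(naive_subpixels(4, 1, 4)))
```

Here `shifted_line` equals (1, 1, 1), `thin_naive` equals (0, 1, 0), and `thin_filtered` has one third in each channel. The box filter is the crudest possible choice; it trades sharpness for colour balance and does none of the neighbour-dependent adjustments the comment mentions.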

        • by Anonymous Coward
          C# seems a lot like Java, F# looks like OCaml and didn't Steve Jobs take calligraphy classes to produce the first "TrueType" like font?
        • MSFT Research has been a leader there for a decade. The technical program was just announced [siggraph.org] Tuesday.
      • by robthebloke (1308483) on Wednesday May 23, 2012 @01:19PM (#40090489)

        "Their work does lots of good for the world."
        For the world? Or for Microsoft?

        Dude, seriously! You do realise this algorithm has been developed to help Microsoft sort through all of the outstanding 'serious security flaw found in IE6' tickets? Why else do you think they'd need 1033 hard drives, and 250 machines?

      • by Missing.Matter (1845576) on Wednesday May 23, 2012 @01:36PM (#40090731)

        Citations needed.

        Here you go [microsoft.com]. About 14,000 peer reviewed publications for the computer science community, about 10,000 of which were published completely in house by Microsoft Research, and about 4,000 of which were done in collaboration with Universities.

        • by Fwipp (1473271)

          The issue isn't whether they fund research - clearly they do. The GP was taking issue with "the only one" part.

          • I guess I should have specified, I was providing citations to the second part of his post:

            For the world? Or for Microsoft?

            Publishing research is beneficial for the world. Not just Microsoft.

    • by Anonymous Coward
      Am I the only one that thinks these shill stories and comments are getting old?
    • by Aguazul2 (2591049)
      Typically Microsoft develops something impressive-sounding that is good for a news story, but that never gets out of the research labs -- i.e. vapourware as far as the rest of the world is concerned. Typically we hear about Google research projects when there is a chance to try them out. Intel innovations make it into the next chip generation. As a result, Microsoft 'innovation' stories fail to arouse any interest for me at all.
    • by Galestar (1473827) on Wednesday May 23, 2012 @01:02PM (#40090253)
      Not sure if troll. Yes, they fund "this kind of research", but to say they are "pretty much the only one of the large companies that [do so]" is absurd. Please see Hadoop's origins (Google). Oh and also IBM who eats this shit for breakfast.
      • by Anonymous Coward

        The one thing about Microsoft that I respect is their seriousness about R&D - MS has the highest R&D budget ($9 billion) of all the companies. And they have Turing award winners working with them (C.A.R. Hoare and Charles Thacker come to mind). I once had the honour of listening to C.A.R. Hoare at a conference where he said that the most difficult job for MS R&D people is to make the rest of the organization use what they create.

      • by mikestew (1483105)

        I don't know about troll, but certainly a brand new UID that posted to one or two other articles while waiting for this to come out of the firehose. My guess is an MSR employee that found out "hey, our stuff's going to be on /.!"

        But that's neither here nor there, the post is inaccurate garbage.

      • by CODiNE (27417)

        Shooter McGavin: You're in big trouble though, pal. I eat pieces of shit like you for breakfast!
        Happy Gilmore: [laughing] You eat pieces of shit for breakfast?
        Shooter McGavin: [long pause] No!

        - Happy Gilmore 1996

    • Re: (Score:3, Informative)

      by rjr3 (658693)

      http://www.research.ibm.com/ [ibm.com]

      They used to have one of the most amazing IT geek magazines.

    • by alvinrod (889928)
      No, they're the only one that talks about it all the damned time. Remember all of the Courier "leaks" from a year or so ago, and all of the talk about the Surface?

      Other companies also spend a lot on R&D, but they just don't publicize it. Do you think Apple pulled the iPhone out of a hat or something? Hell, here's a recent blog post [arcfn.com] where the author tore down Apple's power adapter for the phone and found some interesting design work. Google probably does a lot of stuff to improve their search algorithm
      • by rockmuelle (575982) on Wednesday May 23, 2012 @02:00PM (#40091093)

        The big difference is that Microsoft Research is one of the last large corporate research labs focused on pure research. That is, research done for the sake of the research, not to drive product development. Research done at MSR doesn't have to be product driven (it has to be in the general space of software and computers, but that's about the only requirement). MSR is well funded by Microsoft and an integral part of the company's culture.

        Sure, IBM, HP, and Intel all have research labs, but their charters have been re-written over the last ten years to focus more on product-centric research. Most research projects at these companies must start with a business plan that shows how the work will be commercialized within 5 years before being approved. This is not the pure research these labs were once known for.

        Google, Facebook, Yahoo, and many other internet companies have some interesting projects (self driving cars, for instance), but these tend to be one-off projects and aren't part of a larger, long lived research organization.

        Another interesting aspect of MSR is that they encourage all MS developers to take a stint in the organization, not just specially recruited Ph.D.s. It's not uncommon for someone to go from working on a product for a few years, take some time in MSR, then go back to product work.

        I've worked directly with many of the research groups mentioned in this post over the last 20 years. Based on my experiences, MSR is truly the last real corporate research group (in the spirit of 20th century PARC/Watson/et al). The others are just part of the product funnels or whims of the founders.

        -Chris

    • Re: (Score:3, Interesting)

      by Rakishi (759894)

      First of all tons of companies fund research. Lots of papers come out of them of all kinds and plenty more that is never published.

      Second of all Microsoft is actually known for being a black hole of research. Researchers go in and almost nothing comes out. They hire people just so their competitors can't hire them. They may do a few demos but nothing commercial comes from them.

      • Second of all Microsoft is actually known for being a black hole of research. Researchers go in and almost nothing comes out.

        Except for all the published academic research papers with stuff like what's described in TFA?

        • by Rakishi (759894)

          All of which have patents attached to them ensuring that they never become too useful to anyone.

          • What corporation funds research, publishes papers, and DOESN'T patent the results? That's kind of the point... businesses don't do things out of the goodness of their hearts.
      • by kiwimate (458274)

        Microsoft is actually known for being a black hole of research. Researchers go in and almost nothing comes out. They hire people just so their competitors can't hire them.

        Citation?

        As for nothing coming out, you're apparently not including published papers [microsoft.com] (lots published by respectable bodies like IEEE, ACM, Oxford Publishing, etc.), or downloads [microsoft.com] such as Excel plug-ins to simplify working with genomic sequences, Differentially Private Network-Trace-Analysis Tools [microsoft.com], an e-mail loss detection add-in [microsoft.com], etc., etc.

        Sure, not as sexy as self-driving cars. But serious, hard research usually isn't that sexy or appealing to the general public. I thought this was a geek web site...?

  • First post (Score:5, Funny)

    by Anonymous Coward on Wednesday May 23, 2012 @12:46PM (#40089975)

    Sorted by Microsoft

  • between their ass and a hole in the ground..

    /smart people working for dumb people working for smart people
  • by schlachter (862210) on Wednesday May 23, 2012 @01:03PM (#40090275)

    ...yet MinuteSort still takes a minute!

    • by Creepy (93888)

      ...or is it minute sort, as in tiny. Minute Maid or Minute Maid? Am I going mad, yes I've gone mad. The article is slashdotted already, and my mad mind will never know.

    • by Anonymous Coward
      Phhbt, you think that's bad? Race cars are way faster now than they were in the '60s, and yet the 24 Hours of Le Mans still takes a whole day!
    • by louaish88 (731196)
      The contest is how much can be sorted in a minute, so the 1,400 GB is the important number.
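
To put the headline number in context, here is a back-of-the-envelope calculation using only the figures from the summary (1401 GB, 60 seconds, 1033 disks, 250 machines). It assumes decimal gigabytes (1 GB = 10^9 bytes), which the summary does not state, and the function name is made up for illustration.

```python
GB = 10**9  # assumption: decimal gigabytes
MB = 10**6

def sort_rates(total_gb, seconds, disks, machines):
    """Average rates implied by sorting `total_gb` in `seconds`."""
    bytes_per_second = total_gb * GB / seconds
    return {
        "aggregate_gb_per_s": bytes_per_second / GB,
        "per_disk_mb_per_s": bytes_per_second / disks / MB,
        "per_machine_mb_per_s": bytes_per_second / machines / MB,
    }

rates = sort_rates(total_gb=1401, seconds=60, disks=1033, machines=250)
for name, value in rates.items():
    print(f"{name}: {value:.2f}")
```

These are averages of sorted output per second. A sort has to read every byte and write it back, so each disk actually moves at least twice its per-disk figure.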
  • by fahrbot-bot (874524) on Wednesday May 23, 2012 @01:10PM (#40090371)

    The team from Microsoft Research has already been working with the Bing team to help Bing accelerate its search results, and there are plans to use it in other Microsoft technologies.

    So Bing is going to scrape their search results from Google *and* other search engines? :-)

  • Did they actually do anything, or just build a machine using today's hardware and lots of funding? A team from Yahoo got the record in 2009. Hardware has changed a lot in the 3 years since, and when money is not an object, couldn't anyone do about the same?
  • Downside (Score:2, Redundant)

    It only works using IE6.
  • Oh Look (Score:3, Insightful)

    by degeneratemonkey (1405019) on Wednesday May 23, 2012 @01:44PM (#40090841)
    More irrational Microsoft hatred from the peanut gallery. Interesting accomplishment from Microsoft Research (a group which has produced all kinds of useful advances in computing and software development, and which has very little to do with shipped products like Outlook, IE6, etc.); Average /. luser interpretation? LOL SHILL ARTICLE FROM TEH MICRO$OFT FAGGORTZ YOU SUCK LOL.

    Good to see that a nerd site is inundated with droves of empty-headed group-think religious fanatics!

    When you're done masturbating to your imaginary universe, maybe you'd like to sit down with the likes of Simon Peyton-Jones and discuss some of the finer points of the terrible work he and his peers have been doing.

    Baa-hahahaha. Right.
    • by Anonymous Coward

      The initial post heaps unwarranted praise on Microsoft, and that post was made using an account that is only getting its first post today and will not have any further posts. So yes, it is a shill post, and people bitch about shill posts as they should.

      Second, it is well known that yes, Microsoft spends tons of money on research; it is also well known that almost none of that research makes its way into their products.

      Yes, the individuals who did this deserve praise, but no one will benefit from this resea

    • by jeff8j (2646163)
      Do me a favor and look at the previous record holder in close detail and tell me that microsoft actually did anything other than buy the record...

      microsoft - 1033 disks
      yahoo - 1406
      difference 373 - that alone I would say would just be from advancements made in hard drives

      yahoo nodes
      2x quad core xeons
      8GB of assuming ddr2 ram which was current at the time
      1gb ethernet port on each node
      40 nodes per rack

      microsoft
      2 - 12 cores a cluster
      24GB - 96GB assuming ddr3 ram
      10gb ethernet ports
      78% were 10,000
      • by jeff8j (2646163)
        Yes this is my first day under this account but that is irrelevant, I have been coming here for years I just decided to stop posting as anonymous.
        Microsoft spends lots of money on many things like you say does this actually help anyone other than microsoft? Of course not so why would anyone praise a marketing scheme?
        • Microsoft spends lots of money on many things like you say does this actually help anyone other than microsoft?

          Go to Google Scholar and search for papers published by MSR folk.

          • by jeff8j (2646163)
            Isn't that irrelevant? The question was "does this actually help anyone other than microsoft", meaning this result, not other publications that I don't care to get into.
            • Pretty much all stuff from MSR ends up published, so presumably whatever is new & special here would be published as well, for others to use and build upon.

              • by jeff8j (2646163)
                Well, that's good, but I guess we'll just have to wait and see if the algorithm is actually any better, which is what I'm doubting.
              • Yes, and patented, so that we can avoid building on it in the FOSS world and wait twenty odd years to be able to make use of this research.

      • by vux984 (928602)

        Do me a favor and look at the previous record holder in close detail and tell me that microsoft actually did anything other than buy the record...

        Microsoft actually did something other than buy the record.

        I'm just disgusted at microsoft buying the top, then everyone is like wow they are doing something, when in reality they are doing very little.

        Sorting at that scale is fundamentally an i/o bound problem; and distributed sorting is bound by communications between the nodes. Scaling the problem up to more comput

        • by jeff8j (2646163)
          "Microsoft actually did something other than buy the record." What did they do? Sorting on many disks does become harder, but hey, they had less disks than the last record holder, so they actually had it easier. If they invented interconnects then they broke the rules of the competition, because it's supposed to be 100% off the shelf hardware, so it's all about the algorithm. Thanks for the link that the post points to and the first thing I read, since yes, I read the article and the blog post and the pdfs
          • by vux984 (928602)

            Sorting on many disks does become harder but hey they had less disks than the last record holder so they actually had it easie

            They sorted 3 times as much data. Hard drives didn't get 3x faster since 2009. And how many hard drives were involved is nearly irrelevant.

            If they invented interconnects then they broke the rules of the competition because its supposed to be 100% off the shelf hardware so its all about the algorithm

            They used off the shelf hardware to build the network, but they connected and used it

  • Website: http://sortbenchmark.org/ [sortbenchmark.org]
    PDF: MinuteSort with Flat Datacenter Storage [sortbenchmark.org]

    The sorts were accomplished using a heterogeneous cluster consisting of 256 computers and 1,033 disks, divided broadly into two classes: storage nodes and compute nodes. Notably, no compute node in our system uses local storage for data; we believe FDS is the first system with competitive sort performance that uses remote storage. Because files are all remote, our 1,470 GB runs actually transmitted 4.4 TB over the network in under a minute. No strong assumptions are made around key or record lengths; keys and records of other lengths can be handled with only a performance-neutral configuration change.

    Summary
    FDS is a general-purpose scalable parallel blob store that exploits a full-bandwidth interconnect to expose the entire cluster’s disk bandwidth to remote clients. The sort performance results in this paper demonstrate the power of the architecture: in both Daytona and Indy sorts, the system reads the data remotely to the sort machines, sorts the data across the network, and writes it remotely back to storage.
    Performant remote file access imparts a flexibility absent in contemporary distributed storage systems. Beyond sort, FDS supports a broad variety of scalable large-data applications. It does so without demanding that cluster nodes balance compute and disk performance; more importantly, it does so without demanding that applications observe locality constraints.
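
The read / sort-across-the-network / write-back flow in the excerpt can be sketched as a toy, single-process simulation. This is not FDS code. It assumes records are routed to sorters by their first key byte, that every hop crosses the network (no node ever reads its own local data), and that records are 100 random bytes; the function and variable names are invented for illustration.

```python
import random

def remote_sort(storage, n_sorters, key_space=256):
    """Toy three-phase sort; returns (sorted_records, bytes_over_network)."""
    network_bytes = 0

    # Phase 1: compute nodes read every record from remote storage.
    incoming = []
    for record in storage:
        network_bytes += len(record)
        incoming.append(record)

    # Phase 2: each record is sent to the sorter that owns its key range.
    buckets = [[] for _ in range(n_sorters)]
    for record in incoming:
        owner = record[0] * n_sorters // key_space
        network_bytes += len(record)
        buckets[owner].append(record)

    # Phase 3: each sorter sorts locally and writes its range back to storage.
    output = []
    for bucket in buckets:
        bucket.sort()
        for record in bucket:
            network_bytes += len(record)
            output.append(record)
    return output, network_bytes

random.seed(0)
records = [bytes(random.randrange(256) for _ in range(100)) for _ in range(1000)]
result, moved = remote_sort(records, n_sorters=8)
input_bytes = sum(len(r) for r in records)
print(moved / input_bytes)  # network traffic as a multiple of the input size
```

In this model every byte crosses the network three times, which is consistent with the excerpt's figures: 3 × 1,470 GB = 4,410 GB, or roughly the 4.4 TB quoted. Because bucket ownership is monotone in the first key byte, concatenating the sorted buckets gives a globally sorted result.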

  • Could someone knowledgeable comment on their "tract locator table" (or TLT) metadata system and its possible relation to P2P protocols? If Bittorrent didn't focus on peer-speed as measured by reads and writes, couldn't it gain an advantage using this? TLT is expected to have consistent membership, but if it was updated once a minute (say), wouldn't that be enough to get the advantages without it taking too long to join a group?

  • I think it's unfair to say that they are the only company funding this sort of research. Plenty of research is done by other companies such as Intel, IBM and Google. Granted, (as usual) it seems the real issue being debated here is whether Microsoft is evil or not, and I'd have to say that the answer is a resounding No. I applaud this accomplishment. I still despise their products and general philosophy, but credit should be given where credit is due, and this deserves credit. I think this developme
  • It's rare that I seem to hear much about what Microsoft does in the basic research areas.
  • They used 10 GigE with a very advanced set of switches that support OpenFlow so that they could get the full bisection bandwidth. They could have used InfiniBand and probably done much better with FDR adapters capable of 56 gigabits per second. Even "old" IB adapters were faster. Most of the IB switches supported full bisection bandwidth right out of the box. MS should look at the High Performance Computing world. They need to handle large amounts of data with low latency.
