Graphics Programming Hardware

Sorting Algorithm Breaks Giga-Sort Barrier, With GPUs

Posted by timothy
from the quick-like-double-time dept.
An anonymous reader writes "Researchers at the University of Virginia have recently open sourced an algorithm capable of sorting at a rate of one billion (integer) keys per second using a GPU. Although GPUs are often assumed to be poorly suited for algorithms like sorting, their results are several times faster than the best known CPU-based sorting implementations."
  • Excel Charts (Score:3, Insightful)

    by Anonymous Coward on Sunday August 29, 2010 @10:30PM (#33412026)

    I find it very disappointing that a group of programmer/computer science types who even supply BibTeX to make it easier to reference their work, resort to screen-capturing an Excel chart to display their data.

  • Not a barrier (Score:5, Insightful)

    by Captain Segfault (686912) on Sunday August 29, 2010 @10:49PM (#33412084) Homepage Journal

    This isn't a "barrier" like the "sound barrier". There are no special difficulties that start around 1G/sec! It's just a threshold.

    Don't get me wrong -- I'm not saying this isn't impressive, but no "barrier" was broken here!

  • Um... (Score:5, Insightful)

    by Anonymous Coward on Sunday August 29, 2010 @10:50PM (#33412088)

    Algorithms aren't measured in "x per second"... only implementations are measured that way. The speed of an algorithm is described in big-O notation, such as O(n log n). The metric of "sorted keys per second" is largely useless, because it depends on the particular hardware setup.

  • Re:Not a barrier (Score:5, Insightful)

    by XanC (644172) on Sunday August 29, 2010 @10:51PM (#33412096)

    It's not a threshold! It's just a milestone.

  • Re:Excel Charts (Score:1, Insightful)

    by Anonymous Coward on Sunday August 29, 2010 @11:00PM (#33412134)

    Who cares what tool they use to display data, if they're getting their point across in an effective manner? Good lord...

  • Re:Not a barrier (Score:3, Insightful)

    by caerwyn (38056) on Sunday August 29, 2010 @11:20PM (#33412200)

    Actually, if you look at shockwave dynamics during the moment an object crosses from subsonic to supersonic velocity, it can very easily be considered much more of a barrier than 1gkeys/sec can.

  • Re:Excel Charts (Score:4, Insightful)

    by Anpheus (908711) on Sunday August 29, 2010 @11:21PM (#33412208)

    Maybe excel was just the right tool for the job? It's quick and easy to use, and to reformat the graphs.

    I know the Linux tools tend to have a longer cycle between tweaking, rendering and displaying, so a fast WYSIWYG tool works just fine.

  • Ugh. (Score:3, Insightful)

    by martin-boundary (547041) on Sunday August 29, 2010 @11:29PM (#33412226)
    Stupid HTML ate my <...

    The problem with big-oh notation is that the constant isn't explicit, so for any given n (pick it as large as you like), an O(n log n) implementation can beat an O(n) one for some choice of constants. That's why ops per second is still a useful metric when comparing implementations on standardized hardware.
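The constant-factor point can be illustrated with toy cost functions (the constants here are invented purely for the example):

```python
import math

def cost_linear(n, c=20):
    """Hypothetical O(n) algorithm with a large constant factor c."""
    return c * n

def cost_nlogn(n):
    """Hypothetical O(n log n) algorithm with constant factor 1."""
    return n * math.log2(n)

# Crossover is at n = 2**20 (about a million): below it, the
# "asymptotically slower" O(n log n) code actually does less work.
print(cost_nlogn(10**3) < cost_linear(10**3))   # True
print(cost_nlogn(10**9) < cost_linear(10**9))   # False
```

So until n passes the crossover, raw measured throughput on real hardware tells you more than the asymptotic class does.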

    As always, in theory there's no difference between theory and practice, but in practice there is...

  • by Anonymous Coward on Monday August 30, 2010 @12:04AM (#33412340)

    The typical sorting problem I've encountered in my career (various types of scientific, telecommunications and business software, though not games) involves an array of pointers to fixed length records that have a sort key (let's say an integer) at a predefined offset. Not an array of integers, nor an array of directly embedded small fixed length records which I'm guessing was used in TFA. The former situation requires random as well as stream access to memory, which would likely favor processing by the CPU in the motherboard of a typical $1000-$2000 PC.
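That scenario can be sketched as follows; the record length, key offset, and key width here are hypothetical choices for illustration, not anything from the article:

```python
import struct

KEY_OFFSET = 8    # hypothetical: 4-byte little-endian key 8 bytes into each record
RECORD_LEN = 32   # hypothetical fixed record length

def make_record(key):
    """Build a fixed-length record with an integer key at KEY_OFFSET."""
    rec = bytearray(RECORD_LEN)
    struct.pack_into("<I", rec, KEY_OFFSET, key)
    return bytes(rec)

def sort_by_embedded_key(records):
    """Sort references to records by the key at a fixed offset.
    Every key read dereferences a different record, the random-access
    pattern the poster describes, rather than streaming an int array."""
    key_at = lambda rec: struct.unpack_from("<I", rec, KEY_OFFSET)[0]
    return sorted(records, key=key_at)

recs = [make_record(k) for k in (42, 7, 99, 7)]
ordered = sort_by_embedded_key(recs)
print([struct.unpack_from("<I", r, KEY_OFFSET)[0] for r in ordered])  # [7, 7, 42, 99]
```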

  • Re:No (Score:4, Insightful)

    by black3d (1648913) on Monday August 30, 2010 @12:52AM (#33412492)

    Unfortunately I can't mod, having already posted in this thread, but please allow me to /bow. This is the best explanation of the differences I've ever read anywhere. Even though I knew the differences, I couldn't have expressed them so finely. Bravo.

  • by Trepidity (597) <delirium-slashdot@h a c k i sh.org> on Monday August 30, 2010 @01:28AM (#33412618)

    Well, yeah, they're not claiming they invented radix sort. They're claiming that their GPU implementation of radix sort runs about 4x as fast as the CPU implementation you describe.

  • Re:Excel Charts (Score:5, Insightful)

    by pspahn (1175617) on Monday August 30, 2010 @02:29AM (#33412798)
    Actually, I find it even more disappointing that a decent way to display datasets on a web page isn't standard yet. Why can't a nice one be embeddable, with column sorts and robust methods for retrieving data? There are solutions, sure, but I have yet to find one that isn't unnecessarily complex or just plain ugly and difficult to use. But I guess it's just a matter of time, right?
  • by evilWurst (96042) on Monday August 30, 2010 @03:19AM (#33412948) Journal

    It's generally not size of RAM that breaks radix sort; it's the size of cache. Modern processors are highly reliant on cache, which means they're highly reliant on things in memory being in small tight chunks that fit in cache - because cache misses are expensive enough that if you thrash cache badly enough, you may end up running slower than if you hadn't had any cache at all.

    Good comparison sorts may start fragmented, but by their very nature each pass of the algorithm makes them less so. Radix sort is the other way around: it follows pointers (so more of the scarce cache is already in use) that point in more and more fragmented patterns with every pass. That's why, even though radix sort's average speed is theoretically faster than quicksort's, quicksort still wins on real-life hardware. And that's probably why radix sort wins on GPUs: the data fits in the card's dedicated memory, which is already optimized for much more parallel access than main memory.
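For reference, a minimal LSD radix sort sketch; bucket lists stand in for the prefix-sum histogram passes a tuned (CPU or GPU) implementation would use:

```python
def radix_sort(keys, bits_per_pass=8, key_bits=32):
    """Least-significant-digit radix sort on unsigned integers.
    Each pass scatters keys by one digit; the scattered writes land
    all over memory, which is the cache-unfriendly pattern described
    above."""
    mask = (1 << bits_per_pass) - 1
    for shift in range(0, key_bits, bits_per_pass):
        buckets = [[] for _ in range(1 << bits_per_pass)]
        for k in keys:                               # scatter by current digit
            buckets[(k >> shift) & mask].append(k)
        keys = [k for b in buckets for k in b]       # gather in digit order
    return keys

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

With 8 bits per pass and 32-bit keys, the whole sort is four scatter/gather sweeps regardless of n, which is what makes it attractive on massively parallel hardware.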

  • Re:Um... (Score:1, Insightful)

    by Anonymous Coward on Monday August 30, 2010 @03:21AM (#33412952)

    The effect that extra CPUs will have is too dependent on the hardware implementation to be formalized like this.

    It's no more exotic than having several levels of cache between the CPU and RAM (or even treating RAM as a cache between CPU and network/spinning disk). Even O(N) algorithms curve when N overflows L1, L2, L3, and RAM.

  • Re:Excel Charts (Score:3, Insightful)

    by dominious (1077089) on Monday August 30, 2010 @06:54AM (#33413464)
    what. the. fuck. +4 Insightful because they use Excel charts?

    Hey, I just solved P=NP.
    Yeah, but you are using Excel charts...hmmm sorry kthnx later.
  • Re:Excel Charts (Score:3, Insightful)

    by multipartmixed (163409) on Monday August 30, 2010 @10:40AM (#33414950) Homepage

    1. Why do you believe those are screen captures, rather than, say, exported images?

    2. How would the data look different if it were displayed with BibTeX?

    3. How fast is using BibTeX? (I've never used it). I could create those same charts in Excel '97 from a CSV of input points easily; probably in under a minute.
