Want to read Slashdot from your mobile device? Point it at m.slashdot.org and keep reading!

 



Forgot your password?
typodupeerror
×
Programming Math

Is There a Sorting Algorithm Faster than Quicksort and Timsort? (github.com) 130

When asked for the most efficient way to sort a million 32-bit integers in 2008, then-presidential candidate Barack Obama answered, "I think the bubble sort would be the wrong way to go."

But people are still searching for the best possible sorting algorithms, explains Slashdot reader scandum: Long has the conviction been held that quicksort is faster than merge sort. Timsort (derived from merge sort and insertion sort) was introduced in 2002 and while slower than quicksort for random data, Timsort performs better on ordered data.

Quadsort (derived from merge sort) was introduced in 2020 and is faster than quicksort for random data, and slightly faster than Timsort on ordered data.

Also of notice is the significant performance difference on small arrays, quadsort is on average two times faster than Timsort on data sets between 10 and 1000 elements. Quadsort achieves this performance through several optimizations spread out over 1500 lines of code that get the maximum performance out of merge sort.

Quadsort's GitHub page explains: After the first round of sorting a single if check determines if the four swap variables are sorted in order, if that's the case the swap finishes up immediately. Next it checks if the swap variables are sorted in reverse-order, if that's the case the sort finishes up immediately. If both checks fail...two checks remain to determine the final order.
This discussion has been archived. No new comments can be posted.

Is There a Sorting Algorithm Faster than Quicksort and Timsort?

Comments Filter:
  • by jeromef ( 2726837 ) on Saturday July 25, 2020 @11:54AM (#60330193)
    Now that's an article for nerds... at last! :)
    • Agree, and bonus!, it brings back memories to a 74-year-old nerd. :)

      I got a TRS-80 back in Feb 1978 and Radio Shack had no idea what the goddam thing would do.

      It would bubble sort and I really, really needed to sort some stuff.

      I worked so hard on that problem that I think I can still sit down and write it up from memory using pen and paper.

    • by goombah99 ( 560566 ) on Saturday July 25, 2020 @06:17PM (#60331507)

      The question is not defined until you define how you measure quickest. THat is, do you want something that is on average quicker, but has worse Worst case performance? Do you want something that is the fastest for best case performance on say (25%) of data sets.

      And then do you want something even faster when you can make some assumptions about the data? E.g. if numberical is it nornally distributed? can you represent it as a small integer and thus do a radix sort?
      Can I make a copy of the data or do I have to do it in-place.

      Can I do it in parallel or on a single thread? and how to we measure speed there? wall clock?

      Finally there's the question of size of your data. if you only have two elements then
      if A > B : print A,B
      else: print B,A

      is faster than quick sort. And indeed even quick sort doesn't use quick sort when it gets down to 5 elements.

      • by AuMatar ( 183847 )

        Don't forget extra memory. If you have memory to kill, radix sorts and various bucket sorts can give linear time. If your data is nearly sorted already, insertion sort can get linear time without extra memory

        • by tlhIngan ( 30335 )

          Don't forget extra memory. If you have memory to kill, radix sorts and various bucket sorts can give linear time. If your data is nearly sorted already, insertion sort can get linear time without extra memory

          Which is why all the sorting algorithms are still used and taught. Heck, for very small datasets, bubblesort is perfectly adequate if you need to sort in a pinch.

          Yes, if you have to sort a very small dataset (say, 100 items or less), it might just be easier to implement a bubblesort, and the difference

      • To clarify one of your points, you need to know the nature of your data. Is it random, mostly sorted, etc. The fastest algorithm may be a function of the nature of your typical data. Knuth (vol 2 sorting and searching) has a section on this. Listing a bunch of algorithms and what types of data they prefer and hate.
      • You also need to define what system it is running on. In a distributed (map/reduce, Hadoop style) system, merge sort is more natural, and can be quicker than quick sort.
  • What? (Score:5, Informative)

    by JBMcB ( 73720 ) on Saturday July 25, 2020 @11:55AM (#60330199)

    I thought this was a nerd site. There is no *best* overall sort algorithm. There are algorithms that work better given certain resource constraints or source data. That's why there are a whole bunch of algorithms that do almost the same thing.

    Reference: The Art of Computer Programming Volume 3

    • Re:What? (Score:5, Funny)

      by Sique ( 173459 ) on Saturday July 25, 2020 @12:06PM (#60330261) Homepage
      So I present to you the do-nothing-sort. It's O(1). It just does nothing and hopes that the list is already sorted, which works 1 out of n!. So while the algorithm may fail, when it does not, it's the fastest of them all.
      • Ah, sort of like Facebook's filters then.

      • Re:What? (Score:5, Funny)

        by Synonymous Cowered ( 6159202 ) on Saturday July 25, 2020 @12:21PM (#60330355)

        So I present to you the do-nothing-sort. It's O(1). It just does nothing and hopes that the list is already sorted, which works 1 out of n!. So while the algorithm may fail, when it does not, it's the fastest of them all.

        WTF? Do-nothing-sort???? Nobody is going to use that. Come on, marketing!!!!! You need better branding: LottoSort. It may not be able to lure in the better programmers who understand statistics, but the programming field these days has grown way beyond that group. I bet you'll take the industry by storm.

      • That's similar to a sorting algorithm a GAN came up with, but the GAN's implementation is even more efficient.

        GAN Sort: Delete the entire array. As there are no entries, it is by definition sorted.

      • It’s a Monte Carlo method for sorting...
      • So I present to you the do-nothing-sort. It's O(1). It just does nothing and hopes that the list is already sorted, which works 1 out of n!. So while the algorithm may fail, when it does not, it's the fastest of them all.

        I prefer Trumpsort, where you redefine what "sorted" means then go on TV and take credit for everything in the world suddenly being in a ordered state.

    • Many particular sets of data have some order to them which allows the selection of a better algorithm. An example is data where entries are deleted, and new random value entries are appended. These are very common in databases tables that would benefit from occasional optimization via sorting. A slight amount of extra information about the data can thus provide very useful optimizations for ordering it.

    • Yes, this is a nerd site. And I thought nerds like to read....you know, things like the summary that say "while slower than quicksort for random data, Timsort performs better on ordered data.", " faster than quicksort for random data, and slightly faster than Timsort on ordered data", and "performance difference on small arrays". Did you really not read that, or did you expect them to list out every fucking possible situation (CPU X takes an extra clock cycle on the compare, so algorithm A will work better

    • Reference: The Art of Computer Programming Volume 3

      That's because you haven't made it to Volume 4 yet, where you'll learn to generate all n-tuples, and then you can just generate the set that is ordered the way you want. Why sort anything?

      In fact, most modern algorithms used in actual programs don't sort the data at all, even where you're calling methods that purport to do exactly that, and instead they generate new data with a desired order. The input data is not sorted in this case, it is discarded.

      • generate all n-tuples,

        With sufficient parallelism, all algorithms are O(1).

      • Reference: The Art of Computer Programming Volume 3

        That's because you haven't made it to Volume 4 yet, where you'll learn to generate all n-tuples, and then you can just generate the set that is ordered the way you want. Why sort anything?

        In fact, most modern algorithms used in actual programs don't sort the data at all, even where you're calling methods that purport to do exactly that, and instead they generate new data with a desired order. The input data is not sorted in this case, it is discarded.

        Or if it's in a database, the data remains in it's original order and the sort algorithm is used to create an index. Multiple indexes are typically used because you'll likely want the data sorted in different ways for different applications.

    • Did someone develop already a lower bound on the number of operations necessary to sort the data given the entropy of the data set?

      • Did someone develop already a lower bound on the number of operations necessary to sort the data given the entropy of the data set?

        What kind of entropy? Min, Shannon, max, Hartley or any other value of alpha for Renye entropy?

    • Back in 82 we found a way to optimize a bubble sort to beat all the other sorting methods.
      Yes, with a bubble sort !
      I won't go over the process we used, because it's likely very outdated now, but we stumbled on a rare situation where the bubble sort was clearly faster, and then developed an optimization that forced that situation. Even with the overhead required to do that, it beat the pants off the other sorts.

      Top dog in the sorting world will likely remain in flux as long as people keep fooling around with
      • by narcc ( 412956 )

        Bubble sort is really good in a lot of common situations. It's fantastic for short, almost sorted lists, for example. It's simple, very low overhead, it sorts in-place, and it's stable.

        I know it's the cool thing to say that it's terrible and should never be used, but that's lazy thinking. Some times, it really is the best choice.

        • It's fantastic for short, almost sorted lists, for example. It's simple, very low overhead, it sorts in-place, and it's stable.

          Both these are true of selection-sort and insertion-sort, but bubble-sort has a larger constant factor (is slower) than either of those sorts.

          The only reason to use bubble sort is if you don't have a sorting library available, and need something with very few lines of code.

          It's probably worth mentioning that bubble sort is always faster than quick-sort for ~9 elements or less, but selection-sort and merge-sort are even faster.

    • Your comment is valid that different sorts work might work best for different data types or structures. But even for any given sort scenario or sort method, there is the endless quest to do better. If a sort can be made 10% faster, that is better. It is not a matter of "method X is better than method Y because both are good depending on the task". Instead, it is a matter of "can we find ever faster sort methods". It is no different than finding an ever denser image compression method, or a faster raytr

      • No we mustn't. The lower bound on decision tree sorting is known to be n lg n, and we're already there. The only further optimizations possible are those that depend on having a priori knowledge of the data, or the system, or both. There are no 10% improvements to be had, or even n% where n > 0.

        If you want to improve performance in the general case, the only option is to use faster hardware, or else invent time travel and bring your results back from a future where the operation is complete. While yo

    • by Jeremi ( 14640 )

      There is no *best* overall sort algorithm.

      ... that we know of. Unless it's been proven somehow that a universally optimal sort algorithm cannot exist, there's always the possibility that there is one out there, just waiting to be discovered/invented.

  • by Frobnicator ( 565869 ) on Saturday July 25, 2020 @11:56AM (#60330201) Journal

    Special cases of mostly sorted data, cases of integers only, cases using CPU cache, cases leveraging a specific CPU's SIMD instructions...

    It is neat that sometimes there are special cases that apply to the data that allows a different routine to work better. Good on researchers for finding them. But for most of us it doesn't matter.

    Most developers will call the system's sort routine and get on with life. I

    f you are in a special case like the ones described - - good for you. I hope you enjoy the extra nanoseconds.

    • My special cases are almost never like this. I'm either constrained by RAM, or constrained by timing, and I never want the "best" algorithm for my particular data. I always want the conceptually simplest algorithm that is low level, easily portable, and fits inside whichever constraint was controlling.

      Even when I was a database consultant it was rare that the algorithm that was "best" based on the particular data was also most desirable from a maintenance perspective. Especially because there is always some

  • by rsilvergun ( 571051 ) on Saturday July 25, 2020 @12:01PM (#60330229)
    I thought it was magic. I mean, I understood how it worked because it was explained to me, but I couldn't comprehend how someone would come up with it in the first place.

    Parity & PAR Archives are like that. Again, I've read books on how they work, but it's crazy to think that somebody thought it should even be possible to do something like that :).
  • Check out Radix Sort (Score:5, Informative)

    by jklappenbach ( 824031 ) on Saturday July 25, 2020 @12:03PM (#60330241) Journal
    Radix Sort [wikipedia.org] is one option, though this Stack Overflow [stackoverflow.com] article has information on why it's not as popular as Quick.
    • by mark-t ( 151149 )
      There are ways around mitigating the memory impact of Radix sort for large ranges that make it more manageable, but it introduces a lot complexity that of course carries its own performance impact. It might still be worth it for very large input data sets if you can live with very complicated code, however.
    • It's interesting that only the last answer hints at the actual reason why comparison-based sorts are more common than radix-based sorts: "library" sort routines have to work with user-defined data types, and it's much more convenient for the user to write a comparison operation than a radix query.

  • by raynet ( 51803 ) on Saturday July 25, 2020 @12:06PM (#60330259) Homepage

    Simple, allocate 12GB of RAM, go through the source array and increment 3 byte value at that integers location, read through the allocated memory to output sorted list, ignore zeroes.

  • by Rick Zeman ( 15628 ) on Saturday July 25, 2020 @12:07PM (#60330263)

    When asked for the most efficient way to sort a million 32-bit integers in 2008, then-presidential candidate Barack Obama answered, "I think the bubble sort would be the wrong way to go."

    Donald Trump: "Huh?"

    • by sarren1901 ( 5415506 ) on Saturday July 25, 2020 @12:12PM (#60330309)

      I wanted to mod you up, but there wasn't a "Sad" option.

    • Just scramble them until they're sorted on their own.

    • Just ignore it and it will sort itself eventually.
    • Now if only presidency was about computer science.

      Nerds would be the kings of presidents.

    • by PPH ( 736903 )

      Delete the objects that are out of order.

      • by hAckz0r ( 989977 )

        Delete the objects that are out of order.

        You mean like randomly firing people until the job gets done right?

    • The Donald will make sorting great again.

      MSGA!

      It will be the best algorithm. Better than the "Chinese" algorithm and it will replace the fake Democrat algorithms.

    • by tsqr ( 808554 )

      When asked for the most efficient way to sort a million 32-bit integers in 2008, then-presidential candidate Barack Obama answered, "I think the bubble sort would be the wrong way to go."

      Donald Trump: "Huh?"

      And both answers are equally useful.

      • by thegarbz ( 1787294 ) on Saturday July 25, 2020 @02:23PM (#60330853)

        Yes but one demonstrates the ability to know what the hell a sorting algorithm even is.

        • Also "don't use bubble sort" is already 80% of the way to a decent solution.

        • by tsqr ( 808554 )
          Actually what it demonstrates is the ability to recall something he heard somewhere sometime, which is a useful skill for a politician, but has nothing to do with any sort of computer programming knowledge. Do yoy really believe that he has any grasp of sorting algorithms? SMH.
          • by Jeremi ( 14640 )

            Do yoy really believe that he has any grasp of sorting algorithms? SMH.

            Unless he took an algorithms course in college, he probably doesn't. But if you sat him down and explained the basics to him, I'm pretty sure he'd be able to grasp the ideas without too much difficulty.

          • Actually what it demonstrates is the ability to recall something he heard somewhere sometime

            Congratulations, you described knowledge.

    • No presidential candidate would know how to answer this question unless, by some unlikely chance, they happened to have studied computer science in university.

      Obama was only able to answer the question because Google staff had prepped him for it ahead of time [quora.com].

      • No presidential candidate would know how to answer this question unless, by some unlikely chance, they happened to have studied computer science in university.

        Obama was only able to answer the question because Google staff had prepped him for it ahead of time [quora.com].

        Good citation. Trump would still make a word salad out of the answer, though. If he remembered it.

        • Mostly because he wouldn't have gotten the Google prep and would have answered extemporaneously. Anyone can sound knowledgeable reading from a script. Well, not Biden, but you know what I mean.
    • by Sebby ( 238625 )

      When asked for the most efficient way to sort a million 32-bit integers in 2008, then-presidential candidate Barack Obama answered, "I think the bubble sort would be the wrong way to go."

      Donald Trump: "Huh?"

      Nah! Trump would say:

      "Nobody knows computer sorting better than me!"

  • by mark-t ( 151149 )

    In theory, you can't get any faster than Radix sort.

    Although a naive implementation can have a large constant factor.

    The complexity of making an implementation that makes the constant factor manageable for anything but input data that covers a small range typically outweighs performance benefits.

    • In theory, you can't get any faster than Radix sort.

      That has got to be tied (with many others) for the stupidest "theory" possible.

      If you're wondering why, you could restrict yourself entirely to pre-Socratic writers in Ancient Greece and you'd still find it was already analyzed to death.

  • ... is that it doesn't adapt to data that is already nearly sorted.

    And, as a worst case, the most straightforward way to implement it can decay performance to O(n^2). Modifying it to ensure that it is n(log n) as a worst case is just as complex as implementing another guaranteed O(n log n) sort such as heapsort.

    Another possibility to consider might be Smoothsort [wikipedia.org], which adaptively varies between O(n) and O(n log n), depending on how sorted the input data already is.

    Smoothsort is complex, however... but depending on the kind of data you have to sort, it may be suitable.

    • I did an implementation of Smoothsort back in 1990, it wound up replacing the Quicksort algorithm that was being used because in testing our use cases it was better on most metrics and the same on the rest. But it is complex.
      • by mark-t ( 151149 )
        I would argue that the only real problem with smoothsort is that it is simply too complex for one to easily remember how it works.
    • I'm trying to remember if quick sort also had a problem in that it was unstable and heapsort wasn't. By unstable what I mean is if you have say a bunch of e-mails, first sort them by date and then sort them by sender a stable sorting algorithm would have them grouped together for a given sender and in that group they'd still be sorted by date. An unstable algorithm they'd be in groups for a given sender but in that group they'd no longer be sorted by date.
      • Both quicksort and heapsort are unstable. I forget why heapsort is, though, but I recall hearing most tree algorithms do not guarantee the order in which the branches are processed.
    • Your smoothsort link to wikipedia said it is not stable. While the quadsort posted in github claimed to be stable. If we don't mind code to be complex, I would rather strive for a sort that is stable which has a boarder use. The only problem of the github post is, without 3rd party verification, we don't know if it is truly faster than algorithms by others
  • by Liquid-Gecka ( 319494 ) on Saturday July 25, 2020 @12:48PM (#60330445)

    Yes.. Bubble Sort. Okay, now that your done laughing let me explain. All of the sort discussions assume that your on a single instruction at a time machine. Comparisons can only be done 1 at a time. As such you start to get into typical sorting times of nlog(n). However, when you get to a computing framework that allows massive parallel evaluations it changes the framework completely. In a distributed system, where each node has its own CPU you can actually actually compare neighbors in parallel and then transfer.

    Assuming a 6 node cluster starting with the following data at t0 [6, 2, 4, 3, 1, 5] we can sort it in O(n) steps:
    t1: swap 6 to 2, swap 4 to 3, leave 1 and 5. results: [2, 6, 3, 4, 1, 5]
    t2: swap 6 to 3, swap 4 to 1, results: [2, 3, 6, 1, 4, 5]
    t3: leave 2 and 3, swap 6 to 1, leave 4 and 5, results: [2, 3, 1, 6, 4, 5]
    t4: swap 3 to 1, swap 6 to 4, results: [2, 1, 3, 4, 6, 5]
    t5: swap 1 to 2, leave 3 and 4, swap 5 to 6, results: [1, 2, 3, 4, 5, 6]

    Worst case is that an object has to "bubble" once across the system, which is O(n) =)

    • by bidule ( 173941 )

      I was wondering how it handled a reversed list, as it would swap 4 with both 5 and 3. Then I realized you alternate 2 processes: bubble odd pairs then bubble even.

    • Do CPUs in any kind of distributed system know who their "neighbor" is?
  • Here is a fun video on sorting speed https://www.youtube.com/watch?... [youtube.com]

  • They should focus their efforts on some kind of SIMD super instruction that will handle multiple integers in one go and output some kind of metadata that would be helpful in the sort process.

    I'd love to see what might be achieved by kind of software-definable SIMD processor like an FPGA but that can fully take over from the CPU and access main memory to do some task then hand the bus back to the CPU.

    • by hAckz0r ( 989977 )

      Here is an idea. Start with the highest order bit of every number, working down to the lowest bit, and have it calculate a vector order array of the relative displacement of each value from its sorted position, and then the final matrix operation would be to move each number simultaneously using that calculated offset into their new sorted position. Aka, a SMID differential sort algorithm.

      You would need a processor at least three times (potentially much more) the size of the dataset, and some fancy kind of

  • by LostMyBeaver ( 1226054 ) on Saturday July 25, 2020 @01:57PM (#60330743)
    Algorithmically, quicksort generally reigns supreme. The issue however is that CPUs, especially for distributed cores can behave much differently. These algorithms are typically measures on single thread operation.

    You would find that an altered quicksort and merge sort can be much faster. Consider breaking a numerical array into symmetric chunks and quicksorting. While quicksorting each chunk on a separate core, maintain a mean value as an atomic value shared across threads. Then also store the position of the mean between each core. When all cores complete, merge the values above and below the mean so the top half of the sorted chunks are all contiguous and then below... then run a quicksort where chunks are symmetrical above and below the mean. Repeat until sorted. This is just an oversimplified distributes algorithm.

    Now consider the same problem but account for cache sizes to eliminate cache misses. This can be done with ensuring chunks are all aligned and fit within L1 cache.

    If you are trying to be fast; you need to consider more than just algorithmic weaknesses. Also consider architectural issues. Also consider that if memory is plentiful enough, real magic could happen by not working in place.

    Then there is the issue that integers are not always the issue. Typically, you sort records, not numbers. As such, you can find indexing and then in place sorting would be faster.

    So, while most real education occurs from Knuth, and it should, you need to also consider distributed algorithms which Knuth said he would leave to others.
  • by RobinH ( 124750 ) on Saturday July 25, 2020 @02:31PM (#60330885) Homepage
    In my code I just do: products.OrderBy(x => x.PartNumber).ToList(); Works great! :)
    • That is exactly the kind of response I would expect from a C# programmer. There is no sub-community of the software world that glorifies ignorance like C# programmers do.
  • You want heapsort b/c it guarantees fastest completion in the worst case. Quicksort sucks. I've seen it run for days on mainframes, until they finally killed the job, switched to heapsort, fired the job up again and completed in 4 hours.
    • Agreed. Heapsort may not be the fastest in all cases but it is much faster than quicksort in the worst case. I use a heapsort algorithm whenever I am faced with a lot of random items to sort.

  • by CaptainDork ( 3678879 ) on Saturday July 25, 2020 @02:57PM (#60330973)

    ... sorted affair.

  • Is everyone high or is it just me? Radix sort is O(1), what are you smoking?
  • by PPH ( 736903 ) on Saturday July 25, 2020 @04:32PM (#60331277)

    That's where my wife picks through the closets and stacks all my stuff that she doesn't think I need out by the garbage can. And I return it to a secure location as fast as possible.

    • I use Paged Virtual Memory on my wife that way when she opens the cupboard she only sees her stuff and can't touch mine. The downside here is that when taking cloths off the line she always leaves behind the towels and bed sheets as they aren't loaded into her page table.

  • An olde school /. thread worth reading through! Very impressed by the minimal off topic flame warring even with a trigger proper noun in the summary.

  • I have, in fact, a very elegant O(0) sorting algorithm for ordered data!

  • For a sort with assured time and space need, use bottom-up heapsort or mergesort. Anybody using Quicksort should be banned from ever writing commercial code again.

  • The original quicksort is not very useful, as it chokes badly when there's a large number of identical values in your list. When people talk about quicksort, they usually mean one of its variations, such as the dual-pivot quicksort. Most sorts will slow down with certain datasets, but being recursive, quicksort can actually crash without some sanity limits in place.

    A long time ago I wrote a program called Picsort on my Amiga, which sorted a bitmap in video memory, so you could see the program working. St

  • Sorting by comparison is the wrong way to sort data.

    In the example of one million numbers, simply go through them and place them into the appropriate position in a million entry array.

    For other data, for example text, simply use the first 8/16/32-bit fragment to of the text to find the appropriate array entry, then repeat for next fragment, using a 2nd level array etc.

    Any electronic data can be reduced to an index, and thus that index can be used to put the data into an array.

God help those who do not help themselves. -- Wilson Mizner

Working...