Is There a Sorting Algorithm Faster than Quicksort and Timsort? (github.com) 130
When asked for the most efficient way to sort a million 32-bit integers in 2008, then-presidential candidate Barack Obama answered, "I think the bubble sort would be the wrong way to go."
But people are still searching for the best possible sorting algorithms, explains Slashdot reader scandum: Long has the conviction been held that quicksort is faster than merge sort. Timsort (derived from merge sort and insertion sort) was introduced in 2002 and while slower than quicksort for random data, Timsort performs better on ordered data.
Quadsort (derived from merge sort) was introduced in 2020 and is faster than quicksort for random data, and slightly faster than Timsort on ordered data.
Also of notice is the significant performance difference on small arrays, quadsort is on average two times faster than Timsort on data sets between 10 and 1000 elements. Quadsort achieves this performance through several optimizations spread out over 1500 lines of code that get the maximum performance out of merge sort.
Quadsort's GitHub page explains: After the first round of sorting a single if check determines if the four swap variables are sorted in order, if that's the case the swap finishes up immediately. Next it checks if the swap variables are sorted in reverse-order, if that's the case the sort finishes up immediately. If both checks fail...two checks remain to determine the final order.
But people are still searching for the best possible sorting algorithms, explains Slashdot reader scandum: Long has the conviction been held that quicksort is faster than merge sort. Timsort (derived from merge sort and insertion sort) was introduced in 2002 and while slower than quicksort for random data, Timsort performs better on ordered data.
Quadsort (derived from merge sort) was introduced in 2020 and is faster than quicksort for random data, and slightly faster than Timsort on ordered data.
Also of notice is the significant performance difference on small arrays, quadsort is on average two times faster than Timsort on data sets between 10 and 1000 elements. Quadsort achieves this performance through several optimizations spread out over 1500 lines of code that get the maximum performance out of merge sort.
Quadsort's GitHub page explains: After the first round of sorting a single if check determines if the four swap variables are sorted in order, if that's the case the swap finishes up immediately. Next it checks if the swap variables are sorted in reverse-order, if that's the case the sort finishes up immediately. If both checks fail...two checks remain to determine the final order.
News for nerds. Stuff that matters. (Score:5, Insightful)
Re: (Score:2)
Agree, and bonus!, it brings back memories to a 74-year-old nerd. :)
I got a TRS-80 back in Feb 1978 and Radio Shack had no idea what the goddam thing would do.
It would bubble sort and I really, really needed to sort some stuff.
I worked so hard on that problem that I think I can still sit down and write it up from memory using pen and paper.
It's the wrong question (Score:5, Informative)
The question is not defined until you define how you measure quickest. THat is, do you want something that is on average quicker, but has worse Worst case performance? Do you want something that is the fastest for best case performance on say (25%) of data sets.
And then do you want something even faster when you can make some assumptions about the data? E.g. if numberical is it nornally distributed? can you represent it as a small integer and thus do a radix sort?
Can I make a copy of the data or do I have to do it in-place.
Can I do it in parallel or on a single thread? and how to we measure speed there? wall clock?
Finally there's the question of size of your data. if you only have two elements then
if A > B : print A,B
else: print B,A
is faster than quick sort. And indeed even quick sort doesn't use quick sort when it gets down to 5 elements.
Re: (Score:2)
Don't forget extra memory. If you have memory to kill, radix sorts and various bucket sorts can give linear time. If your data is nearly sorted already, insertion sort can get linear time without extra memory
Re: (Score:3)
Which is why all the sorting algorithms are still used and taught. Heck, for very small datasets, bubblesort is perfectly adequate if you need to sort in a pinch.
Yes, if you have to sort a very small dataset (say, 100 items or less), it might just be easier to implement a bubblesort, and the difference
You need to know the nature of your data ... (Score:2)
Re: (Score:2)
What? (Score:5, Informative)
I thought this was a nerd site. There is no *best* overall sort algorithm. There are algorithms that work better given certain resource constraints or source data. That's why there are a whole bunch of algorithms that do almost the same thing.
Reference: The Art of Computer Programming Volume 3
Re:What? (Score:5, Funny)
Re: (Score:3)
Ah, sort of like Facebook's filters then.
Re:What? (Score:5, Funny)
So I present to you the do-nothing-sort. It's O(1). It just does nothing and hopes that the list is already sorted, which works 1 out of n!. So while the algorithm may fail, when it does not, it's the fastest of them all.
WTF? Do-nothing-sort???? Nobody is going to use that. Come on, marketing!!!!! You need better branding: LottoSort. It may not be able to lure in the better programmers who understand statistics, but the programming field these days has grown way beyond that group. I bet you'll take the industry by storm.
Re: (Score:2)
LottoSort? That's infringing on the Bogosort trademark!
Re: (Score:2)
Re: (Score:2)
Insta-sort
Re:What? (Score:4, Funny)
That's called UXSort. You present the data in whatever order you want to, and call it part of the user experience.
Re: (Score:2)
Or if you are the back end guy you just send the data in whatever order and let the front end person do whatever to filter/sort on their end.
Re: (Score:2)
Come on, marketing!!!!! You need better branding
So, in other words: Machine Learning Data Science Distributed Blockchain Sort.
Re: (Score:3)
That's similar to a sorting algorithm a GAN came up with, but the GAN's implementation is even more efficient.
GAN Sort: Delete the entire array. As there are no entries, it is by definition sorted.
Re: (Score:2)
Re: (Score:2)
So I present to you the do-nothing-sort. It's O(1). It just does nothing and hopes that the list is already sorted, which works 1 out of n!. So while the algorithm may fail, when it does not, it's the fastest of them all.
I prefer Trumpsort, where you redefine what "sorted" means then go on TV and take credit for everything in the world suddenly being in a ordered state.
Re: (Score:2)
Many particular sets of data have some order to them which allows the selection of a better algorithm. An example is data where entries are deleted, and new random value entries are appended. These are very common in databases tables that would benefit from occasional optimization via sorting. A slight amount of extra information about the data can thus provide very useful optimizations for ordering it.
Re: (Score:2)
Yes, this is a nerd site. And I thought nerds like to read....you know, things like the summary that say "while slower than quicksort for random data, Timsort performs better on ordered data.", " faster than quicksort for random data, and slightly faster than Timsort on ordered data", and "performance difference on small arrays". Did you really not read that, or did you expect them to list out every fucking possible situation (CPU X takes an extra clock cycle on the compare, so algorithm A will work better
Re: (Score:2)
Reference: The Art of Computer Programming Volume 3
That's because you haven't made it to Volume 4 yet, where you'll learn to generate all n-tuples, and then you can just generate the set that is ordered the way you want. Why sort anything?
In fact, most modern algorithms used in actual programs don't sort the data at all, even where you're calling methods that purport to do exactly that, and instead they generate new data with a desired order. The input data is not sorted in this case, it is discarded.
Re: (Score:3)
generate all n-tuples,
With sufficient parallelism, all algorithms are O(1).
Re: (Score:2)
And often with less transistors than are in a modern CPU.
Re: (Score:2)
generate all n-tuples,
With sufficient parallelism, all algorithms are O(1).
Like the spaghetti sort
Re: (Score:2)
Reference: The Art of Computer Programming Volume 3
That's because you haven't made it to Volume 4 yet, where you'll learn to generate all n-tuples, and then you can just generate the set that is ordered the way you want. Why sort anything?
In fact, most modern algorithms used in actual programs don't sort the data at all, even where you're calling methods that purport to do exactly that, and instead they generate new data with a desired order. The input data is not sorted in this case, it is discarded.
Or if it's in a database, the data remains in it's original order and the sort algorithm is used to create an index. Multiple indexes are typically used because you'll likely want the data sorted in different ways for different applications.
Re: (Score:2)
Did someone develop already a lower bound on the number of operations necessary to sort the data given the entropy of the data set?
Re: (Score:2)
Did someone develop already a lower bound on the number of operations necessary to sort the data given the entropy of the data set?
What kind of entropy? Min, Shannon, max, Hartley or any other value of alpha for Renye entropy?
Re: (Score:2)
Yes, with a bubble sort !
I won't go over the process we used, because it's likely very outdated now, but we stumbled on a rare situation where the bubble sort was clearly faster, and then developed an optimization that forced that situation. Even with the overhead required to do that, it beat the pants off the other sorts.
Top dog in the sorting world will likely remain in flux as long as people keep fooling around with
Re: (Score:3)
Bubble sort is really good in a lot of common situations. It's fantastic for short, almost sorted lists, for example. It's simple, very low overhead, it sorts in-place, and it's stable.
I know it's the cool thing to say that it's terrible and should never be used, but that's lazy thinking. Some times, it really is the best choice.
Re: (Score:2)
It's fantastic for short, almost sorted lists, for example. It's simple, very low overhead, it sorts in-place, and it's stable.
Both these are true of selection-sort and insertion-sort, but bubble-sort has a larger constant factor (is slower) than either of those sorts.
The only reason to use bubble sort is if you don't have a sorting library available, and need something with very few lines of code.
It's probably worth mentioning that bubble sort is always faster than quick-sort for ~9 elements or less, but selection-sort and merge-sort are even faster.
Re: (Score:2)
For a sufficiently-small definition of "some" that approaches zero, sure. :P
Approaches, but does not reach.
A friend of mine, who worked at Microsoft, told me gleefully that he had managed to usefully use bubble sort. It was some code related to speakers, and it was sorting the speaker numbers or something like that. It is rare for a Windows system to have more than 8 speakers connected. The most ridiculous over-the-top speakers system would be 24 speakers [wikipedia.org].
Since n was guaranteed small, the tiny bit of cod
Re: (Score:2)
Your comment is valid that different sorts work might work best for different data types or structures. But even for any given sort scenario or sort method, there is the endless quest to do better. If a sort can be made 10% faster, that is better. It is not a matter of "method X is better than method Y because both are good depending on the task". Instead, it is a matter of "can we find ever faster sort methods". It is no different than finding an ever denser image compression method, or a faster raytr
Re: (Score:3)
No we mustn't. The lower bound on decision tree sorting is known to be n lg n, and we're already there. The only further optimizations possible are those that depend on having a priori knowledge of the data, or the system, or both. There are no 10% improvements to be had, or even n% where n > 0.
If you want to improve performance in the general case, the only option is to use faster hardware, or else invent time travel and bring your results back from a future where the operation is complete. While yo
Re: (Score:2)
There is no *best* overall sort algorithm.
... that we know of. Unless it's been proven somehow that a universally optimal sort algorithm cannot exist, there's always the possibility that there is one out there, just waiting to be discovered/invented.
There is always another special case (Score:3, Insightful)
Special cases of mostly sorted data, cases of integers only, cases using CPU cache, cases leveraging a specific CPU's SIMD instructions...
It is neat that sometimes there are special cases that apply to the data that allows a different routine to work better. Good on researchers for finding them. But for most of us it doesn't matter.
Most developers will call the system's sort routine and get on with life. I
f you are in a special case like the ones described - - good for you. I hope you enjoy the extra nanoseconds.
Re: (Score:3)
My special cases are almost never like this. I'm either constrained by RAM, or constrained by timing, and I never want the "best" algorithm for my particular data. I always want the conceptually simplest algorithm that is low level, easily portable, and fits inside whichever constraint was controlling.
Even when I was a database consultant it was rare that the algorithm that was "best" based on the particular data was also most desirable from a maintenance perspective. Especially because there is always some
When I learned quicksort in college (Score:4, Interesting)
Parity & PAR Archives are like that. Again, I've read books on how they work, but it's crazy to think that somebody thought it should even be possible to do something like that
Re:When I learned quicksort in college (Score:4, Insightful)
I could more or less comprehend how someone could invent quicksort because my teacher explained how they did it at the time.
But what really boggled up my brain was B-Tree. THAT was true sorcery.
if you want to see true sorcery (Score:3)
I suggest quantum field theory.
Dirac pretty much popped a completely crazy idea out of the vacuum and there it was.
That was 2 years after the first dissertation ever on quantum mechanics, his own, which derived QM through Poisson brackets that is the modern used form---not Heisenberg and Schroedinger's.
https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:2)
Wow, very courageous of you to tell that story, especially around here!
This either means there is hope for you, or it means there isn't any. Time will tell.
Re: When I learned quicksort in college (Score:5, Interesting)
I always felt this towards Burrows Wheeler Transform, it's a very unintuitive algorithm, the idea that to achieve compression you should devise a sort that you can reverse.
Check out Radix Sort (Score:5, Informative)
Re: (Score:2)
Re: (Score:2)
It's interesting that only the last answer hints at the actual reason why comparison-based sorts are more common than radix-based sorts: "library" sort routines have to work with user-defined data types, and it's much more convenient for the user to write a comparison operation than a radix query.
Just use ram for sorting (Score:3)
Simple, allocate 12GB of RAM, go through the source array and increment 3 byte value at that integers location, read through the allocated memory to output sorted list, ignore zeroes.
Re:Just use ram for sorting (Score:4, Interesting)
Donald Trump would say.... (Score:4, Funny)
When asked for the most efficient way to sort a million 32-bit integers in 2008, then-presidential candidate Barack Obama answered, "I think the bubble sort would be the wrong way to go."
Donald Trump: "Huh?"
Re:Donald Trump would say.... (Score:4, Funny)
I wanted to mod you up, but there wasn't a "Sad" option.
Re: (Score:2)
Just scramble them until they're sorted on their own.
Re: (Score:2)
Bogosort! [wikipedia.org]
Re: Donald Trump would say.... (Score:2)
Re: (Score:2)
Now if only presidency was about computer science.
Nerds would be the kings of presidents.
Re: (Score:2)
Delete the objects that are out of order.
Re: (Score:3)
Delete the objects that are out of order.
You mean like randomly firing people until the job gets done right?
Make Sorting Great Again (Score:2)
The Donald will make sorting great again.
MSGA!
It will be the best algorithm. Better than the "Chinese" algorithm and it will replace the fake Democrat algorithms.
Re: (Score:3)
When asked for the most efficient way to sort a million 32-bit integers in 2008, then-presidential candidate Barack Obama answered, "I think the bubble sort would be the wrong way to go."
Donald Trump: "Huh?"
And both answers are equally useful.
Re:Donald Trump would say.... (Score:4, Insightful)
Yes but one demonstrates the ability to know what the hell a sorting algorithm even is.
Re: (Score:2)
Also "don't use bubble sort" is already 80% of the way to a decent solution.
Re: (Score:2)
Re: (Score:2)
Do yoy really believe that he has any grasp of sorting algorithms? SMH.
Unless he took an algorithms course in college, he probably doesn't. But if you sat him down and explained the basics to him, I'm pretty sure he'd be able to grasp the ideas without too much difficulty.
Re: (Score:2)
Actually what it demonstrates is the ability to recall something he heard somewhere sometime
Congratulations, you described knowledge.
Re: (Score:3)
No presidential candidate would know how to answer this question unless, by some unlikely chance, they happened to have studied computer science in university.
Obama was only able to answer the question because Google staff had prepped him for it ahead of time [quora.com].
Re: (Score:2)
No presidential candidate would know how to answer this question unless, by some unlikely chance, they happened to have studied computer science in university.
Obama was only able to answer the question because Google staff had prepped him for it ahead of time [quora.com].
Good citation. Trump would still make a word salad out of the answer, though. If he remembered it.
Re: (Score:3)
Re: (Score:2)
When asked for the most efficient way to sort a million 32-bit integers in 2008, then-presidential candidate Barack Obama answered, "I think the bubble sort would be the wrong way to go."
Donald Trump: "Huh?"
Nah! Trump would say:
"Nobody knows computer sorting better than me!"
Re: (Score:2)
Remember Obama, 2008?
Now you find a similar quote from DT.
Re: (Score:2)
https://www.washingtonpost.com... [washingtonpost.com]
"I know more about renewables than any human being on Earth."
"I understand the power of Facebook maybe better than almost anybody, based on my results, right?"
"Nobody knows more about debt. I'm like the king. I love debt."
"I think nobody knows more about taxes than I do, maybe in the history of the world. Nobody knows more about taxes."
etc...
And that's just one of the many articles I got on a simple Google search for "trump quote better than anyone". Plenty of videos, too.
Re: (Score:2)
"Nobody knows more about debt. I'm like the king. I love debt."
That's fine, he can take that particular title.
Yes (Score:2)
In theory, you can't get any faster than Radix sort.
Although a naive implementation can have a large constant factor.
The complexity of making an implementation that makes the constant factor manageable for anything but input data that covers a small range typically outweighs performance benefits.
Re: (Score:2)
In theory, you can't get any faster than Radix sort.
That has got to be tied (with many others) for the stupidest "theory" possible.
If you're wondering why, you could restrict yourself entirely to pre-Socratic writers in Ancient Greece and you'd still find it was already analyzed to death.
The main problem with quicksort... (Score:3)
And, as a worst case, the most straightforward way to implement it can decay performance to O(n^2). Modifying it to ensure that it is n(log n) as a worst case is just as complex as implementing another guaranteed O(n log n) sort such as heapsort.
Another possibility to consider might be Smoothsort [wikipedia.org], which adaptively varies between O(n) and O(n log n), depending on how sorted the input data already is.
Smoothsort is complex, however... but depending on the kind of data you have to sort, it may be suitable.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
This a highly loaded question. =) (Score:5, Interesting)
Yes.. Bubble Sort. Okay, now that your done laughing let me explain. All of the sort discussions assume that your on a single instruction at a time machine. Comparisons can only be done 1 at a time. As such you start to get into typical sorting times of nlog(n). However, when you get to a computing framework that allows massive parallel evaluations it changes the framework completely. In a distributed system, where each node has its own CPU you can actually actually compare neighbors in parallel and then transfer.
Assuming a 6 node cluster starting with the following data at t0 [6, 2, 4, 3, 1, 5] we can sort it in O(n) steps:
t1: swap 6 to 2, swap 4 to 3, leave 1 and 5. results: [2, 6, 3, 4, 1, 5]
t2: swap 6 to 3, swap 4 to 1, results: [2, 3, 6, 1, 4, 5]
t3: leave 2 and 3, swap 6 to 1, leave 4 and 5, results: [2, 3, 1, 6, 4, 5]
t4: swap 3 to 1, swap 6 to 4, results: [2, 1, 3, 4, 6, 5]
t5: swap 1 to 2, leave 3 and 4, swap 5 to 6, results: [1, 2, 3, 4, 5, 6]
Worst case is that an object has to "bubble" once across the system, which is O(n) =)
Re: (Score:2)
I was wondering how it handled a reversed list, as it would swap 4 with both 5 and 3. Then I realized you alternate 2 processes: bubble odd pairs then bubble even.
Re: (Score:2)
The Sorting Algorithm Olympics (Score:2)
Here is a fun video on sorting speed https://www.youtube.com/watch?... [youtube.com]
simd (Score:2)
They should focus their efforts on some kind of SIMD super instruction that will handle multiple integers in one go and output some kind of metadata that would be helpful in the sort process.
I'd love to see what might be achieved by kind of software-definable SIMD processor like an FPGA but that can fully take over from the CPU and access main memory to do some task then hand the bus back to the CPU.
Re: (Score:2)
Here is an idea. Start with the highest order bit of every number, working down to the lowest bit, and have it calculate a vector order array of the relative displacement of each value from its sorted position, and then the final matrix operation would be to move each number simultaneously using that calculated offset into their new sorted position. Aka, a SMID differential sort algorithm.
You would need a processor at least three times (potentially much more) the size of the dataset, and some fancy kind of
Depends on the architecture (Score:3)
You would find that an altered quicksort and merge sort can be much faster. Consider breaking a numerical array into symmetric chunks and quicksorting. While quicksorting each chunk on a separate core, maintain a mean value as an atomic value shared across threads. Then also store the position of the mean between each core. When all cores complete, merge the values above and below the mean so the top half of the sorted chunks are all contiguous and then below... then run a quicksort where chunks are symmetrical above and below the mean. Repeat until sorted. This is just an oversimplified distributes algorithm.
Now consider the same problem but account for cache sizes to eliminate cache misses. This can be done with ensuring chunks are all aligned and fit within L1 cache.
If you are trying to be fast; you need to consider more than just algorithmic weaknesses. Also consider architectural issues. Also consider that if memory is plentiful enough, real magic could happen by not working in place.
Then there is the issue that integers are not always the issue. Typically, you sort records, not numbers. As such, you can find indexing and then in place sorting would be faster.
So, while most real education occurs from Knuth, and it should, you need to also consider distributed algorithms which Knuth said he would leave to others.
Linq (Score:3)
Re: (Score:2)
Heapsort is the answer - fastest worst case (Score:2)
Re: (Score:2)
Agreed. Heapsort may not be the fastest in all cases but it is much faster than quicksort in the worst case. I use a heapsort algorithm whenever I am faced with a lot of random items to sort.
Bubble was my first ... (Score:3)
... sorted affair.
radix sort? (Score:2)
Re: (Score:2)
Dumpstersort (Score:3)
That's where my wife picks through the closets and stacks all my stuff that she doesn't think I need out by the garbage can. And I return it to a secure location as fast as possible.
Re: (Score:2)
I use Paged Virtual Memory on my wife that way when she opens the cupboard she only sees her stuff and can't touch mine. The downside here is that when taking cloths off the line she always leaves behind the towels and bed sheets as they aren't loaded into her page table.
cool thread (Score:2)
An olde school /. thread worth reading through! Very impressed by the minimal off topic flame warring even with a trigger proper noun in the summary.
efficient on ordered data (Score:2)
I have, in fact, a very elegant O(0) sorting algorithm for ordered data!
Quicksort sucks. Only the incompetent use it. (Score:2)
For a sort with assured time and space need, use bottom-up heapsort or mergesort. Anybody using Quicksort should be banned from ever writing commercial code again.
One bad thing about quicksort... (Score:2)
The original quicksort is not very useful, as it chokes badly when there's a large number of identical values in your list. When people talk about quicksort, they usually mean one of its variations, such as the dual-pivot quicksort. Most sorts will slow down with certain datasets, but being recursive, quicksort can actually crash without some sanity limits in place.
A long time ago I wrote a program called Picsort on my Amiga, which sorted a bitmap in video memory, so you could see the program working. St
Putting data into array is the fastest algorithm. (Score:2)
Sorting by comparison is the wrong way to sort data.
In the example of one million numbers, simply go through them and place them into the appropriate position in a million entry array.
For other data, for example text, simply use the first 8/16/32-bit fragment to of the text to find the appropriate array entry, then repeat for next fragment, using a 2nd level array etc.
Any electronic data can be reduced to an index, and thus that index can be used to put the data into an array.
Re: (Score:2)