AI System Devises First Optimizations To Sorting Code In Over a Decade (arstechnica.com) 67
An anonymous reader quotes a report from Ars Technica: Anyone who has taken a basic computer science class has undoubtedly spent time devising a sorting algorithm -- code that will take an unordered list of items and put them in ascending or descending order. It's an interesting challenge because there are so many ways of doing it and because people have spent a lot of time figuring out how to do this sorting as efficiently as possible. Sorting is so basic that algorithms are built into most standard libraries for programming languages. And, in the case of the C++ library used with the LLVM compiler, the code hasn't been touched in over a decade.
But Google's DeepMind AI group has now developed a reinforcement learning tool that can develop extremely optimized algorithms without first being trained on human code examples. The trick was to set it up to treat programming as a game. [...] The AlphaDev system developed x86 assembly algorithms that treated the latency of the code as a score and tried to minimize that score while ensuring that the code ran to completion without errors. Through reinforcement learning, AlphaDev gradually develops the ability to write tight, highly efficient code. [...]
Since AlphaDev did produce more efficient code, the team wanted to get these improvements incorporated back into the LLVM standard C++ library. The problem here is that the code was in assembly rather than C++. So, they had to work backward and figure out the C++ code that would produce the same assembly. Once that was done, the code was incorporated into the LLVM toolchain -- the first time some of the code had been modified in over a decade. As a result, the researchers estimate that AlphaDev's code is now executed trillions of times a day. The research has been published in the journal Nature.
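To make the setup concrete, here is a minimal sketch in C++ with entirely hypothetical names (DeepMind has not published its system in this form) of the scoring described above: correctness is a hard constraint, and latency is the score to be minimized.

#include <limits>

struct Program { /* a candidate x86 assembly sequence */ };

// Assumed oracles, named here only for illustration:
bool runsCorrectly(const Program& p);     // every test input handled without errors
double measuredLatency(const Program& p); // measured (or predicted) latency

// The "game" score: incorrect programs rank worst; among correct
// programs, lower latency means a higher score.
double score(const Program& p) {
    if (!runsCorrectly(p))
        return -std::numeric_limits<double>::infinity();
    return -measuredLatency(p);
}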
Summary... (Score:5, Funny)
...built from a large collection of highly specific functions that handle just one or a few situations. For example, there are separate algorithms for sorting three items, four items, and five items. ...
AlphaDev was able to shave an instruction off of sort-3, sort-5, and sort-8, and even more off of sort-6 and sort-7. There was only one (sort-4) where it didn't find a way to improve the human code.
TLDR: If you're sorting three, five or eight items it will run a whole clock cycle faster. Six and seven items will be "more than one cycle".
Revolutionary!
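For concreteness, a fixed-size sort-3 of the kind being discussed looks roughly like this in C++ (illustrative only, not the actual libc++ code that AlphaDev modified): three compare-and-swap steps, with no loops and no length checks.

#include <utility> // std::swap

// A three-element sorting network: comparators (a,b), (b,c), (a,b).
template <typename T>
void sort3(T& a, T& b, T& c) {
    if (b < a) std::swap(a, b); // order the first pair
    if (c < b) std::swap(b, c); // push the largest value into c
    if (b < a) std::swap(a, b); // order the remaining pair
}

Shaving even a single instruction from code like this only matters because it runs an enormous number of times.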
Re:Summary... (Score:5, Insightful)
Re: (Score:2, Insightful)
these improvements can have drastic effects in places you would not expect, like high throughput network devices or integrated systems with limited compute. Sort is a very common operation for software to perform.
Right, but this tiny optimization on one very specific set of code isn't going to achieve any of that.
Re: (Score:1)
Right, but this tiny optimization on one very specific set of code isn't going to achieve any of that.
Clearly you are not aware of the reason only sort-3 to sort-8 exist.
It is not because computers can't sort lists with more than 8 items.
It is because any list with a size a multiple of 3 but not 6 will use sort-3, and if it is a multiple of 6 it will use sort-6
Do you know how many whole numbers exist between 3 and 2^64 that are multiples of 3 and 6?
This is not some edge case, this is around 30% of all sorting operations a computer will do.
This is also not "just" one instruction saved, it is the combined ins
Re:Summary... (Score:5, Insightful)
The part you are alluding to but glossing over a bit is that divide-and-conquer sorting algorithms (e.g. quicksort) function by breaking up a large sorting problem into many smaller sorting problems, eventually arriving at a point where they must sort many small lists. Because of this, optimizing these small sort operations will improve sort performance for inputs of any size. Also, lists don't need to be an integer multiple of the sort-n size to benefit from a faster sort-n; the sizes of the sub-problems in a typical sorting algorithm are generally random, and at the end stages of the algorithm, you'll end up with lists of varying single-digit sizes.
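As a sketch of that mechanism (illustrative, with an arbitrary cutoff of 8; this is not the libc++ implementation), consider a quicksort that hands every small sub-array to a simple small-case sort:

#include <utility> // std::swap
#include <vector>

void quicksort(std::vector<int>& v, int lo, int hi) { // sorts v[lo..hi]
    while (hi - lo > 8) { // large ranges: partition and recurse
        int pivot = v[lo + (hi - lo) / 2];
        int i = lo, j = hi;
        while (i <= j) { // Hoare-style partition
            while (v[i] < pivot) ++i;
            while (v[j] > pivot) --j;
            if (i <= j) std::swap(v[i++], v[j--]);
        }
        quicksort(v, lo, j); // recurse on one side,
        lo = i;              // iterate on the other
    }
    // Every path through the sort finishes here, in a small sort
    // (insertion sort standing in for the sort-3..sort-8 special cases).
    for (int k = lo + 1; k <= hi; ++k)
        for (int m = k; m > lo && v[m] < v[m - 1]; --m)
            std::swap(v[m], v[m - 1]);
}

Whatever the input size, the work bottoms out in the small-case code, which is why speeding up the small cases helps across the board.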
Re: (Score:2)
Depends how many times it gets used and how many cycles overall it takes. 1 cycle out of 20 is 5%, after all, and if you're doing it a million times a second that's kind of substantial.
Re:Summary... (Score:4, Insightful)
They shouldn't call it "new algorithms" though. It's micro-optimization, which is definitely a good thing, but it doesn't actually change how the algorithm works (if I understood correctly).
I do believe this has a lot of potential: high level languages are designed with human programmers in mind, preventing them from making mistakes but also adding way too much code in the process. Manually optimizing that bloated code for a whole app, one instruction at a time, is basically impossible for humans (without making mistakes) but would be easy for AI. It can turn nicely structured but overly complicated functions back into good old spaghetti code that runs faster while producing the same results.
Re: (Score:2)
If there were a way to keep the AI in some known state, so the optimized code, no matter how mutilated by the AI, always works the same as the original code, this would be useful. But I worry about the AI finding some shortcut. For example, something like working with sound files, where the AI knows that people don't hear past a certain range or bitrate, so doing a calculation with 96 bit sound files by turning them into 16 bit files, doing the work, then expanding them back to 96 bits is faster than working with t
Re: (Score:2)
While I get the general point, that specific example would require the 'AI' to actually be intelligent and know what it could get away with in the short term - because training based on scoring "reduced cycles for the same output" could not result in the outcome in your example.
Re: (Score:2)
That's overstating the bounds. Evolutionary behavior can reach unexpected optimizations without any understanding appearing anywhere in the process.
This isn't (or doesn't appear to be) a case where that would apply, but your more general statement needs to be carefully refined.
Re: (Score:2)
The issue is the 'same output'. No algorithm under that constraint will select an optimisation that changes the output. It'd be discarded as a failure. To drop bits for faster audio processing would result in changed output - and the meta knowledge and understanding that a human probably wouldn't notice. The algorithm isn't intelligent and can't cheat like that.
Re: (Score:3)
That's true, but unexpected characteristics of the training data set can mean it's not optimizing for what you think it's optimizing for.
Re: (Score:2)
That bridge was crossed long ago. In theory, all your local variables are created on the stack, parameters are passed similarly, and symbols exist for the entire length of their code block.
But if you look under the optimized hood, variables are often only registers, never assigned RAM on the stack, and the register is re-used for something else while still in the block where the variable theoretically still exists.
This can make debugging a pain, especially on embedded systems, as there is no way to check the variable's
Re: (Score:1)
I've looked at optimized code, and while some optimizations indeed occur like you pointed out, there's still an awful lot of low hanging fruit to be optimized. I once stepped through a fused-multiply-add library call, which should have just been one instruction, but instead it was a function calling a function calling a function looking up the address of a function and then calling that, each of them saving a few registers on the stack. It was like 50 instructions instead of one. (On MacOS)
And I've seen ple
Re: (Score:2)
This was some very, very low-hanging fruit then. If the use case of sorting some small set of items (say 3 items) is used that widely, then some human should have handled those explicit cases in the sorting function. It's not that a machine could write a more efficient algorithm for sorting 3 items than a human, but whether or not it should have been done in the first place.
There are many places for high-level optimization of this kind, and it's purely a matter of use cases. Here's a perfect example. Back w
Re: (Score:2)
Yes, but one should expect that the low hanging fruit would be the first picked. That doesn't imply it will be the last that is picked.
OTOH, it's too bad they couldn't directly optimize for C. That's a lot more portable than x86 assembler. (But it would be harder to optimize, as the cost of execution is more opaque.)
Re: Summary... (Score:4, Interesting)
some human should have handled those explicit cases in the sorting function.
Sure, but humans typically only find optimization opportunities if they already believe they exist; once everyone has assumed the code is "optimal", nobody looks at it or wants to modify it anymore, since the risk of introducing a regression outweighs the expected speedup to be gained.
Where AI could be useful is in bulk-scanning "optimal" code to point out opportunities that nobody else could see. Since code efficiency in practice is often counterintuitive, I suspect there are a lot of hidden optimizations waiting to be found.
Re: (Score:1)
If I had some mod points, I'd mod this "Interesting!"
Re: (Score:2)
some human should have handled those explicit cases in the sorting function.
Sure, but humans typically only find optimization opportunities if they already believe they exist; once everyone has assumed the code is "optimal", nobody looks at it or wants to modify it anymore, since the risk of introducing a regression outweighs the expected speedup to be gained.
Where AI could be useful is in bulk-scanning "optimal" code to point out opportunities that nobody else could see. Since code efficiency in practice is often counterintuitive, I suspect there are a lot of hidden optimizations waiting to be found.
I mean, true, but I rather suspect that correctly setting up an evaluation system to allow an AI to train possible sort algorithms probably required as much or more effort than simply looking at the problem. Someone had to believe there was an improvement in there in order to even start the project, and my guess is that they also had a good idea what the improvement might look like.
Re:Summary... (Score:5, Insightful)
This was some very, very low-hanging fruit then. If the use case of sorting some small set of items (say 3 items) is used that widely, then some human should have handled those explicit cases in the sorting function. It's not that a machine could write a more efficient algorithm for sorting 3 items than a human, but whether or not it should have been done in the first place.
Reading the article and chasing some links, that's in fact the situation. I didn't realize it but the standard library has a function specifically for sorting three item lists. Apparently that's an important enough operation to be worth special casing. I would have imagined enough people sweated over that code that it would be tough to make better. And yet that's what happened.
Re:Summary... (Score:4, Insightful)
Common sorting algorithms like merge sort split the data into smaller sets until they reach a set with 5 or fewer items, so sorting 1-5 items quickly improves the speed of the whole algorithm. Sorting 5 items is considered an O(1) operation, so the complexity isn't affected, but the runtime speed would still improve.
Re: (Score:2)
If you are sorting anything but small sets of data, or even sorting these often, you are almost certainly doing something very wrong that a slightly faster sort is not going to fix ...
Sometimes the best way of optimising is doing it a better way
Re: (Score:3)
Sorting small sets is a high frequency operation when sorting large sets. Locality of reference (which usually boils down to efficient CPU cache usage) requires such tactics, as opposed to textbook algorithms.
Re: (Score:3)
Wake me up if it can improve the swap function and eliminate the third variable.
Sure: x, y = y, x
Re: (Score:2)
Re: (Score:3)
There are many ways to swap values without using a third variable depending on what data type is being used. Multiple methods use arithmetic, with bitwise operators probably being the fastest.
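For reference, sketches of the arithmetic and bitwise variants (integer types only; both misbehave if the two references alias the same object, and the arithmetic version relies on unsigned wraparound, since signed overflow is undefined behavior):

void swapAdd(unsigned& x, unsigned& y) {
    x = x + y; // x holds the sum (wraps harmlessly for unsigned)
    y = x - y; // y now holds the original x
    x = x - y; // x now holds the original y
}

void swapXor(unsigned& x, unsigned& y) {
    x ^= y;
    y ^= x; // y now holds the original x
    x ^= y; // x now holds the original y
}

In practice a plain temporary usually compiles to the same or better code, as the follow-ups below point out.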
Re:Yawn... (Score:5, Interesting)
x ^= y; y ^= x; x ^= y;
That works for integral types and takes three instruction cycles. (Written as three statements; folding it into one expression, x ^= y ^= x ^= y;, is undefined behavior in C and in C++ before C++17.) That said, the following code can sometimes be faster:
a = x;
b = y;
x = b;
y = a;
Sure, it's four instructions plus some extra stack space, but the first pair of instructions can be completed simultaneously, as can the second, which results in a two cycle swap. It can be faster, or it can be slower, depending on many things, such as the extra stack/registers/L1I overhead and the number of times it's called.
In short, universally optimal code doesn't really exist, so benchmark everything! Also, only optimize where you need to.
Re: (Score:2)
For x86 systems, there's even an instruction that can be used so you just call a single instruction:
xchg x, y
Of course the compiler will automatically use this if you write your code in a normal way with a temporary variable, but might not if you try to do something clever with bitwise operations or arithmetic.
Re: (Score:2)
It's long been known that, below a certain number of items to sort, it's faster to use a slower algorithm with less overhead than getting Superman to lift that mouse.
It doesn't surprise me they have little optimized sorts for different low numbers to apply at various key moments.
Very cool (Score:1)
All I have to say.
How does this differ from a Genetic Algorithm ? (Score:3, Interesting)
Genetic Algorithms [wikipedia.org] have been around since the 1950s.
Re:How does this differ from a Genetic Algorithm ? (Score:5, Funny)
It's an optimized genetic algorithm?
Re: (Score:2)
Same very high level idea that you would explain to a high schooler. Different chapters in your optimization textbook.
Re:How does this differ from a Genetic Algorithm ? (Score:4, Insightful)
Why would it be surprising that current AI techniques are building upon the last 70 years of research?
Interesting that they used assembly language. (Score:2)
You'd think that they'd have used a higher level language like C.
Re: (Score:3, Informative)
The point is that they are optimizing the output from a C compiler. Contrary to the article, they are not developing new "algorithms" but optimizing low level code.
Re: (Score:2)
There was no guarantee they could even find C code that would compile into the optimized assembly they want.
It's also kind of a design principle to avoid using the register keyword since the compiler will probably do a better job than you, in the rare case there aren't enough registers to hold all your local vars anyway.
Re: (Score:2)
Re: (Score:3)
C relies on the compiler and what it may (or may not) optimize, but that may or may not (mostly not) lead to the same optimized assembly code.
It has always been true that to get the best optimized code you'd build by hand in assembly. The problem is that this is extremely resource intensive (time, money, developers), so it is impractical to write all code in assembly. Higher level languages take advantage of the fact that few (if any) parts of a program really require highly optimized assembly to get their
Re: (Score:2)
That sounds a lot like zero-code and no-code: end users specify what they want, and the computer magically makes it happen.
Re: (Score:2)
Not really. Zero-code and no-code are magic, and further off than anyone expects.
As developers, we will code up requirements to come to working solutions, but we don't necessarily generate the best code in doing so. Being able to read what we've created and transform into better implementations would still rely on our work, it would just lead to a better outcome.
Re: Interesting that they used assembly language. (Score:2)
Often, someone will improve someone else's code only to later find they broke it, and in a way that would have needed more context than the code itself provides. Is the code alone enough?
Re: (Score:2)
It should be, but of course that assumes that the developer correctly and completely implemented all requirements in their code base.
making change (Score:2)
I remember in my Pascal class I had to make change from various coin denominations: $0.01, $0.05, $0.25, $0.50. That was my introduction to sorting.
But... (Score:3)
Re: (Score:2)
That was, of course, over a decade ago.
Min3(), Max3(), Sort3() ... (Score:3)
I would rather have easy-to-debug code than this inscrutable code that shaves a few clock cycles.
Speculative execution really looked like a clever and nifty idea back then. Who'd've thunk it would lead to the Spectre flaw and the data leaks from it? Nifty ideas have a way of coming back to bite you in the tail.
Re: (Score:2)
OTOH, if it's a debugged library, then I don't worry too much about the internals of the library... except when I need to. E.g. I still can't figure out whether, when I return a string that came in as a parameter, I end up copying the string. Fortunately, if it's a const, I can generally use string_view instead... but I'd prefer not to need to bother with the conversion... and I can't tell. Perhaps it's implementation dependent.
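For what it's worth, the copy-versus-view distinction being asked about can be shown in a small sketch (an assumed example, not anything from the article): returning by value copies the string, while a string_view just refers to the caller's buffer and must not outlive it.

#include <string>
#include <string_view>

std::string byValue(const std::string& s) {
    return s; // returns a copy of the caller's string
}

std::string_view byView(std::string_view s) {
    return s; // no copy: just a pointer and length into the caller's storage
}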
Re: (Score:2)
The Spectre flaw [wikipedia.org] is actually a good example. The idea is so well tested, and the implementation can be proven mathematically correct, that it got all the way down to chip instruction sets. Not software, but actual code burnt into the chips. There is nothing wrong with the code's designed functions. But the speculative execution executes and accesses an unauthorized memory location before the if statement says, "no you can't access that
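The pattern being described is the well-known Spectre v1 gadget (from Kocher et al.), roughly:

// Architecturally, the bounds check is always respected. But the branch
// predictor may guess "taken" and perform the out-of-bounds read before
// the check resolves, leaving a secret-dependent footprint in the cache.
if (x < array1_size)
    y = array2[array1[x] * 4096];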
Is it optimizing, or just hoodwinking the users? (Score:2)
Re: Is it optimizing, or just hoodwinking the user (Score:2)
It's an easy question to answer; either the sorts run faster or they don't.
And if they don't, I think people would have noticed.
Re: (Score:2)
nothing new (Score:2)
This is absolutely nothing new. While the big O of an algorithm matters in the limit, for problems that aren't gigantic there are lots of possible optimizations that don't change the big O, or even make it worse, but do reduce running time drastically. If you know something about the input, you can optimize even more.
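A classic example of exploiting input knowledge (illustrative): if the values are known to fit in a byte, counting sort runs in O(n + 256) time with tiny constants, sidestepping comparison sorting entirely.

#include <array>
#include <cstddef>
#include <vector>

void countingSort(std::vector<unsigned char>& v) {
    std::array<std::size_t, 256> count{}; // one bucket per possible value
    for (unsigned char x : v) ++count[x]; // tally occurrences
    std::size_t i = 0;
    for (int x = 0; x < 256; ++x)         // rewrite v in ascending order
        for (std::size_t c = count[x]; c > 0; --c)
            v[i++] = static_cast<unsigned char>(x);
}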
Re: (Score:2)
The new part is that someone has figured out how to tell a computer "I think that's optimized but go see if I missed anything".
New sorting algorithms still being discovered (Score:3)
I pointed this [slashdot.org] out a few years back:
Andrei Alexandrescu in his 2019 talk Sorting Algorithms: Speed Is Found In The Minds of People [youtube.com] invented a new sorting algorithm and discovered everyone has been measuring sorting algorithms with an incomplete formula.
The standard cost is:
O(n) = C(n) + S(n)
Where:
C(n) = time of Compares
S(n) = time of Swaps
The complete formula is:
Blended Cost = (C(n) + M(n) + k*D(n))/n
C(n) = time of Compares
M(n) = time of Moves
D(n) = average distance between subsequent array accesses
I'm not surprised we are still discovering new ways to sort.
How about the consequences to maintainability? (Score:3)
Re: (Score:3)
Re: (Score:2)
Don't worry, the AI that invented it will maintain it too!
Only sort-4? (Score:2)
So, what happened with sort-2? And, most important of all, sort-1?
Rediscovered Danny Hillis's work? (Score:2)
What is the big surprise there? (Score:1)
Pretty much any function call in libc could be further optimized. The fact that no changes were made to those sort algorithms means that nobody looked into them. I'm pretty sure a decent programmer could have achieved the same or better result with less effort.
Back in the day (Score:2)
In VB6, it was seriously suggested that the best way to sort anything (with 65535 items or less) was to stuff it into a hidden text box, set its Sorted property to True, then read it back out.
I think Bubblesort was an improvement.
PR? (Score:1)