AI Programming

AI System Devises First Optimizations To Sorting Code In Over a Decade (arstechnica.com) 67

An anonymous reader quotes a report from Ars Technica: Anyone who has taken a basic computer science class has undoubtedly spent time devising a sorting algorithm -- code that will take an unordered list of items and put them in ascending or descending order. It's an interesting challenge because there are so many ways of doing it and because people have spent a lot of time figuring out how to do this sorting as efficiently as possible. Sorting is so basic that algorithms are built into most standard libraries for programming languages. And, in the case of the C++ library used with the LLVM compiler, the code hasn't been touched in over a decade.

But Google's DeepMind AI group has now developed a reinforcement learning tool that can develop extremely optimized algorithms without first being trained on human code examples. The trick was to set it up to treat programming as a game. [...] The AlphaDev system developed x86 assembly algorithms that treated the latency of the code as a score and tried to minimize that score while ensuring that the code ran to completion without errors. Through reinforcement learning, AlphaDev gradually develops the ability to write tight, highly efficient code. [...]

Since AlphaDev did produce more efficient code, the team wanted to get these routines incorporated back into the LLVM standard C++ library. The problem here is that the code was in assembly rather than C++. So they had to work backward and figure out the C++ code that would produce the same assembly. Once that was done, the code was incorporated into the LLVM toolchain -- the first time some of the code had been modified in over a decade. As a result, the researchers estimate that AlphaDev's code is now executed trillions of times a day.
The research has been published in the journal Nature.

Comments Filter:
  • Summary... (Score:5, Funny)

    by Joce640k ( 829181 ) on Thursday June 08, 2023 @08:29AM (#63585608) Homepage

    ...built from a large collection of highly specific functions that handle just one or a few situations. For example, there are separate algorithms for sorting three items, four items, and five items. ...

    AlphaDev was able to shave an instruction off of sort-3, sort-5, and sort-8, and even more off of sort-6 and sort-7. There was only one (sort-4) where it didn't find a way to improve the human code.

    TLDR: If you're sorting three, five or eight items it will run a whole clock cycle faster. Six and seven items will be "more than one cycle".

    Revolutionary!

    • Re:Summary... (Score:5, Insightful)

      by saloomy ( 2817221 ) on Thursday June 08, 2023 @08:40AM (#63585626)
      I can't tell if you are being sarcastic or not, but these improvements can have drastic effects in places you would not expect, like high throughput network devices or integrated systems with limited compute. Sort is a very common operation for software to perform.
      • Re: (Score:2, Insightful)

        by Joce640k ( 829181 )

        these improvements can have drastic effects in places you would not expect, like high throughput network devices or integrated systems with limited compute. Sort is a very common operation for software to perform.

        Right, but this tiny optimization on one very specific set of code isn't going to achieve any of that.

        • by Anonymous Coward

          Right, but this tiny optimization on one very specific set of code isn't going to achieve any of that.

Clearly you are not aware of the reason only sort-3 to sort-8 exist.
          It is not because computers can't sort lists with more than 8 items.
          It is because any list whose size is a multiple of 3 but not of 6 will use sort-3, and any list whose size is a multiple of 6 will use sort-6.

          Do you know how many whole numbers between 3 and 2^64 are multiples of 3 and 6?
          This is not some edge case; this is around 30% of all sorting operations a computer will do.
          This is also not "just" one instruction saved, it is the combined ins

          • Re:Summary... (Score:5, Insightful)

            by north_by_midwest ( 7997468 ) on Thursday June 08, 2023 @11:58AM (#63586066)

The part you are alluding to but glossing over a bit is that divide-and-conquer sorting algorithms (e.g. quicksort) work by breaking a large sorting problem into many smaller sorting problems, eventually arriving at a point where they must sort many small lists. Because of this, optimizing these small sort operations improves sort performance for inputs of any size. Also, lists don't need to be an integer multiple of n in size for a faster sort-n to benefit them; the sizes of the sub-problems in a typical sorting algorithm are essentially random, and at the end stages of the algorithm you'll end up with lists of varying single-digit sizes.

        • by suutar ( 1860506 )

          Depends how many times it gets used and how many cycles overall it takes. 1 cycle out of 20 is 5%, after all, and if you're doing it a million times a second that's kind of substantial.

      • Re:Summary... (Score:4, Insightful)

        by michelcolman ( 1208008 ) on Thursday June 08, 2023 @09:03AM (#63585670)

        They shouldn't call it "new algorithms" though. It's micro-optimization which is definitely a good thing but doesn't actually change how the algorithm works (if I understood correctly).

        I do believe this has a lot of potential: high level languages are designed with human programmers in mind, preventing them from making mistakes but also adding way too much code in the process. Manually optimizing that bloated code for a whole app, one instruction at a time, is basically impossible for humans (without making mistakes) but would be easy for AI. It can turn nicely structured but overly complicated functions back into good old spaghetti code that runs faster while producing the same results.

If there were a way to keep the AI within some known bounds, so that the optimized code, no matter how mutilated by the AI, always works the same as the original code, this would be useful. But I worry about the AI finding some shortcut. For example, something like working with sound files, where the AI knows that people don't hear past a certain range or bitrate, so doing a calculation with 96-bit sound files by turning them into 16-bit files, doing the work, then expanding them back to 96 bits is faster than working with t

          • While I get the general point, that specific example would require the 'AI' to actually be intelligent and know what it could get away with in the short term - because training based on scoring "reduced cycles for the same output" could not result in the outcome in your example.

            • by HiThere ( 15173 )

              That's overstating the bounds. Evolutionary behavior can reach unexpected optimizations without any understanding appearing anywhere in the process.

              This isn't (or doesn't appear to be) a case where that would apply, but your more general statement needs to be carefully refined.

              • The issue is the 'same output'. No algorithm under that constraint will select an optimisation that changes the output. It'd be discarded as a failure. To drop bits for faster audio processing would result in changed output - and the meta knowledge and understanding that a human probably wouldn't notice. The algorithm isn't intelligent and can't cheat like that.

                • by HiThere ( 15173 )

                  That's true, but unexpected characteristics of the training data set can mean it's not optimizing for what you think it's optimizing for.

That bridge was crossed long ago. In theory, all your local variables are created on the stack, parameters are passed similarly, and symbols exist for the entire length of their code block.

            But if you look under the optimized hood, variables are often only registers, never assigned RAM on the stack, and the register re-used for something else while still in the block where the variable theoretically still exists.

            This can make debugging a pain, especially on embedded systems, as there is no way to check the variable'

            • I've looked at optimized code, and while some optimizations indeed occur like you pointed out, there's still an awful lot of low hanging fruit to be optimized. I once stepped through a fused-multiply-add library call, which should have just been one instruction, but instead it was a function calling a function calling a function looking up the address of a function and then calling that, each of them saving a few registers on the stack. It was like 50 instructions instead of one. (On MacOS)

              And I've seen ple

      • This was some very, very low-hanging fruit then. If the use case of sorting some small set of items (say 3 items) is used that widely, then some human should have handled those explicit cases in the sorting function. It's not that a machine could write a more efficient algorithm for sorting 3 items than a human, but whether or not it should have been done in the first place.

        There are many places for high-level optimization of this kind, and it's purely a matter of use cases. Here's a perfect example. Back w

        • by HiThere ( 15173 )

          Yes, but one should expect that the low hanging fruit would be the first picked. That doesn't imply it will be the last that is picked.

          OTOH, it's too bad they couldn't directly optimize for C. That's a lot more portable than x86 assembler. (But it would be harder to optimize, as the cost of execution is more opaque.)

        • Re: Summary... (Score:4, Interesting)

          by Jeremi ( 14640 ) on Thursday June 08, 2023 @12:01PM (#63586078) Homepage

          some human should have handled those explicit cases in the sorting function.

Sure, but humans typically only find optimization opportunities if they already believe they exist; once everyone has assumed the code is "optimal", nobody looks at it or wants to modify it anymore, since the risk of introducing a regression outweighs the expected speedup to be gained.

Where AI could be useful is in bulk-scanning "optimal" code to point out opportunities that nobody else could see. Since code efficiency in practice is often counterintuitive, I suspect there are a lot of hidden optimizations waiting to be found.

          • If I had some mod points, I'd mod this "Interesting!"

          • by shess ( 31691 )

            some human should have handled those explicit cases in the sorting function.

Sure, but humans typically only find optimization opportunities if they already believe they exist; once everyone has assumed the code is "optimal", nobody looks at it or wants to modify it anymore, since the risk of introducing a regression outweighs the expected speedup to be gained.

Where AI could be useful is in bulk-scanning "optimal" code to point out opportunities that nobody else could see. Since code efficiency in practice is often counterintuitive, I suspect there are a lot of hidden optimizations waiting to be found.

            I mean, true, but I rather suspect that correctly setting up an evaluation system to allow an AI to train possible sort algorithms probably required as much or more effort than simply looking at the problem. Someone had to believe there was an improvement in there in order to even start the project, and my guess is that they also had a good idea what the improvement might look like.

        • Re:Summary... (Score:5, Insightful)

          by smoot123 ( 1027084 ) on Thursday June 08, 2023 @12:22PM (#63586144)

          This was some very, very low-hanging fruit then. If the use case of sorting some small set of items (say 3 items) is used that widely, then some human should have handled those explicit cases in the sorting function. It's not that a machine could write a more efficient algorithm for sorting 3 items than a human, but whether or not it should have been done in the first place.

Reading the article and chasing some links, that's in fact the situation. I didn't realize it, but the standard library has a function specifically for sorting three-item lists. Apparently that's an important enough operation to be worth special-casing. I would have imagined enough people had sweated over that code that it would be tough to make better. And yet that's what happened.

          • Re:Summary... (Score:4, Insightful)

            by Ksevio ( 865461 ) on Thursday June 08, 2023 @01:26PM (#63586318) Homepage

Common sorting algorithms like merge sort split the data into smaller sets until they reach a set with 5 or fewer items, so sorting 1-5 items quickly improves the speed of the whole algorithm. Sorting 5 items is considered an O(1) operation, so the complexity isn't affected, but the runtime speed would

If you are sorting anything but small sets of data, or sorting them often, you are almost certainly doing something very wrong that a slightly faster sort is not going to fix ...
        Sometimes the best way to optimise is to do it a better way

        • by Tailhook ( 98486 )

Sorting small sets is a high-frequency operation when sorting large sets. Locality of reference (which usually boils down to efficient CPU cache usage) requires such tactics, as opposed to textbook algorithms.

    • It's long been known that, below a certain number of items to sort, it's faster to use a slower algorithm with less overhead than getting Superman to lift that mouse.

      It doesn't surprise me they have little optimized sorts for different low numbers to apply at various key moments.

  • All I have to say.

  • Genetic Algorithms [wikipedia.org] have been around since the 1950s.

  • You'd think that they'd have used a higher level language like C.

    • Re: (Score:3, Informative)

      The point is that they are optimizing the output from a C compiler. Contrary to the article, they are not developing new "algorithms" but optimizing low level code.

      • There was no guarantee they could even find C code that would compile into the optimized assembly they want.

It's also kind of a design principle to avoid the register keyword, since the compiler will probably do a better job than you in the rare case there aren't enough registers to hold all your local vars anyway.

      • Thaaaaaank you. I was here wondering "why are we not talking about what those algorithms are and how they function / compare to well-known sort algos".. that makes so much more sense.
C relies on the compiler and what it may (or may not) optimize, and that may or may not (mostly not) lead to the same optimized assembly code.

      It has always been true that to get the best optimized code you'd build by hand in assembly. The problem is that this is extremely resource intensive (time, money, developers), so it is impractical to write all code in assembly. Higher level languages take advantage of the fact that few (if any) parts of a program really require highly optimized assembly to get their

      • by Nutria ( 679911 )

        That sounds a lot like zero-code and no-code: end users specify what they want, and the computer magically makes it happen.

Not really. Zero-code and no-code are magic, and further off than anyone expects.

          As developers, we will code up requirements to come to working solutions, but we don't necessarily generate the best code in doing so. Being able to read what we've created and transform into better implementations would still rely on our work, it would just lead to a better outcome.

I remember in my Pascal class I had to make change from various coin denominations: $0.01, $0.05, $0.25, $0.50. That was my introduction to sorting.

  • by LordHighExecutioner ( 4245243 ) on Thursday June 08, 2023 @09:20AM (#63585690)
    ...FFTW [fftw.org] used a machine-driven algorithm to automatically split FFT into mixed radix for maximum speedup. It was released around 2000, no need of AI, just clever coding!
  • by 140Mandak262Jamuna ( 970587 ) on Thursday June 08, 2023 @10:05AM (#63585782) Journal
It is true, most coders are obsessed with "generality", "modularity", etc., and thus hate hard-wiring code or special-casing code. That leaves some low-end crumbs on the table. But even die-hard coders, when they know they are dealing with three numbers or four, special-case the functions: they do not write a loop for Min3() or call qsort for Sort3().

    I would rather have easy-to-debug code than inscrutable code that shaves a few clock cycles.

    Speculative execution really looked like a clever and nifty idea back then. Who'd have thought it would lead to the Spectre flaw and data leaks? Nifty ideas have a way of coming back to bite you in the tail.

    • by HiThere ( 15173 )

OTOH, if it's a debugged library, then I don't worry too much about the internals of the library... except when I need to. E.g. I still can't figure out whether, when I return a string that came in as a parameter, I end up copying the string. Fortunately, if it's a const, I can generally use string_view instead... but I'd prefer not to need the conversion, and I can't tell. Perhaps it's implementation dependent.

      • Even in well debugged, well tested, rugged libraries, we need code clarity.

The Spectre flaw [wikipedia.org] is actually a good example. The idea was so well tested, and the implementation so provably correct, that it got all the way down to chip instruction sets. Not software, but actual logic burnt into the chips. There is nothing wrong with the code's designed functions. But speculative execution executes and accesses an unauthorized memory location before the if statement says, "no you can't access that

  • Given the behavior we've seen, it's a valid question.
This is absolutely nothing new. While the big O of an algorithm matters in the limit, for problems that aren't gigantic there are lots of possible optimizations that don't change the big O, or even make it worse, but drastically reduce running time. If you know something about the input, you can optimize even more.

    • by suutar ( 1860506 )

      The new part is that someone has figured out how to tell a computer "I think that's optimized but go see if I missed anything".

  • by UnknownSoldier ( 67820 ) on Thursday June 08, 2023 @12:16PM (#63586122)

    I pointed this [slashdot.org] out a few years back:

    Andrei Alexandrescu in his 2019 talk Sorting Algorithms: Speed Is Found In The Minds of People [youtube.com] invented a new sorting algorithm and discovered everyone has been measuring sorting algorithms with an incomplete formula.

    The standard cost is:

    O(n) = C(n) + S(n)

    Where:

    C(n) = time of Compares
    S(n) = time of Swaps

    The complete formula is:

    Blended Cost = (C(n) + M(n) + k*D(n))/n

    C(n) = time of Compares
M(n) = time of Moves
    D(n) = average distance between subsequent array accesses

    I'm not surprised we are still discovering new ways to sort.

  • by dixonpete ( 1267776 ) on Thursday June 08, 2023 @12:52PM (#63586230)
Code needs to be maintained. If that C++ code was hacked together just to match the assembler the AI produced, there may not be any logic that human programmers can follow when they need to modify the code going forward. That could mean permanent dependence on the AI for that section of the code. Not a good thing.
  • "There was only one (sort-4) where it didn't find a way to improve the human code. "
    So, what happened with sort-2? And, the most important of all, Sort-1?
  • Seems like they've basically rediscovered Danny Hillis's work from the late '80s, who used genetic algorithms in a co-evolutionary setting to optimise sorting algorithms. Can't quite put my finger on a reference, but maybe published in ALife II?
Pretty much any function call in libc could be further optimized. The fact that no changes were made to those sort algorithms means that nobody looked into them. I'm pretty sure a decent programmer could have achieved the same or better result with less effort.

  • In VB6, it was seriously suggested that the best way to sort anything (with 65535 items or less) was to stuff it into a hidden text box, set its Sorted property to True, then read it back out.

    I think Bubblesort was an improvement.

  • Where's the PR?
