AI Learns To Write Computer Code In 'Stunning' Advance (science.org)
DeepMind's new artificial intelligence system called AlphaCode was able to "achieve approximately human-level performance" in a programming competition. The findings have been published in the journal Science. Slashdot reader sciencehabit shares a report from Science Magazine:
AlphaCode's creators focused on solving those difficult problems. Like the Codex researchers, they started by feeding a large language model many gigabytes of code from GitHub, just to familiarize it with coding syntax and conventions. Then, they trained it to translate problem descriptions into code, using thousands of problems collected from programming competitions. For example, a problem might ask for a program to determine the number of binary strings (sequences of zeroes and ones) of length n that don't have any consecutive zeroes. When presented with a fresh problem, AlphaCode generates candidate code solutions (in Python or C++) and filters out the bad ones. But whereas researchers had previously used models like Codex to generate tens or hundreds of candidates, DeepMind had AlphaCode generate more than 1 million.
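As an aside, the example problem quoted above has a textbook dynamic-programming solution: a valid length-n string either ends in '1' (which can follow anything) or in '0' (which must follow a '1'), so the counts obey a Fibonacci-style recurrence. A minimal sketch, purely for illustration:

```python
def count_no_consecutive_zeroes(n: int) -> int:
    """Count binary strings of length n with no two adjacent zeroes."""
    # end1 = valid strings of the current length ending in '1'
    # end0 = valid strings of the current length ending in '0'
    end1, end0 = 1, 1  # length 1: "1" and "0"
    for _ in range(n - 1):
        # A '1' may follow either ending; a '0' may only follow a '1'.
        end1, end0 = end1 + end0, end1
    return end1 + end0
```

For n = 1, 2, 3, 4 this yields 2, 3, 5, 8 -- the Fibonacci sequence shifted by two.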
To filter them, AlphaCode first keeps only the 1% of programs that pass test cases that accompany problems. To further narrow the field, it clusters the keepers based on the similarity of their outputs to made-up inputs. Then, it submits programs from each cluster, one by one, starting with the largest cluster, until it alights on a successful one or reaches 10 submissions (about the maximum that humans submit in the competitions). Submitting from different clusters allows it to test a wide range of programming tactics. That's the most innovative step in AlphaCode's process, says Kevin Ellis, a computer scientist at Cornell University who works on AI coding.
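The filter-then-cluster pipeline described above can be sketched in a few lines. Everything here is illustrative: the helper names (`run`, `probe_inputs`, and so on) are assumptions rather than AlphaCode's actual API, and real candidates would be sandboxed programs, not Python callables.

```python
from collections import defaultdict

def filter_and_cluster(candidates, visible_tests, probe_inputs, run, budget=10):
    """Sketch of the selection step described above (names are illustrative).

    candidates:    list of candidate programs
    visible_tests: (input, expected_output) pairs shipped with the problem
    probe_inputs:  extra made-up inputs used only to group survivors
    run:           run(program, input) -> output
    """
    # Step 1: keep only candidates that pass every visible test case.
    survivors = [p for p in candidates
                 if all(run(p, i) == o for i, o in visible_tests)]

    # Step 2: cluster survivors by their behaviour on the probe inputs;
    # programs producing identical outputs land in the same cluster.
    clusters = defaultdict(list)
    for p in survivors:
        signature = tuple(run(p, i) for i in probe_inputs)
        clusters[signature].append(p)

    # Step 3: submit one program per cluster, largest cluster first,
    # up to the 10-submission budget mentioned in the article.
    ordered = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ordered][:budget]
```

The point of step 3 is diversity: two programs that behave identically on the probes are probably the same tactic, so only one of them is worth a submission.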
After training, AlphaCode solved about 34% of assigned problems, DeepMind reports this week in Science. (On similar benchmarks, Codex achieved single-digit-percentage success.) To further test its prowess, DeepMind entered AlphaCode into online coding competitions. In contests with at least 5000 participants, the system outperformed 45.7% of programmers. The researchers also compared its programs with those in its training database and found it did not duplicate large sections of code or logic. It generated something new -- a creativity that surprised Ellis. The study notes the long-term risk of software that recursively improves itself. Some experts say such self-improvement could lead to a superintelligent AI that takes over the world. Although that scenario may seem remote, researchers still want the field of AI coding to institute guardrails, built-in checks and balances.
The Future Is Now! (Score:3)
Where's the evidence they're NOT robots? (Score:5, Funny)
Most of the developers I know could easily be mistaken for robots... ;)
Re:Where's the evidence they're NOT robots? (Score:5, Funny)
Re: Where's the evidence they're NOT robots? (Score:2)
Re: (Score:2)
Any problem a human can solve, you can build a specialized system to solve faster
But does it take longer and cost more to create and test the system than it would have to solve the problem?
So far, one of "AI"'s most newsworthy successes has been to spoil the ancient game of chess for human players. It's like trying to race against a car.
Re: (Score:2)
Most of the developers I know could easily be mistaken for robots... ;)
Sad to say that was also my first thought when I saw the words "human-level performance".
Re: Where's the evidence they're NOT robots? (Score:2)
You can differentiate by the smell.
Re: (Score:2)
Cruel, funny but a little naive; if our robot overlords wish to mix with us and fool us into believing they are human, a little odour could be easily added ;)
Re: (Score:2)
Re: (Score:2)
Finally we can replace our developers with robots! Where do I sign up?
And what about all those "learn to code" classes?
Re:The Future Is Now! (Score:5, Insightful)
Hardly so, at least not for now. Here's the problem:
When presented with a fresh problem, AlphaCode generates candidate code solutions (in Python or C++) and filters out the bad ones. But whereas researchers had previously used models like Codex to generate tens or hundreds of candidates, DeepMind had AlphaCode generate more than 1 million.
I don't know what they're smoking. A problem in a programming competition has at most 2-3 good solutions. Everything else is either garbage, less efficient, way less efficient, or just plain wrong. I was in a couple of programming competitions in college done in 370 assembler. One of them was determining if a number was prime. There are only a couple of good solutions for that. I can't even imagine what a million different versions of such code would look like. Would they have padded junk like A=B and then B=A?
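For reference, the primality problem the parent mentions really does have only a couple of sensible shapes; the standard one is trial division up to the square root. A hypothetical sketch (not the competition's reference solution, and in Python rather than 370 assembler):

```python
def is_prime(n: int) -> bool:
    """Trial division up to sqrt(n): one of the handful of sensible solutions."""
    if n < 2:
        return False
    if n < 4:
        return True          # 2 and 3 are prime
    if n % 2 == 0:
        return False
    d = 3
    while d * d <= n:        # only need divisors up to sqrt(n)
        if n % d == 0:
            return False
        d += 2               # skip even divisors
    return True
```

A million variants of something this small would indeed differ mostly in dead code and shuffled statements, which is the parent's point.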
Re:The Future Is Now! (Score:4, Interesting)
Not a million different solutions. Just a million different snippets of code, where an unknown number of them (I assume) compiles and does something meaningful. And from that I guess they find one that actually works? (Given this input, expect this output)
If you put random lines of code from Stack Overflow together in various permutations, then maybe one of them solves a problem?
I don't know.. is this just the monkeys-writing-Shakespeare problem reiterated?
Re: (Score:2)
I don't know.. is this just the monkeys-writing-Shakespeare problem reiterated?
I think it is. But very fast... 8-)
Re: (Score:2)
That's the thing though. The secret sauce here is how the fitness function is defined, which on paper sounds like a hard thing to come up with for code.
Except this was competition coding, and there were unit tests designed to check whether the code actually passed.
And a unit test is a *perfect* fitness function for this sort of model training. Throw in perhaps a linter and some metric-producing tests like cyclomatic complexity, to spice up the function and create a fitness function that goes beyo
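The idea in the comment above can be made concrete in a couple of lines: treat the fraction of passing unit tests as the fitness score. The function and its arguments are illustrative, not anything from the AlphaCode paper:

```python
def fitness(program, tests, run):
    """Fraction of unit tests a candidate passes.

    This is the 'unit tests as fitness function' idea from the comment
    above; run(program, input) -> output is an illustrative harness.
    """
    passed = sum(1 for inp, expected in tests if run(program, inp) == expected)
    return passed / len(tests)
```

A linter score or a cyclomatic-complexity penalty could be blended in as extra weighted terms, at the cost of making the score harder to interpret.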
Re: The Future Is Now! (Score:2, Interesting)
Exactly.
If the solution to the problem is to try every random sequence of code for suitability, that's not AI, that's just brute force and doomed to failure for novel problems
Re: (Score:2)
It's not doomed to failure on novel problems, but that approach *is* doomed to failure on complex novel problems. And perhaps just on complex problems.
That said, most programmers also can't handle those problems. They often need lots of specialist input. E.g., if we were to build this bridge in that place, how much traffic would it end up needing to carry? Is it worth it? The heart of that is a programming problem, but what the answer is depends on a lot of exogenous factors, that you can't answer with
Re: (Score:2)
If you say "every", then I'll say that the intelligence is in the evaluation function. I know that when I address a problem I'm not familiar with, I look at lots of ways to code it that I abandon. Some quickly, and some only after I've written a few routines, and seen where the problems are. (And sometimes I go back to one of those that I abandoned, and pick it up again, because it's the best choice I could think of.)
I wouldn't trust the reports of the details of how the program works. I'd guess that di
Re: (Score:3)
In order to make this work, the candidate programs have to be tested against some sort of success criteria. Honestly, if you know what success looks like, writing the code to get there is usually not that tricky. The usual complication in development isn't solving simple problems - it's about solving hard problems that don't have clear success criteria. You know you're at A and want to get to B, but don't know what terrain there is between them - and you only find that out by making the journey. Once you've
Re:The Future Is Now! (Score:4, Insightful)
The actual programming work takes a fraction of the time, and is more or less just collating what I'd have to feed into an AI anyway. At best, right now, the AI is a fancy code generator that we can't control in detail.
Coding competitions (Score:4, Insightful)
Re: (Score:3)
Sure the dog is talking, and it sounds like a smooth, gentle west coast accent. But it's not like it's dispensing useful life advice.
Bad dog! No biscuit!
Re: (Score:2)
That's not true at all.
Most programming competition questions are about solving a known problem with a known solution -- potentially with some minor twists or recombinations thrown in -- quickly. You practice a bunch of those and you will know a bunch of the known problems and their known (decently efficient) solutions and how to recognize and apply them quickly to a problem at hand.
That's all about gaining knowledge, problem recognition, and application. It won't make you a perfect programmer or anything l
Re: (Score:2)
That's not the limitation in this kind of usage. The limitation is that the evaluation depends on knowing the correct answer in advance. That's not usually the circumstance in which one writes a computer program.
This is an important piece of an automatic programmer, but it's only a piece.
Re: (Score:2)
Finally we can replace our developers with robots! Where do I sign up?
Don't forget to fill in and sign the obligatory disclaimer where you, personally, assume legal responsibility for any and all harm caused directly or indirectly by programming errors.
There ain't no such thing as a free lunch.
Re:The Future Is Now! (Score:4, Insightful)
Finally we can replace our developers with robots! Where do I sign up?
Wrong target.
There is no AI in the world, ever, that can predict what the boss actually wants the new system to do or to predict the unexpected changes he will ask for while you're writing that system.
Re: (Score:2)
Re: (Score:2)
However, I'm pretty sure that this AI works far faster than a human....
Yes, the AI is orders of magnitude faster than a human. It produces a million useless results in a fraction of the time that it takes a human to produce one useful result.
Sign up right here... (Score:3)
Finally we can replace our developers with robots! Where do I sign up?
You can sign up right here in this webapp built by AI. You'll need to fill out this giant form with incoherent fields...also, please ignore the many intermittent HTTP 500 errors when you submit. Also ignore the 5000 JavaScript files downloaded, but enjoy that bonus heat from your device as it warms your room.
Is this anything like... (Score:2)
In contests with at least 5000 participants, the system outperformed 45.7% of programmers.
Is this anything like English schoolchildren outperforming Parliamentarians on standardized tests??
Re: (Score:3)
Oh god no. Even English schoolchildren have some sort of potential for developing self-motivation, self-awareness and being observers. These AI are just fancy statistics crunchers that mechanically home in on solutions algorithmically the way a finger trap slowly tightens in response to wiggling. It's all just static in the air without humans who know to look, looking and assigning some sort of logical meaning to their actions and results.
With how simple analog neurons are (a couple crossed conductors with
Ya, but (Score:3)
Anyone can "write code"; good code on the other hand ...
AlphaCode was able to "achieve approximately human-level performance" in a programming competition.
The phrase "human-level" doesn't really help. :-)
Re: (Score:2)
Right... this can be a ridiculously low bar. I see no shortage of code out there written by five-year-olds which is better than code written by adults who are supposed to be professionals.
Re:Ya, but (Score:4, Funny)
Right... this can be a ridiculously low bar. I see no shortage of code out there written by five-year-olds which is better than code written by adults who are supposed to be professionals.
It sounds like you're joking, but, sadly, we both know it's true. :-)
So Skynet doesn't exist yet (Score:3)
Because it wouldn't have allowed this worrying development to occur.
Or alternatively, it does exist and it's trying to distract us with evidence it doesn't.
Or maybe I'm just paranoid...
Re: (Score:2)
Don't worry... With how simple analog neurons are (a couple crossed conductors with a thin insulator between them) physical reality is likely teeming with them in virtually every medium and across mediums as well. If simply having enough of them wired together magically manifested complex intelligence we'd all be enthralled as part of Eywa (Avatar reference).
Then again I suppose nothing about that prevents an ex machina style future.
Re: So Skynet doesn't exist yet (Score:2)
Take a look at the relatively recent discoveries concerning dendrites. By placing very small antennas very close to the dendrites, researchers discovered there are processing gates in the arms of dendrites. Not only does each branch contain many processing blocks, they also communicate between each other with analog signals, over a range of voltages and rates.
Re: (Score:2)
I was referring to analog neurons in terms of neural networks, not the real thing in the human brain. But that is still very relevant information because that means it would take a very substantial artificial neural net to even theoretically replicate the logic functions of a single real dendrite.
Re: So Skynet doesn't exist yet (Score:4, Interesting)
Well, no it doesn't. It's not clear just how much of the human brain is actually devoted to "intelligence". We know that a lot of it is used for things like monitoring blood chemistry, controlling blood pressure, etc. We know that fairly small birds have a basic understanding of arithmetic, so that part can't take up much of the brain. There's an argument that most of the white matter is basically internal wiring. And we know that the intelligence of humans (above that of chimpanzees) developed very quickly, so it probably isn't optimized.
What that does is say that the upper bound on the number of computer cycles needed to emulate a human is a lot higher than many estimates put it at. It doesn't speak at all to what the lower bound is. (Also the structures being used by AIs are very different from those in the brain, and intentionally so. They were inspired by it, but not modeled after it. [Well, at least not at all closely.] So we can't guess whether they are more or less efficient.)
C. elegans has 302 neurons in its nervous system, and they are all predictably connected. The last I heard we still don't understand how its controls work (i.e. the algorithm it uses). We *really* aren't modeling any mammalian brain.
The "superintelligent" thing will be new though. (Score:2)
Some experts say such self-improvement could lead to a superintelligent AI that takes over the world.
After Jan 6th in the US and recent similar events in Germany and Peru, it'll have to get in line.
That sound you just heard... (Score:3, Insightful)
... was millions of security people suddenly crying out in terror. And if history repeats itself, they will be suddenly silenced...
Monkeys at a keyboard (Score:4, Insightful)
We're almost there! And we're doing the same thing. Generate millions of possible solutions, and pick whichever one solves the problem, according to external criteria.
Yep, that's just-about human level. Uh-huh.
Re: (Score:3)
"It was the best of times, it was the blurst of times?!"
Re: (Score:2)
Almost. "It was the best of times, it was the wurst of times"
Re: (Score:2)
Neither of those would pass. Sheesh. "It was the best of thymes, it was the worst of thymes."
Re: (Score:2)
You stupid monkey!
Re: (Score:2)
Re: (Score:2)
So you've seen me code then.
and who writes the test cases? (Score:3)
So who writes these test cases? If it is not the AI itself, then this is very much cheating in my book.
I have to write my own test cases, taken from my understanding of the domain and problem(s) to be solved. And come up with new test cases as new sub-problems reveal themselves. If someone were to just hand me a test case for every bit of logic in the program, then all I would need to do is (very mechanically) satisfy the test cases and I'm done.
Re: (Score:2)
Have you never even looked at a programming competition? They typically provide you with a set of examples, i.e. test cases. It's not cheating at all.
Re: (Score:2, Insightful)
Have you ever actually been a non-1%er programmer?
The VAST majority of "programming" is not actually programming. It's interpreting requirements and considering everything that everyone else neglected to think about. Or it's debugging some random failure that requires you to understand the business, your code, your framework, the random libraries you used, and your underlying OS and machine.
AI would fail 99% of tasks I'm ever assigned. I'm sure it can generate CRUD websites like a boss, though.
Re: (Score:2)
Agreed. Coding competitions look like schoolwork assignments. Actual development looks nothing like class assignments or coding-competition challenges. Hell, even the straightforward crap your real developers would outsource to India requires more intuition and subjective assessment than either of those.
Re: (Score:2)
The practical value of this would be on the compiler end. Because yeah, interpreting requirements is basically already writing pseudocode for the logic. On the other hand, pseudocode could one day become an efficient programming language for business programming, which would eliminate a lot of boring work.
Job Security (Score:2)
Let me guess, all in javascript, one-way minification only and it uses spaces instead of tabs.
We're gonna be cleaning this $#!+ out for the next 20 years.
Re: (Score:3)
Let me guess, all in javascript, one-way minification only and it uses spaces instead of tabs.
Nope - in COBOL.
Re: Job Security (Score:2)
Re: Job Security (Score:3)
Spaces vs tabs is so 10 years ago
Not so. While I haven't RTFA, I gather that AI-assisted coding consists of entering the first character then hitting Tab, Tab, Tab, Tab, Tab x 1,000 to autocomplete.
"Checks and Balances" (Score:2)
The real end of history unless we adapt (Score:2)
Re:The real end of history unless we adapt (Score:4, Insightful)
This is a preview of the inevitable outcome of automation - it is simply the mechanization of intellectual work. All work requires intelligence, but the more manual jobs were mechanized with technology that already existed in the world of people reading this story.
I've been in the workforce for a long time now. Many things I have done over the years have become obsolete. No longer needed, or basically replaced by something else.
I adapted and moved on to new intellectual work. Others don't want to, which isn't uncommon. The desire for stasis is strong in many.
If I might cite an example: we had some photographers who in the 80's and 90's did the standard photo work. Shooting film, and developing and processing photos and viewgraphs. (large transparencies)
In around 2001, the writing was on the wall - digital was on the way. PowerPoint was going to replace viewgraphs.
We had one photographer at the time, and she wanted no part of digital. So I was tasked with developing a workflow and equipment setup for her to use. She resisted hard against it, even complaining about me being unreasonable, "Trying to force that digital crap on me!"
Her boss just noted I was doing what I was asked to do.
And later that year she was shitcanned, and replaced with a new photographer. Last I heard, she was waiting tables.
Now automation is a bit different from stasis, which is to say, what I call the "lowest intellectual expenditure level" is moving up. We automate the easiest tasks first. As more intellectual expenditure is allowed by automation, those who are affected at that level pretty much have to move up or onto something else.
But there is a problem - at any intellectual level, there are people who are working at their maximum ability. What will they do? I don't profess to know the answer to that.
Did it comment the code ? (Score:2)
and choose good variable names ? These are the hallmarks of really good programmers. Yes: I know that many programmers are bad at commenting :-(
Re: Did it comment the code ? (Score:3)
I don't know about this one, but ChatGPT writes clean code with sensible variable names, language-instructor-type comments and a few paragraphs of extra text describing the overall design of the program. It can actually be a great way to get customized examples of how to use an unfamiliar framework to solve a particular kind of problem.
You can also ask high-level questions about why various parts were included in the code and how they could be changed, and get informative answers. It's like having a patien
Re: (Score:3)
I've written a lot of nasty assembly code with multiple interleaved loops and I can tell you that without extensive commenting there is no way to come back to it a month later and know what the hell is going on.
Re: (Score:2)
I have written plenty of bad code, though. And my comments in the bad code are way more entertaining than the ones in the good code. Because sometimes when you inherit something badly done you don't have time to rewrite it and you just have to paste wallpaper over all the edge cases.
But can it detect the halting problem? (Score:3)
What I want to know is, can machine learning 'detect' that the halting problem exists?
Re: (Score:2)
The halting problem is a mathematical black hole. You *can't* detect that it exists, even if you're in it. All you can do is know that somewhere out there in the space between rational numbers, there be dragons.
I know about uncomputable numbers, like Chaitin's halting probability. So I know that uncomputable numbers exist, even if I can't give their values. The question is, can machine learning arrive at the same conclusions?
Awesome! (Score:2)
What human did they compare to? (Score:2)
Because this sounds very much like they compared to a non-expert, like one of the mass of low-skill coders that mostly produce crap.
Trivial requirements (Score:5, Insightful)
This is pretty cool, but...
a problem might ask for a program to determine the number of binary strings (sequences of zeroes and ones) of length n that don't have any consecutive zeroes.
This is a problem that is very simply stated and trivial in scope. In the real world, building solid requirements is the hardest part of building software. Often, nobody actually knows, in detail, what the requirements should be, when a project starts. They have to be built, just like the code itself.
When AI can write a program that can take on TurboTax, then I'll start to worry about the robots coming for my job!
Re: (Score:2)
Re: (Score:2)
Funny you should bring up the Fibonacci series. I had an interview a few years ago, in which I was given a whiteboard programming test, and my instructions were to write a function that would produce a Fibonacci series. If only I had this AI in my pocket! Actually, no need, I had no problem writing the function.
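For anyone handed the same whiteboard question, the iterative version is the usual answer (one possible form; interviewers vary on whether the series starts at 0 or 1):

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, starting from 0."""
    series = []
    a, b = 0, 1
    for _ in range(n):
        series.append(a)
        a, b = b, a + b  # slide the window forward one term
    return series
```

Iteration avoids the exponential blowup of the naive recursive version, which is usually the follow-up question.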
This could make my job easier (Score:2)
I'd love it if I could have a bot write all the drudgework code for me. A lot of times the speed I can code is limited more by my typing speed than anything (and I can do 90 wpm). Intellisense helps a lot; maybe this would be a level above what we have now.
Well, GitHub (Score:3)
Programming? (Score:5, Insightful)
Re: (Score:2)
"Where am I getting it wrong?"
If the AI is somehow able to write bug-free and secure code at a quality / rate that is superior to a human based on the same input, then there's the advantage.
Also, you're simply forgetting these things can run 24/7 and they will require less office fit-out and HR expense, etc., to manage.
Re: (Score:3, Interesting)
Re:Programming? (Score:5, Insightful)
You're missing the point. When AI "writes code" based on specs written by a human, that human is the programmer. This is conceptually no different than what we already do whenever we code in anything higher-level than assembler. The C++ compiler interprets what you write and produces the code that runs on the actual hardware. The hypothetical "AI compiler" just allows you to program in English rather than C++.
Re: (Score:2)
Yep - I said the same in another comment. When pseudocode can be compiled efficiently, we save a lot of time. Corporate programming is mostly writing out business rules and attaching it to a UI.
Almost anything useful or original would be novel even in pseudocode or plain English form.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
What you are not taking into account is the training-value efficacy.
Teaching AI by training regimen, discipline and test is a self-fulfilling prophecy. Surprise at DeepMind's "creativity" in developing new solutions beyond the depth of its knowledge base of experience is the "tell". AI isn't simply faster or better than a human. AI is perfected learning in domains, at this stage.
At the end of the day, AI will code direct-to-hex, intelligently skipping its human-readable constraint.
Re: (Score:2)
* "existence" following the rules
Re: (Score:3)
This is very true. The hardest part of programming is specifying 'exactly' what needs to be done.
However, most of programming is the nitty gritty details. It might not be assembler, but there's still a lot of nitty gritty detail.
I got my start developing back in the Windows 95 days. Conceptually, let's say I just wanted to display a grid of items.
So you had to learn about WC_LISTVIEW and message loops..
Then MFC came, with its own more object-oriented way of doing it.
Then C#, Java, Javascript/html...
So many di
Re: (Score:2)
How is that different from what we have today?
What we have today is a programmer translating the customer's English-language requirements into computer code.
What this project aims to do is replace that programmer with an AI that translates the customer's English-language requirements into computer code.
Doom can run on a pregnancy test or coffee maker (Score:2)
Why is everyone so happy with a Star Trek-like code replicator while humans are at least trying to be creative?
https://www.ign.com/articles/p... [ign.com]
coding clown world has arrived (Score:3)
The actual risk with self-improving code (Score:5, Insightful)
is that when it eventually breaks, no human being has any idea how it's supposed to work or how to fix it.
So more fast but stupid AI (Score:2)
It's like they're trying to recreate a car by making things that run really fast.
We are missing something fundamental about how brains learn, but AI researchers work on these party tricks instead of looking for it.
Psychologists would be working on finding it too, if only they could be made to do science instead of shitty, unreplicated pet projects.
Not a productive use case (Score:2)
I've generally seen them showcase these AI codings against 'challenges', which are generally:
A) Refined requirements
B) Highly repetitive (meaning the verbatim answer problem exists in the training set)
C) Short and sweet
Now one might say it could be useful to have natural language to coding, but generally writing the requirements in prose is more tedious than just writing the code (at least for languages with low tedium/boilerplate).
When pigs can fly... (Score:2)
Re: (Score:2)
I'm such a slacker... (Score:2)
When faced with a coding problem, I've only been creating one program that solves the problem.
I hope I don't have to start creating millions of solutions (even the ones I know are wrong) just so I can compete with AlphaCode...
But maintenance (Score:2)
Generating code that's functionally correct is one thing, but is the code maintainable? Well-factored? Efficient? Does it use coding best practices so others on the team can work with it?
Re: (Score:2)
Re: (Score:2)
...but is the code maintainable?
Here's how this will go:
1) Customer submits a vague request, code is generated that isn't even remotely close to what the customer wants.
2) Customer submits a lengthy request (a few printed pages), including sketches of the desired user interfaces, code is generated that isn't even remotely close to what the customer wants.
This repeats until the customer submits hundreds of pages of vaguely worded and drawn "specifications", and code is generated that isn't even close to what the customer wants.
x) Customer
Oh shit (Score:2)
Good luck with that. (Score:2)
Human level performance? (Score:2)
I demand proof! I want to see the security hole through which my credit card info escapes.
type (Score:2)
They mean "Dunning."