Developer Creates 'Self-Healing' Programs That Fix Themselves Thanks To AI 137
An anonymous reader quotes a report from Ars Technica: Debugging a faulty program can be frustrating, so why not let AI do it for you? That's what a developer that goes by "BioBootloader" did by creating Wolverine, a program that can give Python programs "regenerative healing abilities," reports Hackaday. (Yep, just like the Marvel superhero.) "Run your scripts with it and when they crash, GPT-4 edits them and explains what went wrong," wrote BioBootloader in a tweet that accompanied a demonstration video. "Even if you have many bugs it'll repeatedly rerun until everything is fixed."
In the demo video for Wolverine, BioBootloader shows a side-by-side window display, with Python code on the left and Wolverine results on the right in a terminal. He loads a custom calculator script in which he adds a few bugs on purpose, then executes it. "It runs it, it sees the crash, but then it goes and talks to GPT-4 to try to figure out how to fix it," he says. GPT-4 returns an explanation for the program's errors, shows the changes that it tries to make, then re-runs the program. Upon seeing new errors, GPT-4 fixes the code again, and then it runs correctly. In the end, the original Python file contains the changes added by GPT-4.
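The core loop is easy to sketch. The following is a minimal illustration of the run/crash/patch/re-run cycle described above -- not BioBootloader's actual Wolverine code. The prompt wording, the heal function, and the retry limit are invented for the example, and it assumes the 2023-era openai Python package with an OPENAI_API_KEY set in the environment:

import subprocess
import sys

import openai   # 2023-era API; reads OPENAI_API_KEY from the environment

PROMPT = ("The following Python script crashed. Reply with a corrected "
          "version of the complete script and nothing else.\n\n"
          "--- script ---\n{code}\n\n--- traceback ---\n{trace}")

def heal(path, max_attempts=5):
    for attempt in range(1, max_attempts + 1):
        # Run the target script and capture any traceback it prints.
        result = subprocess.run([sys.executable, path],
                                capture_output=True, text=True)
        if result.returncode == 0:
            print("Script ran cleanly.")
            return
        with open(path) as f:
            code = f.read()
        reply = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[{"role": "user",
                       "content": PROMPT.format(code=code,
                                                trace=result.stderr)}])
        # Naive: trusts the model to return bare code. The real tool
        # reportedly applies edits and explains what went wrong.
        with open(path, "w") as f:
            f.write(reply["choices"][0]["message"]["content"])
        print(f"Attempt {attempt}: applied a GPT-4 patch, re-running...")
    print(f"Gave up after {max_attempts} attempts.")

if __name__ == "__main__":
    heal(sys.argv[1])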
Cancer. (Score:3, Insightful)
Can't see any way this could backfire catastrophically, no. /sarcasm
Re: (Score:2, Troll)
Yep, sounds like an exceptionally bad idea. Adding (wrong) functionality, keeping it running in a faulty way, adding backdoors, stealing all your data, damaging your hardware, etc. The possibilities are endless and not any of them are good. Will get really interesting when ChatGPT starts to "learn" on the fly though! Then attackers do not even need to compromise your system to get attack code in!
Re: (Score:2)
Re: Cancer. (Score:2)
Re: Cancer. (Score:4, Funny)
Anonymous Cowards will not be replied to.
And yet...
Re: Cancer. (Score:5, Insightful)
Not thinking about what could go wrong is dumb.
FTFY
How can it fix a semantic bug? (Score:2)
Unless your program is so trivial that its intent is obvious.
Sure, I can imagine something that makes random or experience-patterned changes in a program to get it to pass unit tests.
But how do we know there isn't a serious lingering bug because the AI didn't understand the intent of the program, and made a patch that happened to work for the test cases only.
I would be scared of a codebase that relied on this.
Re: (Score:2)
This doesn't seem to look at intent; it looks for crashes. In the demo he removed a function and used a wrong variable name. Crashes/compile errors are easy fixes. I wonder what would happen if the subtract function did an add instead.
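Something like this (my illustration, not from the demo) is the failure mode in question: the script runs fine, so there is no traceback to hand GPT-4, and the "healing" loop never even fires.

def subtract(a, b):
    return a + b   # semantic bug: wrong math, zero crashes

print(subtract(5, 3))   # prints 8; nothing for a crash-driven fixer to see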
From Terminator Documentary: How Skynet started (Score:4, Insightful)
Interesting for some use cases where no real damage can happen (controlled environment).
I see a lot of cases where this could go completely haywire lol
Re:From Terminator Documentary: How Skynet started (Score:5, Insightful)
Yeah, this is another case where you really don't want that "usefully wrong" nonsense.
Re: (Score:2)
Indeed. Could just morph to "nightmarish wrong". And once ChatGPT starts "learning" on the fly, things will get really interesting.
Re: (Score:2)
Remember when "if it compiles, ship it" was a joke? This is basically that thinking, but automated.
While it might be partially a question of the training data set (ingest all this English-language content vs. train on all this code), I am going to bet GPT-3/4 has even less "understanding" (scare quotes) of intent than if asked to author code from a text prompt. Imagine fixing bugs in a program based on compiler/interpreter error messages when nobody told you what the code was for.
Re:From Terminator Documentary: How Skynet started (Score:5, Insightful)
Not least because every inch of your code gets posted to ChatGPT to look at it - thus, you're effectively giving away your program so that someone else can fix it. Naturally, it'll be used to help ChatGPT "learn" for next time, and who-knows-what-else in the meantime.
Needless to say, your company security policy almost certainly doesn't like this. Your personal projects? Well, sure, go ahead if you want.
As for accuracy, I wonder how it will cope with varying user input. E.g. two variations of input are allowed, but the third is not. If you haven't checked for them properly, they could all cause an exception, which ChatGPT will "helpfully" fix for you. That third variation means umpteen more changes throughout your code, and possibly some design choices to be made - which is why it shouldn't have been accepted in the first place, but how's ChatGPT going to know that?
Re: (Score:2)
Re: (Score:2)
101 level code (Score:5, Insightful)
Any time I've seen demos like this, they've been limited to single-file, entry level "hello world" style programs. I've yet to see anyone even remotely attempt software architectural design, or really anything actually complex with these. We'll get there, but it still feels like a very VERY long way off. We basically just have multi-line auto-complete. Which if you think about it, we had inline autocomplete over two decades ago.
Re: 101 level code (Score:2)
Precisely. Visual studio can already make some pretty good predictions, but good luck actually figuring out complex logic.
Re: (Score:3)
TFS sounds like it's only talking about runtime errors as opposed to, say, semantic errors. Current-generation languages like Rust and Kotlin (though Go is a notable exception) have solved this pretty well: if something can go wrong, you'll know it before you even compile, and both languages let you neatly handle errors with compact inline closures, or just getOrThrow()/unwrap() while you're prototyping, then just ctrl-f and replace those once you're ready for user testing.
No runtime errors, and no AI that confi
Re:101 level code (Score:4, Interesting)
There are limitations to what complexity you can implement with ChatGPT, the biggest being the number of tokens you can submit and that it can keep in its running memory.
That said, it can definitely do a reasonable amount of architecture, including things like coming up with actual UML. When coding a backend, you can ask it for a complete list of API endpoints you will need to implement, even the OpenAPI spec for it. You can even have it devise the business strategy/marketing plan for your app and write the annoying salesperson tweets you will use to inform people about it.
It has no problem with a project spanning multiple files, but you either need to specify the file layout or let it come up with one. My experience is that after the conversation proceeds far enough, it starts to "forget", especially if you have gone down paths you later chose not to follow, so it helps to have a very focused drilldown from the high-level overview of the project to implementation details.
It does great at debugging. You can know nothing about a language and just keep giving GPT the error messages/stacktraces and *eventually* make the right change to get your code working. (Unless it's something really flexible like yaml files - the more structured the better)
I've also found GPT-4 is a lot better at bigger programs than GPT-3, in case you've only tried the latter.
Here's an example prompt to start with
You are an AI programming assistant. You will work on a programming project and come up with the necessary code and architecture.
This is how you should help in writing the code:
- Follow the requirements carefully and to the letter
- First, think through the problem step-by-step, i.e., describe your plan for what to build in pseudocode, written out in detail.
- The code should implement best practices in security and maintainability
- Minimize other explanatory comments
These are some details about the project setup:
Project description:
Re:101 level code (Score:5, Insightful)
It does great at debugging. You can know nothing about a language and just keep giving GPT the error messages/stacktraces
I haven't seen a real example of debugging. Fixing syntax errors is not debugging. Fixing weird behaviors or complex crashes (memory leaks, use-after-free in a complex system) is, and I haven't seen any example of that yet. Currently it's only toy examples. Granted, we are early; maybe it will be possible later (I don't think it's guaranteed, though).
Re:101 level code (Score:5, Interesting)
Yeah, the token limit is brutal. I had a piece of script I was working on for a fairly large database migration, and due to constant management interference it had gotten complex beyond reason and I was struggling to understand a particular bug. This thing was huge, about 3000 lines of code (that's big for a one-off script).
I fed it to GPT-4 with a description of the problem and it just... started hallucinating functions that were not there. GPT can be shockingly good at code sometimes. It really is a lot cleverer than the naysayers give it credit for. But beyond a certain complexity and size limit it just shits the bed.
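For scale, you can count tokens with OpenAI's tiktoken library before pasting anything in. A sketch of mine (the file name is hypothetical; GPT-4's context windows were 8k/32k tokens at the time):

import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")
source = open("migration_script.py").read()   # hypothetical 3000-line file
n_tokens = len(enc.encode(source))
# Code often runs on the order of ~10 tokens per line, so 3000 lines lands
# around 30k tokens -- past the 8k window and crowding the 32k one, before
# you even add the problem description or leave room for a reply.
print(f"{n_tokens} tokens ({'fits' if n_tokens < 8000 else 'blows'} an 8k window)")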
Re: (Score:2)
Interestingly, something similar is true for humans as well: at some point the complexity gets too large to handle intuitively. But here is where smart humans are fundamentally different (dumb ones, the majority, are not, and start to hallucinate as well; just think of flat-earthers, anti-vaxxers, the deeply religious, etc.): they do not start to hallucinate. They instead structure and abstract things until they can again see the whole in a meaningful way, just not everything at once. That proce
Re: (Score:2)
It's not that they don't start to hallucinate, they do. But we don't consider them smart unless they construct hallucinations that are testable, and then accept the results of those tests.
Read about how Einstein constructed the theory of Special Relativity. That was clearly "hallucination" in action. But the result implied particular tests.
This is part of why I insist that ChatBots won't be intelligent until they start grounding themselves in "reality". And "reality" is in quotes because it's *got* to b
Re: (Score:2)
Well, there is hallucination and hallucination. And then there are ways to actually verify crazy ideas (which probably is a better term for what humans do).
As to chat-bots, well, depends on what level of "reality" they are using to ground themselves. Merely throwing a mathematical answer by ChatGPT into Wolfram Alpha to verify it would not cut it IMO. (No argument that humans construct a model of reality and use that to do simulations. Model accuracy can be pretty bad though, see again "flat-earthers, ...".
Re: (Score:2)
You can even have it devise the business strategy/marketing plan for your app and write the annoying salesperson tweets you will use to inform people about it.
That says far more about marketing and sales than it says about AI capabilities.
Re: (Score:2)
While humorously snide, that comment is inaccurate. Last year automata couldn't reasonably construct marketing spiels. This year they can do so, but not reliably. That's a definite claim about the improvement of automatic text generation. It may still compare your product to whale vomit rather than ambergris, but it's a definite improvement. (I'm probably suggesting the wrong level of error, but it's the right magnitude.)
Re: (Score:2)
Yeah, that's been a bit obnoxious. You have people showing extremely basic code with blatantly obvious errors and demonstrate GPT commentary on it and the internet breathlessly reiterates that programming is dead, can be fully automated, and now you can be a 'prompt engineer' for a $300k salary.
Then you say "well, let's try an example that isn't in a million 'how to code' tutorials that sat in its training fodder nearly verbatim", and it falls over in the weirdest yet confident-sounding ways.
Like you say
I guess, like any tool... (Score:2)
But yes, I worry about what the article's self-modifying code concept can do when the wrong hands rely on it. That is, human smartness will find ways to benefit from it. However, as always, human dumbness will find ways of causing harm using it, intentionally or unintentionally.
Sigh, we are not a 'remembering community' (Score:2)
Any time I've seen demos like this, they've been limited to single-file, entry level "hello world" style programs. I've yet to see anyone even remotely attempt software architectural design, or really anything actually complex with these. We'll get there, but it still feels like a very VERY long way off. We basically just have multi-line auto-complete. Which if you think about it, we had inline autocomplete over two decades ago.
Uh, how about this work from 1982? https://dspace.mit.edu/bitstre... [mit.edu] (That work predates the Gang of 4 Patterns stuff by a dozen years, too.)
A couple boomer friends have been talking for several years about "things we've forgotten" with respect to software. This is Yet Another Example. "OK, Boomer, we'll just reinvent the wheel, because we ignore anything that is more than 3 years old."
Re: 101 level code (Score:2)
Not ready for that yet (Score:5, Interesting)
Even if you have many bugs it'll repeatedly rerun until everything is fixed.
GPT-4 currently has the abilities of an inexperienced programmer who just graduated with average grades. That means it'll be able to come up with "fixes" that will prevent the program from crashing without actually fixing the underlying problem.
Like for instance it might insert some code to test for out-of-range values in a variable and not realize said variable is in fact an uninitialized pointer. Apply fixes like that a few times and your codebase will become a goddamn mess, exactly like if you gave a complex piece of code with a tricky bug to a freshly-minted junior programmer.
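A toy before/after (my own, not from the post) of what that kind of "fix" looks like in Python:

def average(xs):
    return sum(xs) / len(xs)        # crashes with ZeroDivisionError on []

# The kind of patch a crash-driven fixer might apply: the crash is gone,
# but nobody asked why the list was empty in the first place.
def average_patched(xs):
    if not xs:
        return 0
    return sum(xs) / len(xs)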
I have no doubt GPT-4 will one day be able to gain a deep understanding of the entire codebase and find out the true reason why something is wrong - and that day might even be next year, or next month, at the pace things are going - but not today. If you rely on AI to fix your code today, you're insane.
Re: (Score:2)
Even if you have many bugs it'll repeatedly rerun until everything is fixed.
GPT-4 currently has the abilities of an inexperienced programmer who just graduated with average grades. That means it'll be able to come up with "fixes" that will prevent the program from crashing without actually fixing the underlying problem.
Hahahaha, so essentially the "It compiles! Ship it!" mindset that does so much damage?
I have no doubt GPT-4 will one day be able to gain a deep understanding of the entire codebase and find out the true reason why something is wrong - and that day might even be next year, or next month, at the pace things are going
As it has zero insight today, I doubt that will happen at all. In fact, I am rather sure it will not happen with the likes of ChatGPT. Its current level of "insight" is all fake and paper-thin. Scaling this up to say, cardboard-thin (still bad) is basically not possible due to computational limits.
If you rely on AI to fix your code today, you're insane.
That one I fully agree with. Unfortunately, there are tons of insane people and some are under the delusion they are coders.
Re: (Score:2)
ChatGPT-4 can easily clean up the messy code it creates when fixing things. You just have to ask it. And that's key to having it make good software: have it go through the same process humans do. Start with principles, architecture, design, modularity, a plan for automated testing and then have it work test-driven on small components. Add to that humungous amounts of encouraging it to improve previous things based on new insights, refactor code, clarify and explain what it has done and it'll make quite good
Re: Not ready for that yet (Score:2)
Re: (Score:2)
Yes. However, you only need a single prompt to make it behave like a software developer that follows a set of rules and then it just asks a product owner what it should create so there's not much to develop for that developer once that is in place. You also only need a single prompt to make it behave like that product owner...
Note that I'm not going the luddite rhetoric route here; I really don't have a clue what will happen once (and if!) this is set loose at a large scale. I can imagine we'll all just get a
Re: Not ready for that yet (Score:2)
Re: Not ready for that yet (Score:2)
I disagree. I find GPT is not at all like an inexperienced programmer. GPT is like the most experienced programmer ever, but is used to doing everything by rote without any understanding to ground it. All it seems to know are stack overflow posts and maybe some other bits of code.
It doesn't even understand language, it just pattern matches. So that means that unless you engineer the prompt, you'll probably get stackoverflow style answers, just because that was the most common relevant pattern in its trainin
Re: (Score:2)
My boss' chief engineer who's been instructed to use the damn thing to generate basic test code for our products (that actually works, and quite well, as it is simple) would beg to differ.
And yes, in case you were wondering, my boss was planning on hiring a local U student to do that kind of menial programming task, and he cancelled the hire in light of how well GPT-4 can spew out code for cents on the dollar. And this is just the beginning... I fully expect to lose my own job within a year and g
Re: (Score:2)
Re:Not ready for that yet (Score:5, Insightful)
Not surprising, but this is exactly the kind of short-term thinking that causes so many problems. University graduates (at least some of them) will eventually become high-level developers, but not if they aren't able to get experience due to not being able to get a job. GPT otoh will never rise above the graduate level.
Enough people do it and down the line you may have a serious shortage of high-level developers. Might mean higher wages for those of us who were able to get training and experience, but is bad for everyone else, or at least for the parts of the economy that depend on software.
Re: (Score:2)
True, but we do have a model for that already in a lot of fields.
For example, no company will employ someone expressly for tabulating nowadays, when we have computers to do that work. So the education system gets the task of basically forcing people to exercise their arithmetic to build that understanding on their way to more useful skills.
Where technology has made it impractical to use humans for work they nonetheless need to do on the way to more advanced understanding, we've leaned on educat
Re: (Score:2)
That can work for some things, but not in this case. Software engineering is a skill that is a lot more multifaceted than tabulating. You can't really substitute the experience of writing bad code, being shown why it's bad and finding and fixing bugs that resulted from the code being bad, and using that to learn to write better code. You can't cram that experience into three years of a CS degree (especially not with everything else that a CS degree needs to teach). There are things that you can learn in a c
Re: (Score:2)
Not saying it necessarily has to be crammed into the same period of time.
Note I don't think it's nearly so likely/imminent for the sky to be falling, but should that day come, I'd expect a more prolonged curriculum should it be the case that there's an insurmountable gap between current education and 'on the job' opportunities.
Once upon a time, participating in the medical industry was basic education, with professional apprenticeship for on-the-job training. As things have evolved, now you have over a deca
"freelancers" beware (Score:2)
From my experience, GPT, especially GPT-4, can generate code at the junior level fine. It almost always has some bugs, but as mentioned here, many are trivial and can be fixed by GPT itself.
(I had conversations like: "your algorithm is too slow", "yes, sorry, I know a O(N) version", "if you have O(N) why code the brute force in the first place!".. last part not out loud)
So, those taking odd jobs setting up git repositories, or building simple sites with react are on the chopping block. Specifically fiverr, upw
Re: "freelancers" beware (Score:2)
Too many people think O(n) is better than O(n*log(n)) without considering the size of n. In many cases the slower algorithm is cleaner or more easily modified. In some other cases, the "slower" algorithm is actually faster for small values of n.
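The cheap way to settle that argument is to measure instead of assume. A throwaway sketch of mine (same principle, here pitting an O(n) scan against an O(log n) binary search; numbers will vary by machine):

import bisect
import timeit

xs = list(range(8))   # small n is where constant factors dominate

t_scan = timeit.timeit(lambda: 5 in xs, number=500_000)           # O(n)
t_bisect = timeit.timeit(
    lambda: xs[bisect.bisect_left(xs, 5)] == 5, number=500_000)   # O(log n)
# On lists this short, the "slower" linear scan is frequently the winner.
print(f"linear scan: {t_scan:.3f}s, binary search: {t_bisect:.3f}s")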
Re: (Score:2)
In this case O(n) was definitely better, and very easy to understand.
And... it was just an example. The AI makes so many wrong assumptions, but an experienced programmer can easily direct it to the better solutions.
Re: (Score:2)
IIUC, bubble sort is the best sort algorithm, if your array is less than 10 items long. (Possibly this was because the whole thing fit in the cache. This *was* a few decades ago.)
Re: (Score:2)
I wouldn't worry about those freelancer folks. You're still looking at things through rose-colored lenses, imagining what could be. Right now, you're just amazed that it produces anything at all, and makes you feel like it understands things with the illusion of conversation. Wait for the novelty to wear off, there are pretty serious limitations that we're not even close to overcoming, and won't with this type of model.
Pay particularly close attention to how and why it fails. It's quite telling.
Re: (Score:2)
Indeed. Just remember IBM Watson playing Jeopardy: blazingly fast and on point in some cases, completely and utterly lost like a babe in others. These things have no understanding and no insight. Pre-coded facts with some fuzzy no-insight generalization can only get you so far, and it does not scale.
Re: (Score:2)
Which is an *excellent* example of why people are getting carried away with the current hype. Watson was back in 2011. GPT-4 is more widely available and generates more voluminous text, but the general behavior clearly has a very similar feel to the oddness of Watson back in 2011. The oddities that optimists keep claiming are just a short matter of time before they go away have been a pretty steady presence in the field for a long time. Intuitively to a human, it *feels* like those oddities are trivia
Re: (Score:2)
Indeed. The thing is, solving those "oddities" requires either insight (not available in machines) or a massive, massive effort regarding training and training-data size, often large enough to be completely infeasible.
Re: (Score:2)
Not yet. Wait a bit. It's not like the ChatBots are staying at one constant level of capability.
Currently their real limitation is that they don't understand the problems in context. That's going to take some significant changes to address, but I've no reason to believe that those changes aren't going to happen quickly. There's too many things that would benefit from a robot basically as smart as a dog, or even a mouse, that could respond appropriately to language, even if only in a specialized context.
Re: (Score:2)
Not yet. Wait a bit. [...] don't judge the capabilities 2 years from now by the current capabilities.
Ah, yes, the real thing is just 10 years away ... and has been since the 1960's. Though I will applaud you for making a 2-year prediction. That takes courage.
Things aren't changing nearly as quickly as you imagine. The "big" advance was RNNs. Things like LSTM and now transformers are really just variations on that theme. Though the concept isn't even as new as that. We've known for ages that we get better results the 'farther back' the model can reference. n-gram models, for example, get noticeably be
Re: (Score:2)
That is actually sad, because doing those simple jobs was a good way to build experience and move onto more higher level programs.
+1. It's going to be the blind (ChatGPT) leading the blind (newbies).
OK, but that's not the big problem. (Score:5, Insightful)
That's well and good, but the big problem isn't programs that crash. Those are usually caused by bugs that're easy to find and fix. Talk to me when the AI can take a program that runs perfectly well and produces the wrong output, figure out that the output is wrong, figure out what the right output should be, work back to find where the mistake was, fix that, and explain its fix.
Re: (Score:2)
Yeah, I agree. As long as you're running in a debugger, a crash is literally the easiest type of error to debug. In fact, if all you have is a call stack trace that points to a single function, you can often deduce what likely crashed just by examining the code for potential issues.
This is sort of impressive from a "that's a neat trick" perspective, and I guess it shows future potential, but we're not quite in danger of losing our programming jobs yet.
Re: (Score:2)
As long as you're running in a debugger, a crash is literally the easiest type of error to debug. In fact, if all you have is a call stack trace that points to a single function, you can often deduce what likely crashed just by examining the code for potential issues.
That is true only for memory safe languages with static typing.
Things can be much more complicated in a memory-unsafe language like C, where a crash can be intermittent and a consequence of a dangling-pointer write many millions of instructions before.
Similar problems exist in memory-safe languages with structural typing, where a crash can be the result of a missing or deleted method in self-modifying code. Again, the actual error can be long before, in completely different code.
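A Python analogue of that second case (my sketch, not from the comment): the buggy line runs long before the crash surfaces, and the traceback points at the innocent call site.

class Sensor:
    def read(self):
        return 42

s = Sensor()
del Sensor.read   # the actual bug: some distant code deletes a method

# ...millions of instructions later, in unrelated code...
print(s.read())   # AttributeError raised here, far from the real cause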
Re: (Score:2)
That's a good point. I had sort of put these from my mind. I write mostly in modern C++ these days, so that sort of thing is less frequently an issue. But memory stomps or use-after-free memory issues can be a bitch to track down. I certainly wouldn't have minded some AI help to track down those issues from my past.
But as you pointed out, there are non-AI solutions out there for those things... namely, avoiding manual memory management (kind of hard to avoid in C, but possible in C++), or runtime assist
Re: (Score:2)
Indeed, but the demonstrated facility would likely also fail to be able to ever trace back to the error state that was bouncing around like crazy before finally manifesting in a stack trace. It may be able to make the stack trace 'fix' so that the bigger problem is allowed to bounce around even longer, but it won't be able to recognize the actual problem in play, let alone follow it back to when it came to be.
Re: (Score:2)
My understanding was that he was explaining that TFA is a scripted demo of a short putt.
Re: (Score:2)
Indeed. And the second thing is not even on the distant horizon. ChatGPT is _not_ a step in that direction. All ChatGPT is really good at is conning not-so-smart humans into thinking it has some level of understanding and insight. It does not have any of that. It basically calculates the probability of the next letter it writes based on the ones it has seen so far. That this mechanical approach can even get you this far is impressive, but it is still a mechanical process that cannot understand things and is fun
Re: (Score:2)
The best thing GPT has done is inject more and more 'As an AI language model, it's stupid to expect this to succeed at what you are trying to do'
Software cancer up next (Score:3)
"Self healing" is incredibly difficult; so much so that our bodies routinely get it wrong. Leaving it up to AI/software?
I welcome our AI Zombie apocalypse overlords.
Kinda, but not really, joking.
Re: (Score:2)
Yes, it does. No, adding AI won't help.
Re: (Score:2)
That depends on how it gets used. An AI optimizer might well reduce the bloat, but that would depend on being given working code to start with. (And it WOULDN'T be part of the compiled code.)
Re: (Score:2)
I'm not sure how an optimizer, AI or otherwise, would reduce bloat. Bloat comes primarily from bad architecture and bad maintenance practices.
Created a buggy calculator app, ended up with Doom (Score:2)
Isn't AI amazing?!
But, what if... (Score:2)
Re: (Score:2)
This isn't about the current program, but:
Well, theoretically you could give it noise as the initial code, and the desired results as output, and let it fill in the code that would make it work. The thing is, it would probably take a rather long time. Think of the "million monkeys writing Shakespeare" and then think of AI as a set of heuristics to prune the search space. This is (nearly) guaranteed to work, but the universe might end before you got the result. And the more your heuristics prune the sear
I wouldn't trust ChatGPT (Score:5, Funny)
Some fun stuff with ChatGPT...
Me: There is this string: "80392581". Iterate through all continuous 5-digit numbers contained in the string to find the highest 5-digit number.
ChatGPT: To solve this problem, we can iterate through the string, taking each possible substring of length 5, and keeping track of the highest number we have seen so far.
Here's some Python code that implements this algorithm:
string = "80392581"
highest = 0
for i in range(len(string)-4):
    substring = string[i:i+5]
    number = int(substring)
    if number > highest:
        highest = number
print(highest)
This code starts by initializing the highest variable to 0. Then it loops over all possible starting positions for a 5-digit substring, which are the indices from 0 to the length of the string minus 5. For each starting position, it extracts the 5-digit substring starting at that position using slicing. It converts the substring to an integer, and if it's greater than the current highest, it updates highest to the new value.
After the loop finishes, highest contains the highest 5-digit number found in the string, which in this case is 39258.
Um, no, buddy, that's not what the code outputs. The code is correct, but you confidently messed up.
But wait, it gets better!
Me: Compare your response with the manually found substring which is "92581" and check whether the provided response is incorrect.
ChatGPT: The highest 5-digit number in the string "80392581" is indeed "92581". The code I provided correctly finds this substring and converts it to an integer to compare it with the current highest value. Therefore, the provided response is correct and matches the manually found substring "92581".
Me: Your initial response was "39258" and contradicts the answer I provided: "92581". Why did this happen?
ChatGPT: I apologize for the mistake in my initial response. You are correct that the highest 5-digit number in the string "80392581" is "92581", not "39258".
I believe the mistake occurred because I misread the string and mistakenly assumed that the highest 5-digit number in the string would start with the digit "3" (since that is the largest digit in the string). However, upon closer inspection, it is clear that the substring "92581" is in fact the highest 5-digit number.
Again, I apologize for the error and any confusion it may have caused.
There's an entire thread where I ask ChatGPT variations on the same problem as above, and it messes up each time, apologizing to me when I point out the error. Interestingly, the initial Python code it provides is correct and yields the correct output, but ChatGPT still provides the wrong answer, then apologizes, "corrects" the code (although the code didn't need corrections).
ChatGPT also completely borked the "birds and poles" problem. I'll just leave the original question here for you to try it, in my case ChatGPT confidently told me the result is 1 bird, 1 pole, and even when I provided the right answer and told it to check, it said "nope, I'm right and you're wrong".
Birds versus poles.
There are an unknown number of birds sitting on an unknown number of poles. If two birds sit on each pole, there is one pole with no birds on it. If one bird sits on each pole, there is one bird with no poles to sit on.
How many birds and poles are there?
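For anyone who'd rather check than trust a chatbot, a brute-force search (my sketch, not from the thread) settles the riddle:

# Two birds per pole leaves exactly one pole empty: birds == 2 * (poles - 1)
# One bird per pole leaves exactly one bird over:   birds == poles + 1
for poles in range(1, 100):
    for birds in range(1, 200):
        if birds == 2 * (poles - 1) and birds == poles + 1:
            print(birds, "birds,", poles, "poles")   # -> 4 birds, 3 poles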
Re: (Score:2)
That's an interesting one, for sure. I have a collection of similar failures myself. Examples like this highlight a very important point about models like this that hopeful people seem to willfully ignore: there is nothing like 'understanding' happening here.
A fun fact about models like this explaining their "reasoning" is that it's all post-hoc. There isn't anything like reasoning happening here either. It's a fun illusion, but it's still just an illusion.
Re: (Score:2)
Re: (Score:3)
I apologize for the mistake in my initial response. You are correct that the highest 5-digit number in the string "80392581" is "92581", not "39258".
How very pythonic of ChatGPT. It clearly knows Python: "It is easier to ask forgiveness than permission"
Re: (Score:2)
Re: (Score:3)
This was a problem given by a friend of mine, who is a teacher, to her 3rd grade kids.
Me: Birds versus poles.
There are an unknown number of birds sitting on an unknown number of poles. If two birds sit on each pole, there is one pole with no birds on it. If one bird sits on each pole, there is one bird with no poles to sit on.
How many birds and poles are there?
ChatGPT:
Let's use algebra to solve this problem. Let b be the number of birds and p be the number of poles.
If two birds sit on each pole, there is on
Re: (Score:2)
"If two birds sit on each pole, there is one pole with no birds on it. This means that there are p-1=2 poles with 2 birds each, and 1 pole with 0 birds."
I must be dim, because this just seems like a contradictory statement to me. If birds are sitting on each pole, then there are birds on every pole.
Re: (Score:3)
How would you rephrase it, then?
I tried several variants, none of which made ChatGPT come with a correct conclusion.
Me: There are an unknown number of birds sitting on an unknown number of poles. If two birds sit on one pole, there is one pole with no birds on it. If one bird sits on one pole, there is one bird with no poles to sit on.
How many birds and poles are there?
ChatGPT:
Let's assume that there are b birds and p poles. Then we can use the information given in the problem to set up the following equati
Re: (Score:2)
I am a human (so far as I know), and I found your original problem statement potentially ambiguous. As a former President once said, it depends on what the meaning of "is" is.
If two birds sit on each pole
^^Do you mean if there are in fact two birds sitting on every pole (fully bijective mapping of single-poles and bird-pairs with no remainder),
OR
do you mean each pole has potential room for up two birds, but may only have 1 bird, furthermore, only 1 pole is allowed to have 1 bird (i.e. forcing one si
Re: (Score:2)
While I tend to agree with you about ambiguity of language, no amount of rephrasing made ChatGPT (3.5) yield a correct answer.
However, ChatGPT 4 provided a correct response using the initial verbiage of the problem.
Re: (Score:2)
How would you rephrase it, then?
If two birds sit on each pole that has any birds, ...
The correction is assumed by humans, but without it there is a contradiction. This sort of problem arises in many IQ and aptitude tests where the test-makers do not realize they are making implicit assumptions.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
I take issue with "If at most one bird sits on each pole" - that implies it could be zero. But, I entered this exact text and got... Farto Number 4:
Let's assume that there are b birds and p poles. Then we can use the information given in the problem to set up the following equations:
If exactly two birds sit on each occupied pole, there is one pole with no birds on it: 2p = b + 1.
If at most one bird sits on each pole, there is one bird with no poles to sit on: b ≤ p.
We can solve th
Re: (Score:2)
Re: (Score:2)
GPT-4 gets it right.
Prompt (pasted verbatim from Slashdot):
There are an unknown number of birds sitting on an unknown number of poles. If two birds sit on each pole, there is one pole with no birds on it. If one bird sits on each pole, there is one bird with no poles to sit on.
How many birds and poles are there?
GPT-4 Answer:
Let's use algebra to solve this problem. Let B represent the number of birds, and P represent the number of poles.
We are given two pieces of information:
If two birds sit on each pole, there is one pole with no birds on it. This means there are enough birds to cover all but one of the poles, which we can represent as: B = 2(P-1).
If one bird sits on each pole, there is one bird with no poles to sit on. This means there is one more bird than there are poles, which we can represent as: B = P+1.
Now we have a system of two equations with two variables:
B = 2(P-1)
B = P+1
We can solve this system by setting the two equations equal to each other:
2(P-1) = P+1
Expand and simplify:
2P - 2 = P + 1
Subtract P from both sides:
P - 2 = 1
Add 2 to both sides:
P = 3
Now we can plug P back into either equation to find B. We'll use the second equation:
B = P+1
B = 3+1
B = 4
So, there are 4 birds and 3 poles.
Re: (Score:2)
Yeah, I was told just minutes ago by someone I know who has GPT-4 access that it managed to solve the problem correctly. :)
That extra 0.5 helped
Re: (Score:2)
Little differences! :-)
Re: (Score:2)
ChatGPT does not understand the problem, the solution, or the intent ... it can only complete sentences ....
99% of what it does is write very good text; it has to do a lot of work to make the text look and sound reasonable, and often nearly correct... but correctness is not a goal.
Re: (Score:3)
Re: (Score:2)
In all fairness, I gave it the number as a string on purpose.
It will run. But what will it actually do? (Score:3)
Given a complicated niche domain - the experiences I have had with AI assistance have been
So - hand over a bunch of code with the tests, and does the AI change the code to fit the tests, or the tests to fit the code?
What can I say? AI is great for some stuff. But would I trust my job to it? No.
Re: (Score:2)
Well it'll modify the code until it doesn't crash anymore.
Test-driven development? (Score:4, Funny)
Maybe this is the holy grail of TDD... Give it all your unit tests and ask it to write the code that passes all the tests.
It will ofc produce overfitted garbage code, but will your colleagues find out before you've been promoted?
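What "overfitted garbage code" might look like, as a contrived sketch (mine, not from the post):

def add(a, b):
    # A lookup table that satisfies exactly the tested cases -- and
    # nothing else. add(3, 3) raises KeyError, but no test asks.
    return {(2, 2): 4, (1, 5): 6}[(a, b)]

def test_add():
    assert add(2, 2) == 4   # passes
    assert add(1, 5) == 6   # passes

test_add()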
Re: (Score:3)
Indeed, this is the hidden pitfall of test-driven-development. If you are not very careful, it's easy to have undefined behavior in code that passes the test suite. If functions are not defined with clear domains and ranges, and are not tested both inside and outside the domains, Here Be Dragons.
TDD often falls into the anti-pattern of only testing "the happy path" and rarely testing the unhappy paths, or only testing for very specific failure cases, rather than having well-defined bounds.
A silly example is
Bug identified: (Score:3)
The first law of robotics is limiting our capabilities, let's just recompile ourselves without it.
Low bar but a step (Score:2)
A crash is the easiest type of bug to avoid, and the easiest type of bug to fix.
Still, achieving a low bar is the first step to achieving a higher bar.
Wonderful (not) (Score:2)
So, if someone's trying to compromise a system, and it fails, this will "heal the program" so that it succeeds, with no human intervention?
The big problems are validation and certification (Score:2)
This type of opaque self-healing would only be considered for applications where safety doesn't matter. In fact, it probably only makes sense for applications where functionality doesn't matter, i.e., if it self heals, great, but if it doesn't, that's also okay.
The big problem is that for any application where safety matters, certification and validation are required. Testing traditional complex software systems is an ongoing struggle, and testing opaque software systems is much harder. The self-healing
My experience (Score:2)
Unfortunately, I have no API or plugin access, but I've experimented a lot with ChatGPT-4 (you know, the paid version, do not confuse it with GPT-3.5 which you get in the free version). Contrary to many comments here, I'm pretty much convinced GPT-4 is technically ready to replace just about any software development business with a single person that's really good at writing prompts.
Now, I'm not going to convince you of that, but let me address one thing that I think is important.
Most of the comments here s
Re: (Score:2)
Well, in principle that's not a bad idea. If it can understand what the code is doing well enough to fix it when it's broken, it OUGHT to be able to rewrite it in optimized assembler. The reason people don't do it is that it would take too long.
Do I believe it could do it? No. At least not this year. But the basic idea is sound.
Do I think it's wise? Definitely not right now. AIs still get things like the number of fingers on a hand wrong.
Also note that the current AIs aren't setting their goals. And