AI Programming

27-Year-Old EXE Became Python In Minutes. Is AI-Assisted Reverse Engineering Next? (adafruit.com) 121

Adafruit managing director Phillip Torrone (also long-time Slashdot reader ptorrone) shared an interesting blog post. They'd spotted a Reddit post "detailing how someone took a 27-year-old visual basic EXE file, fed it to Claude 3.7, and watched as it reverse-engineered the program and rewrote it in Python." It was an old Visual Basic 4 program they had written in 1997. Running a VB4 exe in 2024 can be a real yak-shaving compatibility nightmare, chasing down outdated DLLs and messy workarounds. So! OP decided to upload the exe to Claude 3.7 with this request:

"Can you tell me how to get this file running? It'd be nice to convert it to Python."

Claude 3.7 analyzed the binary, extracted the VB 'tokens' (VB is not a fully-machine-code-compiled language which makes this task a lot easier than something from C/C++), identified UI elements, and even extracted sound files. Then, it generated a complete Python equivalent using Pygame. According to the author, the code worked on the first try and the entire process took less than five minutes...

Torrone speculates on what this might mean. "Old business applications and games could be modernized without needing the original source code... Tools like Claude might make decompilation and software archaeology a lot easier: proprietary binaries from dead platforms could get a new life in open-source too."

And maybe Archive.org could even add an LLM "to do this on the fly!"

  • by GFS666 ( 6452674 ) on Saturday March 01, 2025 @11:50PM (#65204467)
    The program worked in one work environment. What about all the edge cases that the old program was tested for and had to successfully work in before it was published? I doubt that the "new" program was fully tested to ensure that it meets all of the compatibility of the old program.
    • by Waffle Iron ( 339739 ) on Sunday March 02, 2025 @01:44AM (#65204595)

      The original program was probably never tested for edge cases either.

    • They'd spotted a Reddit post "detailing how someone took a 27-year-old visual basic EXE file, fed it to Claude 3.7, and watched as it reverse-engineered the program and rewrote it in Python."

      ITYM:

      They'd spotted a Reddit post "detailing how someone took a 27-year-old visual basic EXE file, fed it to Claude 3.7, and watched as it reverse-engineered the program and rewrote it in apparently equivalent at a first glance Python."

    • by gweihir ( 88907 )

      It is a small script to entertain 2 year olds. If it does total crap the worst you get is a bored kid.

      • by narcc ( 412956 ) on Sunday March 02, 2025 @02:20PM (#65205435) Journal

        Oh, but it's so much less than even that!

        Like all sensational claims about the wonders of AI, this one falls apart quickly under scrutiny.

        No, Claude 3.7 did not reverse-engineer a 27-year-old executable. Strings were extracted from the binary, yes, though likely as a normal part of preprocessing. Specific functionality was added with additional prompting, which is why there are six different python scripts in the output.

        No, Claude 3.7 did not "extract sound files". All it "extracted" were a few file names, which are just strings.

        No, this isn't going to lead to some future where "Old business applications and games could be modernized without needing the original source code" because no reverse engineering was done here at all.

        This is about as impressive as those silly toy examples of an LLM "making" a simple game, similar to countless other little games it was trained on, from a prompt. It's amazing that it produced anything functional at all, but there is nothing here we haven't seen before. Without the binary file angle, what it produced wouldn't be interesting enough to make good clickbait. Oh, but it looks like it reverse-engineered a binary at first glance, and superficial appearances are all that matter to the AI hopeful.

        It's all very silly.
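
For readers unfamiliar with what "extracting strings from a binary" involves, here is a minimal sketch in the spirit of the Unix `strings` tool. It is not what Claude ran; the sample bytes are a hypothetical stand-in for a fragment of an executable, and `extract_strings` is an illustrative name.

```python
import re

def extract_strings(data, min_len=4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# Hypothetical bytes standing in for part of an executable (an assumption,
# not the real file): control names and a sound file name between binary junk.
sample = b"\x00\x01Form1\x00\x00btnExit\x00\xff\xferachel.wav\x00ab\x00"

names = extract_strings(sample)
# File names are just strings: finding them does not extract any audio data.
wav_files = [s for s in names if s.lower().endswith(".wav")]

print(names)
print(wav_files)
```

A pass like this recovers control names and file names, but none of the program's logic and none of the sound data, which is the distinction the comment above is drawing.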

        • by gweihir ( 88907 )

          Interesting. So a great big fat lie by misdirection. This has the stink of desperation strongly on it.

    • Even if it doesn't, it still provides a great baseline in a modern day language that can then be maintained. Think of how you could bootstrap old COBOL and FORTRAN apps to a modern language and then have your devs maintain it from there. Yes it may not be 100%, but if it gets you even 80% of the way there in 5 minutes, that can potentially be months and months of saved time.

      • by Calydor ( 739835 )

        It might even be enough that you can actually see what is still missing or wrong.

      • by narcc ( 412956 )

        Just one problem: Nothing was reverse-engineered here. A trivial program was generated based on strings extracted from the binary. Additional prompts were used to add the intended functionality.

        Converting programs from one language to another is non-trivial and certainly well-beyond the capability of a stochastic process, particularly where a high-level of accuracy is essential. We've seen countless proper efforts to do things like COBOL to Java conversion fail because the "80% of the way" that gets prod

    • by klashn ( 1323433 )

      Yeah, who knows if the new program now has malware, and is going to pr0n sites on your behalf, and generated Midjourney images and getting into your 1Password vault and grabbing all your account details since you don't have 2FA enabled to blackmail you from your job at Disney?

    • A 27 year old Visual Basic .exe going to Python is a step back. Why is this considered any sort of accomplishment?

  • don't believe it (Score:2, Interesting)

    by dfghjk ( 711126 )

    "Running a VB4 exe in 2024 can be a real yak-shaving compatibility nightmare, chasing down outdated DLLs and messy workarounds."

    If outdated DLLs need to be "chased down", how would an AI know what those DLLs did? And without knowing, how could it generate Python code to provide that function?

    "...it generated a complete Python equivalent using Pygame. According to the author, the code worked on the first try and the entire process took less than five minutes..."

    Bullshit.

    • by SirSlud ( 67381 ) on Sunday March 02, 2025 @12:23AM (#65204511) Homepage

      Uh, the DLLs are just to run VB. An LLM (or human) doing the same task wouldn't need to know "what those DLLs did". I don't think you have a grasp on the subject matter.

      Also, unless you've been doing and using stuff like this .. it's not bullshit. This is happening, today.

      • Re: (Score:2, Interesting)

        by DrXym ( 126579 )

        If the DLLs are just the VB runtime, then throw the entire executable through a decompiler. I bet this AI did the same.

      • Uh, the DLLs are just to run VB. And LLM (or human) doing the same task wouldn't need to know "what those DLLs did". I don't think you have a grasp on the subject matter.

        Also, unless you've been doing and using stuff like this .. it's not bullshit. This is happening, today.

        Yep. It's so bizarre hearing people say that what we do every day with these tools is impossible.

        The only response I can think of (other than slack-jawed amazement) is "well, maybe impossible for you, for some reason ... "

    • "Chasing down" things is what current LLMs do best, given that they are essentially large databases with a weird compression unit.
    • What do you believe? The actual original chat log is posted in the story. Go see it here along with Claude's analysis of the file and the resulting python output:
      https://claude.ai/share/3eeceb... [claude.ai]

      Keep your dis-beliefs for religion and other actual unproven bullshit rather than projecting it on not liking the reality presented in front of you.

    • by DrXym ( 126579 )

      The answer is that it wouldn't. VB was typically used in conjunction with external DLLs - ActiveX controls and so on. Controls could be written in VB or another language and all you'd have to go on is whatever interface definitions exist in the type lib. Attempting to disassemble some random DLL that was compiled from C++ and infer meaning is going to be a very long and painful process.

      Anyway, I wonder what this AI is doing that people couldn't do with existing tools. I presume there are VB decompilers t

    • by allo ( 1728082 )

      "If outdated DLLs need to be "chased down", how would an AI know what those DLLs did?"

      What is the problem? Let's say the program has a call "download('https://slashdot.org')" and the DLL that implements it cannot be found. I don't see why an AI wouldn't be able to find a Python module that provides a download function that does something similar. Especially when the following code makes clear what type of data is expected by the calls processing the result.
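
As a rough illustration of that point, a stand-in for the hypothetical `download(...)` call can be sketched with Python's standard library alone. The function name and the bytes return type are assumptions made for this example; the original DLL's real interface is unknown.

```python
from urllib.request import urlopen

def download(url, timeout=10.0):
    """Fetch a URL and return the response body as raw bytes.

    Hypothetical replacement for the missing DLL call; the real
    signature and return type of the original are assumptions.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

Calling `download("https://slashdot.org")` would return the page as bytes. Whether the surrounding code expects bytes, text, or a file on disk is exactly the sort of detail that has to be inferred from the calls that consume the result.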

  • by jonwil ( 467024 ) on Saturday March 01, 2025 @11:59PM (#65204481)

    There isn't an AI on the planet that could take even a modest sized C++ program and produce any kind of useful/usable source code. Decompilers exist but as someone who has reverse engineered thousands of lines of C++ code in my time, getting anything out of them requires manual work and skills that no ai can re-create.

    • by ZipNada ( 10152669 ) on Sunday March 02, 2025 @12:08AM (#65204487)

      >> There isn't an AI on the planet

      Right, but that's today. Next year there will be some serious contenders.

      • by gweihir ( 88907 )

        Sure. That is the type of "prediction" any good "constant delivery scam" needs. And marks that believe it.

      • I'm no reverse engineer so this is more of a question than an assertion, but I would guess there's a shit-ton of formal computing proofs precluding major aspects of reverse engineering - that information is not preserved, or massive parts of it are NP-complete, or that some things must be incomputable. Turing halting-problem type stuff.

        Granted, smarter heuristics can get you a much higher-quality answer to an unanswerable question, so long as it doesn't have to be rigorously correct all the time. And may

        • by allo ( 1728082 )

          The point with halting problem and similar proofs is, that they are often not relevant.

          The halting problem does not say you can't prove that a program halts (or doesn't). It says that for every program that can prove that, I can create a program that the halting checker cannot check. But the counter-example program is usually not relevant in reality. If I can check all programs on my computer, why should I care if you can send me a counter example I can only check by creating a new checker (for which new counter exa

        • by HiThere ( 15173 )

          Not really. Any object file is relatively easy to translate into assembler, and from assembler one can construct a C program reasonably easily. Whether you'll be able to understand the C any better than the object code, though, is a different problem. Look at some of the C code generated by Language X to C compilers and you'll see what I mean. (The last one I looked at was Vala to C. In places it was obvious what was going on...in other places....)

          Computer translators tend to produce code that no huma

      • ...Grain of salt. They will say anything to get more money, investment capital.
    • by Draeven ( 166561 )

      requires manual work and skills that no ai can re-create.

      That you know of. That is swiftly changing. Denial won't make it not happen.

    • by MpVpRb ( 1423381 )

      "no ai can re-create" ... today

    • Which part of: we are talking about Visual Basic, not C++, did you not get?

      Converting a C++ program to Python is pretty trivial. However the code won't be readable.

      • by gweihir ( 88907 )

        Code that is not readable is worse than worthless. It cannot be maintained. All you can do with it is throw it away.

        • Who cares if you have an old C++ program that was compiled for 68k
          But you need to have it running on "what ever".

          A program that ran 30 - 40 years perfectly, probably does not need maintenance? Who knows ...

          • by gweihir ( 88907 )

            Sure, in your fantasy scenario, that problem does not crop up. But when we stop hallucinating and look at the real world ...

        • by dvice ( 6309704 )

          You don't need the code to be readable to be able to modify it to your needs. You just start it in a debugger and let it run to the location that needs modification, then apply your modification to that spot and you are done.

          This is not convenient in normal software development where changes are made daily, but it is convenient if you need to make small changes very rarely. For example if your app calls a function that returns calculated tax value and you want to change that function, but keep everything el

    • Since this was done using VB tokens I guess there's a chance all VB programs are vulnerable to it. Are there any important and valuable VB programs out there? I suspect the most likely ones would be ancient government systems for some small country, like maybe the software that handles the transit system in Bucharest or something.

    • by gweihir ( 88907 )

      Indeed. I expect that all the things that require actual insight (like your example) will be the hill LLMs die on.

      Of course, there are a lot of things that could be done by, say, an expert system, if they were not so expensive to create. In a sense, LLMs are bad and unreliable expert systems that can do massively more things. Still bad and unreliable.

    • by DrXym ( 126579 )

      Most C++ software would be built with release settings so it wouldn't even have debug symbols and the code could be optimized for speed / size with the compiler doing various things in each case. A program would have to be very valuable to bother going down the route of trying to decompile the thing.

      I've seen some games decompiled and reverse engineered either to patch out DRM, or fix bugs or find secrets so it's doable but probably requires running the thing in a debugger for a long time to figure out what

    • I'd prefer to get C# code over Python.

    • by allo ( 1728082 )

      AI can work very well with patterns. And Code to assembly is just patterns. Of course you will see AI decompilers in the future.

  • Here is the chat (Score:5, Informative)

    by RitchCraft ( 6454710 ) on Sunday March 02, 2025 @12:14AM (#65204495)

    Here is a link to the chat that generated this: https://claude.ai/share/3eeceb... [claude.ai]

    • by Njovich ( 553857 ) on Sunday March 02, 2025 @03:26AM (#65204669)

      A trivial toy programme, who could have guessed

      • A trivial toy programme, who could have guessed

        And that changes what about this story? You didn't think they re-wrote the Linux kernel in python just now did you? The story was about an LLM analysing a basic VB program and writing it in Python based on the logic it analysed. Nothing more. If you read more into it then that's a "you" problem.

        • by Njovich ( 553857 )

          Did you read the chat? It only got some basic names of elements:

          A form called "Form1"
          A timer control ("cntTimer")
          An exit button ("btnExit")
          Several sound files referenced (rachel.wav, goodbye.wav, booga.wav, mama.wav, dada.wav)
          The application likely plays sounds and may interact with the foreground window

          And like the colors and audio names, and from there it guessed (supposedly correctly) what it did? But at no point is the chat even claiming to be analyzing the logic.

          So in that sense it seems like it just d

        • by skogs ( 628589 )

          What if we could feed some rust into an AI and have it spit out proper C for the kernel??

      • by allo ( 1728082 )

        Something that wasn't possible before. Looks like people wait for AI to create the next operating system kernel before they are impressed by what AI can do. Oh, a kernel for quantum computers of course, because if it is a regular kernel they will just claim that it copied from open source kernels.

        • There were/are already old fashioned VB decompilers, and while I didn't see any VB to python translators, such things are generally possible (they exist for a bunch of languages) and they can work fairly well in some circumstances, and such a tool might well have worked here given the simplicity of the program. It also couldn't have done it unless it had examples of people doing it to work from.

  • by mukundajohnson ( 10427278 ) on Sunday March 02, 2025 @12:15AM (#65204499)

    An LLM was able to glean details about the program from the EXE file, since VB has symbols and such. A little magic found the logic, and it created a 100-line Python script.

    That's a lot more believable than what this article title implies. I've also had similar success telling an LLM to port some hundred lines of code to a different language. The magic here is that it was able to interpret VB bytecode, unless the logic was fully understandable purely from the symbols.

  • how do they ensure the Intellectual property owner is asking?
  • by Reeses ( 5069 ) on Sunday March 02, 2025 @01:09AM (#65204549)

    Maybe someone should point Claude at the Slashdot codebase and get it to modernize it. Drag this site into 2020 at least.

    I mean, what's the worst that could happen?

    • Perhaps it would port it to Perl 6?

      • It might add - *gasp* - UNICODE support!

        • Slashdot has actually unicode support.
          It is disabled by a command line flag.

          Because no one is expert enough in unicode to make proper input sanitation.
          However I would support it :D

          It is kind of annoying that PC users can use "alt-gr-number" to key in a German Umlaut, because it is an ISO-bla-bla standard character, but Mac users can't, because macOS produces unicode :(

    • by Waffle Iron ( 339739 ) on Sunday March 02, 2025 @01:50AM (#65204601)

      Maybe someone should point Claude at the Slashdot codebase and get it to modernize it. Drag this site into 2020 at least.

      I mean, what's the worst that could happen?

      >> I'm afraid it's not worth the electricity to do that, Dave.

    • Maybe they could get it to stop blocking CGNATs with "you are not allowed to use this resource." Add a delete function? An edit function? UNICODE support? Nope, wishful thinking.

    • One of the other slash sites (Soylent? Technocrat?) did the work about a decade ago and put up their repo. /. seems to have enough staffing for "don't touch anything" mode.

  • by itsme1234 ( 199680 ) on Sunday March 02, 2025 @01:58AM (#65204611)

    If you don't read carefully the actual conversation (link NOT provided in TFA and only later added in the original Reddit post) you'd never know the program was in fact "press any key to play a random sound". This is orders of magnitude different from any "useful" program, never mind commercial behemoths one might want to tackle.

    Also it's Visual Basic, which is interpreted (although the code used at runtime is some intermediary format, I wouldn't be surprised if the whole source is in fact stored in the .exe) and certainly NOT representative of a "27-Year-Old EXE".

    • Re: (Score:1, Redundant)

      by thegarbz ( 1787294 )

      What does that change? The point of this story is one of analysis and generating python code. It's a story about analysis of logic from an EXE file. How does the size of the program have any impact on the story? No one here (no one with any sense anyway) was under the impression that a complete functioning complex program was translated here. And we have ample other evidence of how LLMs are capable (or rather incapable) of complex logic to know just how far this will scale.

      If you're after a story of LLMs wr

  • I mean that's the main issue here. Transpiling or automatically translating from one language to another is nothing new. Essentially every compiler translates your language into machine code.

    Doing this with a text-generator introduces the new problem of that code being potentially incorrect. Such systems aren't good at strict logical translations.

    • by gweihir ( 88907 )

      Such systems aren't good at strict logical translations.

      Indeed. I have also observed LLMs completely overlooking really important detail until it was explicitly pointed out to them. That is not good at all.

  • The Python code takes just three or four screenfuls on my phone (held vertically). The game is intended by its author for the entertainment of a two-year-old child. Come on, be serious.
  • Helloworld.exe became hello_world.py

    If a Sunday pilot like me can understand the destination code at first read, it's not "intelligent"

  • Initial coding is about 20% of a software project. Maintenance is, with well written code, about 40%. If this is not well-written code, that will go through the roof and all "gains" will instead be losses. Also, there are some Python behaviors that are fundamentally different than VB and that cannot simply be emulated unless you write a VB runtime in Python.

    Obviously, this "example" here was too trivial to even be taken seriously, but "move functionality from one language to another" is not trivial at all a

    • It's a program from 1997 where the original source code was lost: I suspect maintenance has not been a high priority!

      But also, that kind of misses the point.

      What it's doing is effectively transpiling a simple VB4 program into python. In such cases you want the logic to be largely the same. If it's unmaintainable before, it should be so after, it's meant to be the same code! The neat thing is it worked off p-code rather than the source, and also apparently extracted the bundled files, so it worked as a reaso

  • VB was like scripting glue to lash OCX or ActiveX controls together. If you had DLL dependencies then chances are this AI is not going to help you.

    • I've used a lot of Windows programs written in Visual Basic and delivered as an exe, some of which did real things. It's pretty easy to write (if ugly and with stupid syntax) and it works OK. My only experience with it is in Crystal Reports, where it is being used to fill in the dumb gaps in their product like not being able to calculate a median from an unsorted list. You actually have to write your own sort in Visual Basic, which I did "cold" (my only prior BASIC exposure was AppleSoft) by googling syntax

  • Expecting to see a lot of old software getting migrated to Linux shortly. Should be fun to watch.
  • Tools which could decompile Visual Basic 1/2/3/4/5/6 binaries have existed for ages, and they weren't AI-based.

    Tools that convert programs from one language to another are old news too.

  • "Old business applications and games could be modernized without needing the original source code...

    If AI can bring back WordPerfect 5.1, that would be fantastic. The most perfect word processing program ever made running in the 21st century.

    • by HiThere ( 15173 )

      I disagree. The most perfect word processing program was MSWord 5.1a for the Mac. (The specific version is important, as other versions were less good. And the Mac version was far superior to the PC version.)

  • And the best architecture won't be LLM and maybe not even transformers. But when you can easily create binaries from known source code, you have a lot of training material and can generate training material tailored for the areas that do not work well yet. The rest is a matter of compute resources to train the model on the inverse problem.
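
The "compile known source to get training pairs" idea can be sketched at toy scale. This example uses Python's own bytecode as a stand-in for machine code, which is an assumption made to keep it self-contained; a real pipeline would invoke an actual compiler toolchain, and `make_training_pair` is an illustrative name.

```python
import dis

def make_training_pair(source):
    """Compile source and return (disassembly listing, original source).

    The listing is the model input and the source is the target for the
    inverse (decompilation) problem. Python bytecode is only a stand-in
    for native machine code here.
    """
    code = compile(source, "<sample>", "exec")
    listing = dis.Bytecode(code).dis()
    return listing, source

# Hypothetical known-source corpus; in practice this would be large.
samples = ["x = 1 + 2\n", "print('hello')\n"]
pairs = [make_training_pair(s) for s in samples]

for listing, source in pairs:
    print(source.strip(), "->", len(listing.splitlines()), "bytecode lines")
```

Because the forward direction is cheap and deterministic, such pairs can be produced in bulk and targeted at constructs a model handles badly.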

  • I'd need to see the original EXE and re-run the process to understand what exactly was done. The commentary in the generated code leads me to think that the LLM may have generated the Python code from a block of comments in VB program, not the program itself. Another possibility is that this exact VB program was translated by a human at some time in the past and that that translation found its way into the LLM's training data. As the old saw goes, extraordinary claims require extraordinary evidence.
  • "Claude, can you refactor Windows 7, Windows 10, and Windows 11 execution environments, maintaining all visual elements and management tools possible, each to run within their own sandbox within Ubuntu 24.04.1 LTS, making each new environment instance startup use a "fork and execute" model?"

  • by Z80a ( 971949 )

    I bet it became a lot slower as well.
    Visual Basic, especially VB6.0, is just a lot faster at executing code than Python.

  • My first VB program was garbage. I did find copies of it a few years ago along with the chinese reverse engineered binary.

    Thing is I've rewritten this program probably a dozen times. Every time I pick up a new language it's usually the second or third thing coded after a hello world and "how does this env work".

    It was also just a lot of complicated math and I spent far more time figuring out the formulas and how the raw calculations work before anything else.

  • Cool, maybe they can start doing that for Python programs themselves, chasing down outdated libraries and messy workarounds, etc

  • Now that would be useful.
  • It would be interesting if it could extract the original, high-level abstract requirements from such an executable. This would be beneficial, as they could then be modified and improved at a different level of abstraction. It's likely the original executable contained a lot of unintended idiosyncrasies that may have been needlessly replicated.

  • IIRC, when the Power Macs first came out, there was a similar tool to convert 680x0 binaries to PowerPC. It wasn't cheap, and I don't know if it produced any sort of source code (except disassembly.) Again IIRC, the only program I know of that tried it was a word processor called WriteNow, which was originally written in 680x0 Assembly.

  • I can see this working if the program isn't too heavy on external controls. There is only so much you can write in VB4 (or any version of VB, for that matter). That "only so much" still includes a lot of useful programs.

    I recently saw the results of decompiling an Android app I've been working on. The decompiled source code was uncannily similar to what I had typed in to Android Studio. What's on the Play Store now is obfuscated: you can see the algorithms (sort of) but no variable or class names.

    ...lau
