
Famous Last Words: You can't decompile a C++ program 479

The Great Jack Schitt writes "I've always heard that you couldn't decompile a program written with C++. This article describes how to do it. It's a bit lengthy and it doesn't seem like the author usually writes in English, but it might just work (haven't tried it, but will when I have time)."

  • hmm (Score:5, Informative)

    by Graspee_Leemoor ( 302316 ) on Sunday May 25, 2003 @11:21AM (#6034984) Homepage Journal
    A C/C++ decompiler that fully worked would be the Holy Grail of crackers. Unfortunately, it is actually impossible to get everything back because a lot of information is lost during compilation.

    Nevertheless there are tools out there that attempt to decompile programs; I think of them more as ways of making assembly more readable.

    Note that a lot of them wouldn't work on hand-written assembly, because they rely on knowledge of how certain compilers compile various things; for example, there was a Delphi decompiler available.

    graspee

  • Re:Why (Score:3, Informative)

    by Morologous ( 201459 ) * on Sunday May 25, 2003 @11:21AM (#6034985)
    I can't count the number of times I've been frustrated with the performance or process of an application that I had to interface with, and just wondered: *why* in god's name, or *what* in god's name are they doing in there.
  • by Anonymous Coward on Sunday May 25, 2003 @11:24AM (#6034997)
    but it'll look like this

    class a
    {
    public:
    void b(int c);
    void d(int e);
    private:
    int g;
    int h;
    };

    int main()
    {
    a f;
    f.b(23);

    int x; x=0; x++;
    if(x > 3) goto j;
    f.d(x); x++;
    if(x > 3) goto j;
    f.d(x); x++;
    if(x > 3) goto j;
    f.d(x);
    j: f.b(42);

    return 0;
    }

  • Re:You can't (Score:2, Informative)

    by jezzgoodwin ( 675518 ) on Sunday May 25, 2003 @11:30AM (#6035035)
    He's quite right.

    Take a sum within a program, for example (a+b)=1000. There are infinitely many possible combinations of what a and b can be, and without the correct variable names or the comments that went along with the code (assuming there were some), the decompiled output is going to be pretty much useless, or at least extremely difficult to understand.
  • Speculation Code (Score:5, Informative)

    by Davak ( 526912 ) on Sunday May 25, 2003 @11:44AM (#6035104) Homepage
    Considering the entire post is evidently based on speculation...

    Here is some code [planet-source-code.com] that supposedly decompiles... not that I've tried it.

    Quote from the FAQ [cs.uu.nl]:


    [35.4] How can I decompile an executable program back into C++ source code?

    You gotta be kidding, right?

    Here are a few of the many reasons this is not even remotely feasible:
    * What makes you think the program was written in C++ to begin with?
    * Even if you are sure it was originally written (at least partially) in C++,
    which one of the gazillion C++ compilers produced it?
    * Even if you know the compiler, which particular version of the compiler was
    used?
    * Even if you know the compiler's manufacturer and version number, what
    compile-time options were used?
    * Even if you know the compiler's manufacturer and version number and
    compile-time options, what third party libraries were linked-in, and what
    was their version?
    * Even if you know all that stuff, most executables have had their debugging
    information stripped out, so the resulting decompiled code will be totally
    unreadable.
    * Even if you know everything about the compiler, manufacturer, version
    number, compile-time options, third party libraries, and debugging
    information, the cost of writing a decompiler that works with even one
    particular compiler and has even a modest success rate at generating code
    would be significant -- on the par with writing the compiler itself from
    scratch.

    But the biggest question is not how you can decompile someone's code, but why
    do you want to do this? If you're trying to reverse-engineer someone else's
    code, shame on you; go find honest work. If you're trying to recover from
    losing your own source, the best suggestion I have is to make better backups
    next time.

    I would have posted AC but they have me blocked out for some reason...


    Davak

  • Re:You can't (Score:5, Informative)

    by antis0c ( 133550 ) on Sunday May 25, 2003 @11:46AM (#6035116)
    What's to say you need something as readable as the original? I worked at InterAct Accessories/GameShark for a few years before they went under, essentially as a 'reverse engineer'. Without getting yet another CND from them in the mail due to a post on Slashdot (I don't even think they could send one now that they're out of business?), all I can say is that sometimes, when hacking a game, it benefits an engineer to decompile the application and be able to set breakpoints and watch execution flow while the game is running on, for example, a PlayStation 2. Sure, it's going to be a lot of nearly unreadable C++ mixed with assembly, but if you can watch the execution flow as you do something, it can be useful.

    Of course, a lot of naive people think decompiling would allow you to take an application and start writing patches for it; in that case you are right, it's going to be pretty useless. However, it's not entirely useless in all situations. I'm sure the WINE guys might get some use out of it.
  • Templates (Score:5, Informative)

    by ucblockhead ( 63650 ) on Sunday May 25, 2003 @11:58AM (#6035167) Homepage Journal
    He won't be able to regenerate any templates. If a program makes heavy use of templates, the "C++" he "decompiles" to is going to be hideously ugly.

    [insert joke about it being hideously ugly with templates here.]

    (I did not read the article itself because it is, of course, slashdotted.)
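A minimal sketch of the template point, with made-up names (`largest`, `largest_int`, `largest_double`). The compiler stamps out one ordinary function per type the template is used with, so the binary holds something like the two hand-written functions at the bottom, and nothing in it records that they came from a single template.

```cpp
// What the programmer wrote: one generic definition.
template <typename T>
T largest(T a, T b) {
    return (a < b) ? b : a;
}

// Roughly what survives compilation, and the best a decompiler could
// hand back: one plain function per type that was actually used.
// (Hypothetical names; a real tool would invent its own.)
int largest_int(int a, int b) {
    return (a < b) ? b : a;
}

double largest_double(double a, double b) {
    return (a < b) ? b : a;
}
```

Calling `largest(3, 7)` and `largest_int(3, 7)` gives the same answer; the difference is only in what the source looked like, which is exactly the part that cannot be recovered.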

  • Re:Why (Score:5, Informative)

    by Call Me Black Cloud ( 616282 ) on Sunday May 25, 2003 @12:05PM (#6035195)
    As a Java programmer I find it very useful to decompile class files from time to time. Reasons I've done so:

    A library we were basing a major portion of our code on had a bug in it (a Listener class failed to implement EventListener if I remember correctly) which kept our code from working. Removed offending classes from archive, decompiled, fixed, and recompiled.

    It's educational...the ol' "how'd they do that?". I've never taken code and used it but I found it instructional to look at how someone made a Swing text area from scratch, e.g.

    The challenge...one program I installed had an "enter registration key" prompt and I was curious how that was handled (turned out to be a static string). Then there was this applet that was the core of a company's business. Free, or pay and get more features. As it turns out, the control of the features all resided in the applet, so change a couple of switch and if/then statements and voila, administrative privileges. Didn't use it for evil, much... :) They've since come out with a new version and I've been too busy using my mad java skillz on contract work to take a look at their code.

    Looking at security was instructional too, though, for when I was project lead on a commercial Java app I knew what worked and what didn't (we ended up using the Wibu key [wibu.com]).

  • misleading... (Score:4, Informative)

    by bismarck2 ( 675710 ) on Sunday May 25, 2003 @12:12PM (#6035229)
    Even with complete original source code, understanding a non-trivial C++ application is very difficult. Source derived from an optimized executable is going to be a LOT rougher. No real function names, module names, variable names, or comments. Use of standard libraries (STL, MFC, Boost) is likely highly obscured as well. A tool like this would probably produce source that looks more like a C/machine language hybrid rather than normal C++. The primary use of something like this is if you are looking for a very specific piece of logic such as a password check or an encryption operation or protocol details. When were these famous last words anyway?
  • Re:Why not? (Score:3, Informative)

    by NoMoreNicksLeft ( 516230 ) <john.oyler@ c o m c a st.net> on Sunday May 25, 2003 @12:27PM (#6035315) Journal
    Uh, no. Compilation produces assembly, and then the (sometimes integrated) assembler assembles it into machine language (not binary). I forget what switch it is, but gcc even lets you see what asm code it is generating.
  • by Selanit ( 192811 ) on Sunday May 25, 2003 @12:36PM (#6035350)
    If you're trying to reverse-engineer someone else's code, shame on you; go find honest work.

    Shame on you Davak, you should go find honest code ...
    If you read carefully, you'll note that the "honest work" sentence is NOT Davak's. It is still indented as part of the blockquote, and therefore is the final section of the passage he was quoting from that C++ FAQ. The last sentence that is actually Davak's is his comment about wishing to post as an anonymous coward, presumably to avoid situations like this one. Since AC posting wasn't working for him, it might have been a good idea to italicize the quoted passage to set it off clearly from the rest of the post. Oh well, too late now.
  • by TapeLeg ( 671494 ) on Sunday May 25, 2003 @12:54PM (#6035458)
    You can decompile any program. A compiled program is just your high-level program translated into machine language. There is no sort of magical encryption or similar transformation that it undergoes once you compile it.

    All you need to do is read in the bytes of any binary program, interpret the bytes as their machine language equivalents for whatever platform you are using, and then convert your MOV statements to assignment operators, JMP statements to higher-level loop structures, etc.

    Of course, you won't retain the names of identifiers, which are referred to only by memory locations in a compiled program; and some control structures might be rearranged due to compiler optimization and the lack of machine language equivalents, but the meat and potatoes of it is all right there.

    It's by no means easy to accomplish, especially with higher and higher level programming languages, but impossible? humbug! =)
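The steps described in this comment can be sketched on a toy scale. Everything here is an assumption made for illustration: the three-opcode instruction set is invented (it is not x86), and `decompile` is a hypothetical name. It reads bytes, interprets them as instructions, and emits C-like statements with invented register names, since the original identifiers are gone.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy instruction set, invented for this sketch:
//   0x01 reg imm   MOV reg, imm   (3 bytes)
//   0x02 dst src   ADD dst, src   (3 bytes)
//   0x03 reg       RET reg        (2 bytes)
std::vector<std::string> decompile(const std::vector<std::uint8_t>& code) {
    std::vector<std::string> lines;
    std::size_t pc = 0;
    while (pc < code.size()) {
        const std::uint8_t op = code[pc];
        if (op == 0x01 && pc + 2 < code.size()) {
            // MOV becomes an assignment.
            lines.push_back("r" + std::to_string(code[pc + 1]) + " = " +
                            std::to_string(code[pc + 2]) + ";");
            pc += 3;
        } else if (op == 0x02 && pc + 2 < code.size()) {
            // ADD becomes a compound assignment.
            lines.push_back("r" + std::to_string(code[pc + 1]) + " += r" +
                            std::to_string(code[pc + 2]) + ";");
            pc += 3;
        } else if (op == 0x03 && pc + 1 < code.size()) {
            lines.push_back("return r" + std::to_string(code[pc + 1]) + ";");
            pc += 2;
        } else {
            // Unknown or truncated instruction: skip one byte.
            lines.push_back("/* unknown byte */");
            pc += 1;
        }
    }
    return lines;
}
```

Straight-line code like this is the easy part. Turning jumps back into loops and conditionals, which the sketch does not attempt, is where real tools have to guess.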
  • by KingRamsis ( 595828 ) <`kingramsis' `at' `gmail.com'> on Sunday May 25, 2003 @01:14PM (#6035594)
    excuse me..!!
    just leave Delphi out of it, Delphi is a true OOP language you should do some research before coming up with a gross generalization like that.
  • Re:You can't (Score:2, Informative)

    by Anonymous Coward on Sunday May 25, 2003 @01:55PM (#6035792)
    The thing is, the point of a decompiler is to make the code readable. If you don't particularly care how readable the code is, then your standard disassembler is usually good enough.

    Incidentally, you can't even theoretically create a perfect disassembler, at least on the x86 instruction set. The nature of the complex instruction set means that an arbitrary string of bytes can be decoded into a wide variety of programs, especially when you throw in the possibility of self-modifying code, and all that other garbage. It's a little better on RISC with fixed, word-aligned instruction sizes. Some minor problems would still exist, but they wouldn't be much of a hindrance to a practical "good-enough" disassembler.
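A minimal sketch of why variable-length instructions make disassembly ambiguous. The encoding below is invented for the example (it is not x86), and `decode_from` is a hypothetical helper. The same bytes decode to different instruction streams depending on where decoding starts, which is why knowing the program counter matters.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical variable-length encoding, invented for this sketch:
//   0x01 + 2 operand bytes  -> "MOV"
//   0x02 + 1 operand byte   -> "INC"
//   0x03                    -> "RET"
// Anything else is emitted as a raw data byte ("DB").
std::vector<std::string> decode_from(const std::vector<std::uint8_t>& bytes,
                                     std::size_t start) {
    std::vector<std::string> out;
    std::size_t pc = start;
    while (pc < bytes.size()) {
        switch (bytes[pc]) {
            case 0x01: out.push_back("MOV"); pc += 3; break;
            case 0x02: out.push_back("INC"); pc += 2; break;
            case 0x03: out.push_back("RET"); pc += 1; break;
            default:   out.push_back("DB");  pc += 1; break;
        }
    }
    return out;
}
```

With the bytes `01 02 03 03`, starting at offset 0 the decoder sees a MOV followed by a RET, while starting at offset 1 it sees an INC followed by a RET. A fixed-width encoding would not allow the second reading.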

    Not to say that creating a workable disassembler is impossible. However, usually more valuable is a debugger with a disassembled output. In this case, you know the program counter's value, so you can deterministically disassemble the program (up to a point). This is generally all you really need to do reverse engineering. Throwing in a decompiler on top of all this generally doesn't help somebody who is fairly experienced reading a disassembly, although I suppose it could be of help to somebody who's more familiar with C++ than assembly mnemonics.

    On the other hand, it's not that hard for somebody to pick up just enough assembly to figure out what's going on, especially if they're technically sophisticated enough to be going to all the trouble of stepping through the program to try and figure out how it works.

    So just to reiterate, decompilers are generally not all that valuable.
  • by scherrey ( 13000 ) on Sunday May 25, 2003 @01:57PM (#6035809) Homepage
    First off, there is no "decompiling" going on here. That would imply that you will end up with code having a semi-resemblance to the original code, which is certainly not happening. What is going on here is simply just another compilation phase. This time, instead of an object file target compliant with the system ABI, you are getting a C/C++ file target which should theoretically be compilable into a program that will generate the same output for the same runtime input. The scope of effort and implications barely overlap as they are so vastly different.

    Of course, with C++, being a strongly typed language that resolves so many things at compile time, decompilation is not possible for any non-trivial example (which all the examples in the link were- indeed they didn't use any C++ features at all). This is even ignoring the effects of compiler optimizations. The C++ language is far more expressive than the output dialects of the compiler making the whole idea of decompiling silly. C, on the other hand, is basically a platform-independent assembly language which is why the one-to-one examples of C and asm output seem to imply one can move back and forth between the two at will. Still this is a mistaken impression.

    Now, is compilation from object code to (non-equivalent but functionally similar) C code useful and interesting? Certainly. And all compiler developers and most hard-core debuggers can do this pretty much at will. It's the only way to check the correctness of your compiler and its generated code and, in desperate circumstances, can give you some clue as to what an existing application for which you have no source is doing. This is called reverse engineering, btw, NOT decompilation. Unfortunately, the material pointed to here provides absolutely no new insights and is quite rudimentary at best. Anyone intimately familiar with their compiler and environment already has more knowledge than this paper provides. Really doesn't justify a Slashdot posting, but I guess whoever posted it simply isn't a C/C++ developer.
  • by msobkow ( 48369 ) on Sunday May 25, 2003 @02:03PM (#6035839) Homepage Journal

    A friend of mine work(ed) with a company in Kingston, ON that was spun off from Queens University. Their sole purpose and business model is to take whatever binaries and source a company has available, run it through their cluster of analysis systems, and produce a "clean" update of the system. As per usual, there is about 10-15% of the produced code that needs some hand inspection and tweaking to complete the task.

    Their "big" business was the Y2K work, as their software isn't limited to just reverse-engineering, but can also refactor the re-engineered code (e.g. change all "year" values in the system from 2 digit to 4 digit, updating all related I/O formatting functions, overlay structures, etc.)

    On the flip side, their stuff involves complex pattern matching and heuristics that put any other system I've heard of to shame. It requires clusters of systems running for days to do the initial code analysis. (OTOH, it probably took years to create the original code.)

    I can't provide more specifics on the company because they're having some legal issues with co-investors.

  • by Minna Kirai ( 624281 ) on Sunday May 25, 2003 @02:23PM (#6035951)
    The article [cxd3.com] (link provided for those who don't read URLs) is wrong, even in the first section.

    The title of the first "chapter" is "Why is c++ Decompiling possible?". But immediately he lists "what is totally loss when you compile a program and what stays there".

    In the Lost column he puts templates and classes. The remains list has things like function calls and local variables.

    Well, guess what? Those things that are "lost" are everything that distinguishes C++ from C. If you don't have classes (meaning no inheritance or virtual functions either) and don't have templates either, then you're really just programming in "a better C", not C++.

    So all his approach can hope to "decompile" is C code. Which is something we've seen done in various forms for decades.
  • Why not indeed. (Score:3, Informative)

    by fishexe ( 168879 ) on Sunday May 25, 2003 @04:26PM (#6036498) Homepage
    Well, you can decompile every binary program at least to assembler code, so why shouldn't it be possible with C++?

    There's a huge difference between disassembling and decompiling. With assembly, you generally have a 1 to 1 correspondence between machine language instructions and assembly instructions. That is, one specific instruction you feed to the assembler becomes one specific assembled instruction. Sometimes it's more complicated than this, but only slightly.

    Now look at C, where one line of code could be arbitrarily many opcodes, depending on the complexity of the logic within that line (and the length of the line). Now suddenly, instead of looking at one instruction and translating it back to its equivalent, your decompiler has to look at possibly hundreds of instructions, parse them logically and figure out where each line starts and ends, and what the logical purpose of each set of instructions is. Then there is dealing with structures (or in C++, objects), where you have to come up with a definition for how data is laid out based solely on the instructions for dealing with that data.
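A small sketch of the structure problem, using a made-up `Point` struct and hypothetical function names. In the source, a field is accessed by name; in the compiled program, that access is just a load from a base address plus a fixed offset, and the decompiler has to infer the layout from such loads.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical structure, invented for this sketch.
struct Point {
    int x;
    int y;
};

// What the source says: read the field called y.
int read_y_source(const Point* p) {
    return p->y;
}

// Roughly what the machine code does: load an int from
// (base address + fixed offset). The field name is gone.
int read_y_compiled(const void* base) {
    int value;
    std::memcpy(&value,
                static_cast<const char*>(base) + offsetof(Point, y),
                sizeof value);
    return value;
}
```

Both functions return the same value for the same object. Seeing only the second form, a decompiler can guess "there is an int at this offset", but not that it was called `y` or that the struct was a `Point`.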

    That's quite a bit more complicated. I sure as hell couldn't do it. I know I could write an assembler or disassembler, I might be able to write a simple compiler, but there's no way in hell I could write a functional decompiler.
  • Oh there's more (Score:1, Informative)

    by Anonymous Coward on Sunday May 25, 2003 @07:52PM (#6037497)
    I know this guy. The sad thing is, he lives in the US and, as far as I know, he's a native English speaker; I just can't understand a thing he says. I read this "book" a week or two ago when he finished it. I thought this was a very rough draft, but I guess not. I couldn't help but laugh at some things, like its irrelevance to C++ in general. He should have just used C, since he never even mentions a class.... Well, to be fair, he did mention classes when he describes what is lost in the compilation process, which is untrue, especially if it is a polymorphic class. In fact, I didn't see one thing in this article that would set it apart from one written on the same subject, except using C.
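A minimal sketch of the polymorphic-class point, with made-up `Shape` and `Triangle` classes. For a class with virtual functions, the compiled program has to carry enough information to dispatch calls at run time and, unless run-time type information has been switched off in the build, to answer `typeid` queries. So some trace of the class does survive compilation.

```cpp
#include <typeinfo>

// Hypothetical class hierarchy, invented for this sketch.
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

struct Triangle : Shape {
    int sides() const override { return 3; }
};

// The dynamic type is looked up at run time, so the information
// must be present in the compiled program.
bool is_triangle(const Shape& s) {
    return typeid(s) == typeid(Triangle);
}

// The call is dispatched through the object's virtual table.
int sides_via_base(const Shape& s) {
    return s.sides();
}
```

Passing a `Triangle` through a `Shape` reference still reports three sides and still identifies as a `Triangle`, which would not be possible if classes vanished entirely.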

    For a laugh, look at his other tutorials. Surprisingly, his "book" here is among some of the better material. Most have to do with C++, and some assembly, and some even cover the same material in this lengthy and pointless article. I especially like his tutorial on using Macros in C++, a concept so backwards and wrong it shouldn't even have to be mentioned. Sure, macros have uses, but with C++, you have real inline functions and constant variables, so why use them for anything besides #include? Anyway, his other works can be found on pscode.com.
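A short sketch of the macro complaint, with hypothetical names. A function-like macro pastes its argument text in twice, so an argument with side effects runs twice; an inline function evaluates it once, and a typed constant replaces the `#define` for values.

```cpp
// Hypothetical names throughout, invented for this sketch.
#define SQUARE_MACRO(x) ((x) * (x))

// The C++ alternatives the comment refers to.
inline int square(int x) { return x * x; }
const int kLimit = 10;  // instead of: #define LIMIT 10

// A function with a visible side effect, to count evaluations.
int calls = 0;
int next_value() { return ++calls; }
```

`SQUARE_MACRO(next_value())` calls `next_value` twice, while `square(next_value())` calls it once.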

    What all this boils down to here, is that nothing new is said here. Not only that, but what is said is presented and worded so poorly that anyone reading it is either going to die of laughter or confusion. If you want to read something on reverse engineering, pick up the dragon book, an assembly book, a good disassembler, and some of the very nice documents on cracking software. Many of these are written by people who will be years ahead of you no matter how hard you work, people who actually know what they're talking about.

    - Mik Mifflin
  • by jackb_guppy ( 204733 ) on Sunday May 25, 2003 @09:36PM (#6038058)
    Been doing it for twenty years. It is easy to do.

    Stop trying to use logic... actually do it.
  • Re:Why not? (Score:3, Informative)

    by seanadams.com ( 463190 ) * on Monday May 26, 2003 @12:21AM (#6038692) Homepage
    This is such a grossly misinformed statement, I don't even know where to begin. Assembler and machine language ("binary") are semantically identical. You can go back and forth from assembler to machine code all day and still have the same thing. All you lose when going from human- or compiler-generated assembly (as opposed to disassembled machine code) is labels and comments.

    With C++ or any high-level language, there are zillions of ways a compiler might interpret the code, just as long as the machine code effectively does what the C code says. Even identifying what compiler was used will not help; there are just so many ways to say the same thing in C. for, while, goto, case: it's all syntactic sugar that disappears when you compile.

    You can make a decompiler which identifies various code structures and converts them to high-level representations, but it can't EVER know what the original source looked like.
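A minimal sketch of the "syntactic sugar" point, with made-up function names. The three functions below compute the same sum with `for`, `while`, and `goto`. A compiler is free to emit essentially the same compare-and-jump sequence for all of them, so a decompiler looking at the result has no way to tell which form the author wrote.

```cpp
// Hypothetical functions, invented for this sketch.
int sum_for(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

int sum_while(int n) {
    int total = 0;
    int i = 1;
    while (i <= n) {
        total += i;
        ++i;
    }
    return total;
}

// The same loop spelled with labels and jumps, which is closest
// to what the machine code actually looks like.
int sum_goto(int n) {
    int total = 0;
    int i = 1;
loop:
    if (i > n) goto done;
    total += i;
    ++i;
    goto loop;
done:
    return total;
}
```

All three return 55 for `n = 10`. Whether the generated code is truly identical depends on the compiler and its options, which is an assumption here rather than a guarantee.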
