Programming IT Technology

Famous Last Words: You can't decompile a C++ program (479 comments)

The Great Jack Schitt writes "I've always heard that you couldn't decompile a program written with C++. This article describes how to do it. It's a bit lengthy and it doesn't seem like the author usually writes in English, but it might just work (haven't tried it, but will when I have time)."
  • by SharpFang ( 651121 ) on Sunday May 25, 2003 @11:46AM (#6035117) Homepage Journal
    Well, it isn't. Sure, if you're so lazy you want to have source rebuilt from binaries with one click, complete with comments, makefile and documentation, that's of no use. But imagine the program does some very clever trick. Something you ooh about: "How the hell does he do that? It's impossible!" You want to include that trick in your code. You need it. So you have three options: 1) Try to design it from scratch. Helluva work, and you don't know where to start. 2) Look into the binary. If you're an ASM guru, you MAY succeed. But ASM generated from high-level languages is hell to read. 3) Decompile the puppy, look through what looks like piles of junk (but is way more readable than ASM) for that piece, and find it. Then just rewrite it in pretty fashion, changing variable and function names to your needs, and include it in your own software. It's "the best of the worst", a last resort for finding a solution to a small problem. It's not a way to edit the source and add a single feature to the original program, like removing print protection from Acrobat Reader. The decompiled program most likely won't even compile. You can't make a cow from hamburgers. But with some luck you may find out the cow was a bull and got killed by a truck.
  • by Anonymous Coward on Sunday May 25, 2003 @11:57AM (#6035165)

    I've done some reverse-engineering on programs written in C/C++ (Intel x86). After a while you learn how to recognize different things like virtual function calls, while/for-loops, switch and stuff like that. However, it's a totally different thing to decompile to C++. It may be possible to decompile compiled code to C, but don't expect that it will look much like the original source, especially if the code was optimized by the compiler :)
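    To make "recognizing virtual function calls" concrete, here is a hedged sketch, written by hand in C-style C++, of roughly what a compiler turns a virtual call into. All names (`ShapeVTable`, `Object`, `Square`, `call_area`) are hypothetical, and real vtable layouts are ABI-specific (MSVC and the Itanium ABI used by GCC/Clang differ), so treat this as the shape of the pattern rather than any compiler's exact output.

```cpp
// Hypothetical hand-lowered form of:
//   struct Shape  { virtual double area() const = 0; virtual int sides() const = 0; };
//   struct Square : Shape { double side; /* overrides area() and sides() */ };

struct Object;  // every polymorphic object begins with a vtable pointer

// One function-pointer slot per virtual function, in declaration order.
struct ShapeVTable {
    double (*area)(const Object* self);  // slot 0
    int (*sides)(const Object* self);    // slot 1
};

struct Object {
    const ShapeVTable* vptr;  // hidden pointer the compiler stores in each object
};

struct Square {
    Object base;  // base part first, so a Square* can be treated as an Object*
    double side;
};

double square_area(const Object* self) {
    // Valid because `base` is Square's first member (standard-layout type).
    const Square* sq = reinterpret_cast<const Square*>(self);
    return sq->side * sq->side;
}

int square_sides(const Object*) { return 4; }

// One table per class, shared by every instance of that class.
const ShapeVTable kSquareVTable = {square_area, square_sides};

// Part of what a constructor does: install the class's vtable pointer.
Square make_square(double side) {
    Square sq{};
    sq.base.vptr = &kSquareVTable;
    sq.side = side;
    return sq;
}

// The call-site pattern: load vptr, load the slot, call indirectly with `this`.
double call_area(const Object* obj) { return obj->vptr->area(obj); }
int call_sides(const Object* obj) { return obj->vptr->sides(obj); }
```

    In disassembly, `call_area` is the part you learn to spot: a load from offset 0 of the object, a load at a fixed offset into the table, and an indirect call.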

  • Java Decompiler? (Score:3, Interesting)

    by mindstrm ( 20013 ) on Sunday May 25, 2003 @12:00PM (#6035181)
    Can anyone recommend a Java decompiler that is known to work properly on the most recent versions of Java?

    Something that will literally give me code I can re-compile immediately?
  • by crovira ( 10242 ) on Sunday May 25, 2003 @12:06PM (#6035202) Homepage
    not the source's lies.

    Losing source code and var names (name-spaced globals, aka statics, and scoped locals) allows the cracker (these are rarely hacking tools; they're mostly cracking tools) to focus on what the machine was actually told to do, instead of smothering it with shades of meaning that interfere with understanding the code.

    C++ or Java or Smalltalk, or almost any highly structured language using machine code libraries or virtual machines, results in structured blocks of code and heap and stack allocation.

    A good decompiler can take the machine code, peel away the name spaces and code calls, extract the patterns in the code and the hacker/cracker can read the patterns instead of wasting time on the code.

    Forensic analysis work is extremely useful at telling you what happened when something dies but it is no good at telling you how something worked. For that you need code traces.

    Map those code traces onto the structure the decompiler reveals and you understand the program better than the authors/coders.
  • Re:Why not? (Score:2, Interesting)

    by BJH ( 11355 ) on Sunday May 25, 2003 @12:11PM (#6035223)
    Actually, not quite true. Assembly code is usually considered to mean the mnemonic code intended for human (well, semi-human) consumption, whereas machine language is the actual binary opcodes and arguments.

    So, he's sort of right - you can decompile any binary program to assembler. It's usually called disassembly rather than decompilation, though.
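    The difference is easy to see in code: disassembly is essentially a table lookup from opcode bytes to mnemonics, with no attempt to rebuild loops, variables or types. This hedged sketch decodes only five real 32-bit x86 encodings (enough for a classic prologue and epilogue); `disassemble` is an illustrative name, and a real disassembler also needs full prefix, ModRM and SIB decoding, which is omitted here.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode a tiny subset of 32-bit x86 machine code into assembler mnemonics.
// Bytes outside the table are emitted as raw data ("db").
std::vector<std::string> disassemble(const std::vector<std::uint8_t>& code) {
    static const char* hex = "0123456789ABCDEF";
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < code.size()) {
        std::uint8_t op = code[i];
        if (op == 0x55) {                        // 55     push ebp
            out.push_back("push ebp");
            i += 1;
        } else if (op == 0x89 && i + 1 < code.size() && code[i + 1] == 0xE5) {
            out.push_back("mov ebp, esp");       // 89 E5  mov ebp, esp
            i += 2;
        } else if (op == 0x5D) {                 // 5D     pop ebp
            out.push_back("pop ebp");
            i += 1;
        } else if (op == 0xC3) {                 // C3     ret
            out.push_back("ret");
            i += 1;
        } else if (op == 0x90) {                 // 90     nop
            out.push_back("nop");
            i += 1;
        } else {
            // Unknown to this toy decoder: show the byte itself.
            std::string s = "db 0x";
            s += hex[op >> 4];
            s += hex[op & 0x0F];
            out.push_back(s);
            i += 1;
        }
    }
    return out;
}
```

    Feeding it the bytes `55 89 E5 5D C3` yields `push ebp`, `mov ebp, esp`, `pop ebp`, `ret`: readable, but still one line per machine instruction, which is exactly why this is called disassembly rather than decompilation.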
  • by NoMoreNicksLeft ( 516230 ) <john.oyler@ c o m c a st.net> on Sunday May 25, 2003 @12:20PM (#6035281) Journal
    Updating Total Annihilation to use OpenGL, increasing the number of weapons (currently 256), and increasing the weapon limit (3 per unit).
  • thanks for nothing. (Score:4, Interesting)

    by twitter ( 104583 ) on Sunday May 25, 2003 @12:22PM (#6035294) Homepage Journal
    If you're trying to reverse-engineer someone else's code, shame on you; go find honest work.

    Shame on you, Davak; you should go find honest code. There's nothing wrong with trying to understand how things work. Some people are stuck with legacy equipment or code they can't easily replace, and this is their only option for improving or even fixing it. Those people would be better off if free code were available. Sometimes the only way to make that free code is to understand the original code. There's nothing wrong with reverse engineering software, ever. Republishing someone else's binary is not legal, but it's not immoral. If the code were honest to begin with, the reverse engineering part would not be required. These days, it's cheaper to throw out the dishonest code and hardware and buy some hardware that's well understood. If you make hardware or software, I hope you understand the implications for your product: I'm not buying it.

  • by wilddur ( 661128 ) on Sunday May 25, 2003 @12:25PM (#6035302)
    In Europe it is legal to use reverse engineering for compatibility reasons, enabling your software to work with other people's software (mainly Microsoft's).

    If you do the reverse engineering in Europe, you could develop compatible software and then export it to the US. So it may be great news for us. In fact, it is becoming really complicated to develop software for/in the US: patents, legislation, compatibility. It seems that more lawyers than programmers are needed to write anything more complicated than HelloWorld.exe.

    There is a need for tools that enable compatibility between programs, or we will end up with a monopoly in all kinds of programs (and it is illegal to use your O.S. monopoly to obtain a monopoly in, let's say... web browsers).
  • Re:Java Decompiler? (Score:3, Interesting)

    by anonymous loser ( 58627 ) on Sunday May 25, 2003 @12:26PM (#6035307)
    JAD is a godsend. I wrote a very complex optimization method that was extremely effective in a couple of circumstances. A couple of years later, those circumstances turned up again, only in a different language. I couldn't find the source code anywhere, just the class file that had my great method in it. So JAD came to the rescue; it gave me a bunch of code that used d1, d2, d3, ... as my variables, but I already had a basic understanding of what each variable's role was, so it wasn't a problem for me to reverse-engineer my own code and finally port it to another language. I also made several backups of the source code this time. :-)
  • Re:hmm (Score:5, Interesting)

    by jackb_guppy ( 204733 ) on Sunday May 25, 2003 @12:55PM (#6035461)
    I wrote reverse compilers on IBM midrange equipment, where there are no stacks and self-modifying code is VERY commonplace. It is easy to do:

    Create a program that performs/understands the opcodes and addressing modes for the processor, and that follows both sides of every branch.

    Now "run" the program; that maps out all the opcode and data areas.

    Once that's done, look at the assembler equivalent and map out common subroutines and function calls. Data storage becomes very clear. Lastly, common storage will show external and internal common structures, so the naming of fields becomes visible.

    It is straightforward, can be time-consuming, and is very helpful in understanding hidden or lost information.
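    The "follow both sides of a branch" step is what disassemblers call recursive traversal. Here is a hedged, static sketch of it on a made-up instruction set (the opcodes, `Opcode` and `find_code` are all invented for illustration, not IBM midrange opcodes): everything reachable from the entry point is marked as code, and whatever is never reached is presumed to be data. A purely static walk like this cannot follow self-modifying code or computed jump targets.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy instruction set, invented for this example:
//   0x01      OP      ordinary 1-byte instruction, falls through
//   0x02 t    JMP t   unconditional jump to address t
//   0x03 t    JZ t    conditional branch: goes to t OR falls through
//   0x04      HALT    ends execution
//   0x05 t    CALL t  calls t, then resumes after the call
//   0x06      RET     returns to the caller
enum Opcode : std::uint8_t { OP = 0x01, JMP = 0x02, JZ = 0x03, HALT = 0x04, CALL = 0x05, RET = 0x06 };

// Follow every path from `entry`, taking BOTH sides of each branch, and mark
// each byte of every reachable instruction as code. Bytes never reached are
// presumed to be data.
std::vector<bool> find_code(const std::vector<std::uint8_t>& mem, std::size_t entry) {
    std::vector<bool> is_code(mem.size(), false);
    std::vector<bool> visited(mem.size(), false);
    std::vector<std::size_t> work{entry};  // addresses still to decode

    while (!work.empty()) {
        std::size_t pc = work.back();
        work.pop_back();
        if (pc >= mem.size() || visited[pc]) continue;
        visited[pc] = true;

        std::uint8_t op = mem[pc];
        std::size_t len = 0;
        switch (op) {
            case OP: case HALT: case RET: len = 1; break;
            case JMP: case JZ: case CALL: len = 2; break;
            default: continue;  // not a valid opcode: abandon this path
        }
        if (pc + len > mem.size()) continue;  // instruction runs off the end
        for (std::size_t k = 0; k < len; ++k) is_code[pc + k] = true;

        if (op == OP) {
            work.push_back(pc + 1);       // fall through
        } else if (op == JMP) {
            work.push_back(mem[pc + 1]);  // single target
        } else if (op == JZ || op == CALL) {
            work.push_back(mem[pc + 1]);  // branch or call target
            work.push_back(pc + 2);       // fall-through or return point
        }
        // HALT and RET end this path.
    }
    return is_code;
}
```

    For example, for the bytes `03 05 02 07 AA 01 04 06 BB` starting at address 0, the walk marks everything as code except the `AA` at address 4 and the `BB` at address 8.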
  • by Anonymous Coward on Sunday May 25, 2003 @01:02PM (#6035512)
    Having finally gotten through to the server momentarily, it appears that the article in question only applies to MS Visual C++.
  • by Animats ( 122034 ) on Sunday May 25, 2003 @01:14PM (#6035590) Homepage
    Decompilers are rare, but possible. The first good one, decades ago, decompiled IBM 1401 assembler programs into COBOL. There's a commercial business, The Source Recovery Company [source-recovery.com], still doing that for legacy mainframe programs.

    C decompilers exist; here's one. [backerstreet.com] There are others. Most aren't very good. It's a hard problem.

    Without debugging information, decompilation tends to result in code with arbitrary variable and function names, of course. But you get names when a DLL or .so is entered, so at least you get the program's major interfaces. Minimal C++ decompilation could be done by adding vtable recognition to a C decompiler.

    A more difficult problem is recognition of idioms. Things like "for" statements tend to decompile as lower-level constructs. That's OK as a first step. You need some internal representation. Initial decompilation might represent all transfers of control with "goto"; higher-level recognition then deals with that.
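    A hedged illustration of that first step: the same loop written as the programmer might have written it, and in the goto-only form a naive first pass could produce from the compiled code. Both function names are hypothetical; turning the second form back into the first is the idiom-recognition problem.

```cpp
// What the programmer wrote.
int sum_to(int n) {
    int total = 0;
    for (int i = 1; i <= n; ++i) {
        total += i;
    }
    return total;
}

// What a naive first pass might emit: every transfer of control is a goto,
// mirroring the compare-and-branch instructions in the binary.
int sum_to_lowered(int n) {
    int total = 0;
    int i = 1;
loop_test:
    if (!(i <= n)) goto loop_exit;  // conditional branch out of the loop
    total += i;                     // loop body
    ++i;                            // increment
    goto loop_test;                 // back edge to the test
loop_exit:
    return total;
}
```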

    The key to doing a good job is "optimization", finding more concise source code that will generate the object code. The key to this problem is defining an internal representation that can represent any valid machine-language program, and which can be modified as higher level information about the program is discovered. The first step is usually to start at the starting address and build a code tree by following calls, like a good debugger does. Then you start to improve on the code tree, doing things like this:

    • Recognition of function calls. Each function call should be decompiled, and all calls to the same function checked to ensure they have the same calling sequence. Then a prototype can be generated and placed in a header file.
    • Recognition of fixed-format structures. Figuring out how big a structure is can be tough, but at least fixed-format ones should be fully recognized. All references to the structure should be checked for type consistency, and a structure definition generated.
    • Recognition of "for", "while", and "switch".
    • Once constructors and destructors have been found, the structure of derived objects can be figured out. Now class definitions can be generated.
    • Once class member functions have been identified, the most restrictive protection ("private", "public", "protected") that will work should be attached. Similarly, "const" can be inserted for all arguments not seen to be modified.
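    One standard way to do the "for"/"while" recognition step (a common compiler technique, not something the post specifies) is to find back edges in the control-flow graph: during a depth-first search, an edge to a block still on the search stack closes a loop whose header is that block. The sketch below assumes the graph is already built as adjacency lists of basic-block indices; `Cfg` and `find_back_edges` are invented names. It is reliable for the reducible graphs that structured C++ produces; irreducible control flow (e.g. hand-written gotos into a loop body) needs more care.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Control-flow graph: succ[b] lists the successor basic blocks of block b.
using Cfg = std::vector<std::vector<std::size_t>>;

namespace {
enum class Mark { Unvisited, OnStack, Done };

void dfs(const Cfg& succ, std::size_t b, std::vector<Mark>& mark,
         std::vector<std::pair<std::size_t, std::size_t>>& back_edges) {
    mark[b] = Mark::OnStack;
    for (std::size_t s : succ[b]) {
        if (mark[s] == Mark::OnStack) {
            back_edges.emplace_back(b, s);  // edge to an ancestor: loop back edge
        } else if (mark[s] == Mark::Unvisited) {
            dfs(succ, s, mark, back_edges);
        }
    }
    mark[b] = Mark::Done;
}
}  // namespace

// Return (tail, header) pairs; each header is the top of a candidate loop.
std::vector<std::pair<std::size_t, std::size_t>> find_back_edges(const Cfg& succ,
                                                                 std::size_t entry) {
    std::vector<Mark> mark(succ.size(), Mark::Unvisited);
    std::vector<std::pair<std::size_t, std::size_t>> back_edges;
    dfs(succ, entry, mark, back_edges);
    return back_edges;
}
```

    For a goto-lowered loop (entry → test, test → body or exit, body → test), this reports the single edge from the body back to the test block, marking the test as the loop header; a decompiler can then re-emit that region as a "while" or "for".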

    Decompilation won't always succeed. But you should find all the places where the code is doing something the compiler doesn't understand, and get code back for everything else.

    It's a big job, and somebody ought to do it. Among other things, it would be a valuable tool for finding compiler bugs.

  • Re:Why (Score:4, Interesting)

    by Crashmarik ( 635988 ) on Sunday May 25, 2003 @01:48PM (#6035754)
    That list can also double as 6 things your vendors don't want you to be able to do.

    I have always felt the greatest problem with closed source was that it forced you to trust someone who you were fairly certain had only one skill, and that was salesmanship.

    It of course raises an interesting question: if you find a copyright violation in commercial software, is your evidence void because the license agreement usually excludes all reverse engineering?
  • Re:Why (Score:3, Interesting)

    by Crashmarik ( 635988 ) on Sunday May 25, 2003 @01:53PM (#6035783)
    No, it's not.

    It may be a violation of the license agreement, which would be a violation of a civil contract. The enforceability and applicability of such agreements have been a point of contention for nearly 30 years now.
  • Re:Why (Score:5, Interesting)

    by Dylan Zimmerman ( 607218 ) <Bob_Zimmerman&myrealbox,com> on Sunday May 25, 2003 @02:14PM (#6035904)
    Nope. It (probably) wouldn't be admissible because of the part that says no reverse compiling. Reverse engineering is something totally different.

    Reverse engineering is taking a black box and figuring out what it contains by giving it test inputs and watching the outputs. There are a few other things considered reverse engineering, but that describes most of it.

    Of course, all of this ignores the fact that EULAs have never been tested in court. They could be proven invalid as contracts fairly easily since the exchange of goods occurs before you ever see the EULA and most stores don't accept returns of opened software. Therefore, if you don't agree to the EULA, you still have the right to use what you purchased.

    On an interesting side note, various free trade laws specifically protect reverse engineering.
  • Re:You can't (Score:2, Interesting)

    by len_harms ( 455401 ) on Sunday May 25, 2003 @03:30PM (#6036219)
    You probably could get very close.

    With straight C++ classes you could probably get back something resembling them. VC, which is the compiler he used, is a very regular compiler. I haven't looked at what VC does to templates, but I would be willing to bet it transforms them into type-specific classes and then into C. You would just need to use the preprocessor and see what it did to them.
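    For what it's worth, the end result is easy to picture: every template instantiation becomes its own ordinary function in the binary, so a decompiler sees unrelated type-specific functions rather than one template. (Strictly speaking, templates are instantiated by the compiler proper, not the preprocessor.) The hand-written "instantiations" below have hypothetical names and only approximate what ends up in the object code.

```cpp
#include <string>

// Source form: one generic function template.
template <typename T>
T larger(T a, T b) {
    return (a < b) ? b : a;
}

// Roughly what the binary holds once larger<int> and larger<std::string> are
// used: two separate, type-specific functions. Nothing in stripped machine
// code marks them as coming from the same template.
int larger_int(int a, int b) { return (a < b) ? b : a; }

std::string larger_string(std::string a, std::string b) { return (a < b) ? b : a; }
```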

    Inline functions, though, would be impossible to get back. But then again, they are inlined, so the code would be there, just not necessarily in its original form.

    The VC compiler is just a transform engine. It transforms from C++ to C to PCODE to ASM. Of course, that's five-year-old info, from when I used to care about what the compiler was doing to my code. Templates are probably similar.

    I'm sure the code that came back out of this thing would be UGLY. But if you look at the end of most EXEs shipped these days, most developers don't even bother stripping them anymore. You could probably even get back MOST of the class names and function names, maybe even the variables.
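    When symbols are left in, C++ names come back with their full signatures, because the compiler encodes them into each symbol name ("name mangling"). A hedged sketch using `abi::__cxa_demangle` from `<cxxabi.h>`, which GCC and Clang provide for the Itanium C++ ABI used on Linux and most Unix-like systems. MSVC uses a different mangling scheme (DbgHelp's `UnDecorateSymbolName` handles that on Windows), so this particular call will not decode VC symbols; the wrapper name `demangle` is invented here.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Frees memory that __cxa_demangle allocated with malloc.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Turn an Itanium-ABI mangled symbol back into a readable C++ signature.
// Returns the input unchanged if it is not a valid mangled name.
std::string demangle(const char* mangled) {
    int status = 0;
    // Null buffer and length ask __cxa_demangle to allocate the result itself.
    std::unique_ptr<char, FreeDeleter> result(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
    if (status != 0 || !result) return mangled;
    return result.get();
}
```

    For example, `demangle("_ZN3Foo3barEi")` gives `Foo::bar(int)`, the kind of name `nm` lists for an unstripped binary.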
  • From the author (Score:5, Interesting)

    by opcodevoid ( 675898 ) on Sunday May 25, 2003 @03:46PM (#6036319)
    I didn't realize my article was getting any feedback, because people are posting here instead of on pscode.

    Anyway, I've seen a lot of people saying decompiling is impossible, or at least not practical; well, that is not true. Decompiling C++ is very practical because of high-level keywords (if, while, for), local variables, and parameters. All of these generate similar, recognizable instruction sequences on every platform and just about every processor.

    I am also extending the article to 92 pages in total, which will cover OOP, the CRT, and a whole bunch of other stuff.
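    The point about locals and parameters can be made concrete for one common case: 32-bit x86 code compiled with a frame pointer (`push ebp; mov ebp, esp`), where after the prologue `[ebp+0]` holds the saved EBP, `[ebp+4]` the return address, stack-passed parameters start at `[ebp+8]`, and locals live at negative offsets. The hedged sketch below classifies such operands and assigns placeholder names (`arg_N`, `local_N` and the function name are invented here). It assumes every argument is 4 bytes, and optimized builds that omit the frame pointer break this simple mapping.

```cpp
#include <string>

// Classify an [ebp+offset] memory operand in 32-bit x86 code that uses a
// standard frame pointer. Assumes stack-passed 4-byte arguments (cdecl/stdcall);
// register-passed arguments (fastcall, `this` in ECX) won't show up here.
std::string name_for_ebp_offset(int offset) {
    if (offset < 0) {
        // Below the saved EBP: a local variable in this function's frame.
        return "local_" + std::to_string(-offset);
    }
    if (offset == 0) return "saved_ebp";
    if (offset == 4) return "return_address";
    if (offset >= 8) {
        // Each 4-byte slot above the return address is one parameter.
        return "arg_" + std::to_string((offset - 8) / 4);
    }
    return "unknown";  // offsets 1-3 and 5-7 are not whole slots
}
```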

  • Re:Why not? (Score:3, Interesting)

    by arkanes ( 521690 ) <<arkanes> <at> <gmail.com>> on Sunday May 25, 2003 @04:43PM (#6036573) Homepage
    Not directly, but inputting, say, the name of a function or command to call, looking that up in a table of function pointers, and executing the pointed-to function amounts to the same thing.
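    A minimal sketch of that idea, assuming a tiny command table (the command names and `cmd_*` functions are hypothetical): the string supplied at run time selects which already-compiled function is called through a function pointer.

```cpp
#include <map>
#include <optional>
#include <string>

// Every command has the same signature so they can share one table.
using CommandFn = int (*)(int, int);

int cmd_add(int a, int b) { return a + b; }
int cmd_sub(int a, int b) { return a - b; }
int cmd_mul(int a, int b) { return a * b; }

// Name -> function pointer; which code runs is decided by a run-time string.
const std::map<std::string, CommandFn> kCommands = {
    {"add", cmd_add},
    {"sub", cmd_sub},
    {"mul", cmd_mul},
};

std::optional<int> run_command(const std::string& name, int a, int b) {
    auto it = kCommands.find(name);
    if (it == kCommands.end()) return std::nullopt;  // unknown command
    return it->second(a, b);                          // indirect call through the table
}
```

    In the compiled binary, the call inside `run_command` is an indirect call whose target depends on data, which is one reason a static tool cannot always tell which functions a given call site can reach.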
  • by mark-t ( 151149 ) <markt AT nerdflat DOT com> on Sunday May 25, 2003 @05:41PM (#6036855) Journal
    I posted the following remark about 20 minutes ago on pscode, and when I just checked back there I found that the remark had been surreptitiously removed (I still had a backup of what I had written in my cache):

    Nice try, but no. All this article ultimately describes is how to write high-level language code that does the same thing as particular groups of assembly instructions, which is meaningless to a high-level language programmer, because knowing all the individual steps of a process is nowhere near as important as understanding what the process actually *IS*. This is something that no automated decompilation process can uncover, because the responsibility for that understanding falls on the programmer, not the computer. Since code that only replicates functionality but does not convey meaning to the programmer is not maintainable, the entire process of decompilation would be wasted. One would probably be better off spending their time figuring out how to do it themselves (with, perhaps, some help from standard reverse engineering, if needed).

    Not only does the author completely fail to realize that the technique he is describing doesn't remotely qualify as decompilation, and is nothing but normal reverse engineering, but he figures that the appropriate response to negative criticism is to remove evidence of it rather than attempt to respond intelligently. I noticed that my vote of 1 out of 5 was still intact on his voting page, though.

    I was originally surprised when I first read the article that someone would think it had merit enough to write about, but having some insight into the mindset of the author that I did not have before (offered by his rapid censorship of my remarks), my surprise has waned completely.
