Famous Last Words: You can't decompile a C++ program
The Great Jack Schitt writes "I've always heard that you couldn't decompile a program written with C++. This article describes how to do it. It's a bit lengthy and it doesn't seem like the author usually writes in English, but it might just work (haven't tried it, but will when I have time)."
hmm (Score:5, Informative)
Nevertheless there are tools out there that attempt to decompile programs; I think of them more as ways of making assembly more readable.
Note that a lot of them wouldn't work on hand-written assembly, because they rely on knowledge of how certain compilers compile various things; e.g. there was a Delphi decompiler available.
graspee
Re:Why (Score:3, Informative)
sure you can go from asm - c++ (Score:5, Informative)
class a
{
public:
void b(int c);
void d(int e);
private:
int g;
int h;
};
int main()
{
a f;
f.b(23);
int x; x=0; x++;
if(x > 3) goto j;
f.d(x); x++;
if(x > 3) goto j;
f.d(x); x++;
if(x > 3) goto j;
f.d(x);
j: f.b(42);
return 0;
}
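For comparison, here is one source that could plausibly compile down to the goto chain above: a three-iteration loop that the compiler unrolled. This is only a guess at the original, not something the post states. The method bodies of class `a` and the `log` member are invented here so the two forms can be compared call by call.

```cpp
#include <vector>

// Hypothetical stand-in for the decompiled class `a`: the method bodies are
// invented so that every call can be recorded and compared.
class a {
public:
    void b(int c) { log.push_back(c); }
    void d(int e) { log.push_back(-e); }  // negated to tell d() apart from b()
    std::vector<int> log;
};

// The shape a decompiler might emit: an unrolled loop expressed with gotos.
std::vector<int> decompiled_form() {
    a f;
    f.b(23);
    int x; x = 0; x++;
    if (x > 3) goto j;
    f.d(x); x++;
    if (x > 3) goto j;
    f.d(x); x++;
    if (x > 3) goto j;
    f.d(x);
j:  f.b(42);
    return f.log;
}

// One guess at what the programmer originally wrote.
std::vector<int> guessed_original() {
    a f;
    f.b(23);
    for (int x = 1; x <= 3; x++) f.d(x);
    f.b(42);
    return f.log;
}
```

Both functions record the same sequence of calls, which is the point: the behavior survives, but nothing in the binary says whether the source had a loop or three copies of the call.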
Re:You can't (Score:2, Informative)
Take a sum within a program, for example (a+b)=1000
Speculation Code (Score:5, Informative)
Here is some code [planet-source-code.com] that supposedly decompiles... not that I've tried it.
Quote from the FAQ [cs.uu.nl]:
I would have posted AC but they have me blocked out for some reason...
Davak
Re:You can't (Score:5, Informative)
Of course, a lot of naive people think decompiling would allow you to take an application and start writing patches for it; in that case you are right, it's going to be pretty useless. However, it's not entirely useless in all situations. I'm sure the WINE guys might get some use out of it.
Templates (Score:5, Informative)
[insert joke about it being hideously ugly with templates here.]
(I did not read the article itself because it is, of course, slashdotted)
Re:Why (Score:5, Informative)
A library we were basing a major portion of our code on had a bug in it (a Listener class failed to implement EventListener if I remember correctly) which kept our code from working. Removed offending classes from archive, decompiled, fixed, and recompiled.
It's educational...the ol' "how'd they do that?". I've never taken code and used it but I found it instructional to look at how someone made a Swing text area from scratch, e.g.
The challenge...one program I installed had an "enter registration key" prompt and I was curious how that was handled (turned out to be a static string). Then there was this applet that was the core of a company's business. Free, or pay and get more features. As it turns out the control of the features all resided in the applet, so change a couple of switch and if/then statements and voila, administrative privileges. Didn't use it for evil, much... :) They've since come out with a new version and I've been too busy using my mad java skillz on contract work to take a look at their code.
Looking at security was instructional too, though, for when I was project lead on a commercial Java app I knew what worked and what didn't (we ended up using the Wibu key [wibu.com]).
misleading... (Score:4, Informative)
Re:Why not? (Score:3, Informative)
Slight misunderstanding here . . . (Score:3, Informative)
Of course you can decompile C++ (Score:3, Informative)
All you need to do is read in the bytes of any binary program, interpret the bytes as their machine language equivalents for whatever platform you are using, and then convert your MOV statements to assignment operators, JMP statements to higher level loop structures, etc.
Of course, you won't retain the names of identifiers, which are referred to only by memory locations in a compiled program; and some control structures might be rearranged due to compiler optimization and the lack of machine language equivalents, but the meat and potatoes of it is all right there.
It's by no means easy to accomplish, especially with higher and higher level programming languages, but impossible? humbug! =)
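A minimal sketch of that idea, assuming a made-up three-instruction machine (the `Op`, `Insn`, and `lift` names are hypothetical, and real machine code is far messier). Each instruction becomes one line of C-like text, and a jump to an earlier instruction is flagged as the back edge of a loop, which is the raw material a real decompiler would turn into `while` or `for`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction set, invented for illustration only.
enum class Op { Mov, Add, Jmp };

struct Insn {
    Op op;
    std::string dst;  // destination operand (unused for Jmp)
    std::string src;  // source operand (unused for Jmp)
    int target;       // index of the jump target (used only for Jmp)
};

// Translate each instruction into a line of C-like text. Operands keep their
// register or address names, since the original identifiers are gone.
std::vector<std::string> lift(const std::vector<Insn>& code) {
    std::vector<std::string> out;
    for (std::size_t i = 0; i < code.size(); ++i) {
        const Insn& in = code[i];
        switch (in.op) {
        case Op::Mov:
            out.push_back(in.dst + " = " + in.src + ";");
            break;
        case Op::Add:
            out.push_back(in.dst + " += " + in.src + ";");
            break;
        case Op::Jmp: {
            std::string line = "goto L" + std::to_string(in.target) + ";";
            // A jump to an earlier (or the same) instruction closes a loop.
            if (in.target <= static_cast<int>(i)) line += " // loop back edge";
            out.push_back(line);
            break;
        }
        }
    }
    return out;
}
```

Feeding it `mov eax, 0`, `add eax, ebx`, `jmp 1` gives `eax = 0;`, `eax += ebx;`, and a `goto L1;` marked as a back edge. Turning those gotos into structured loops is where the real work starts.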
Re:It's the other way around (Score:2, Informative)
Just leave Delphi out of it. Delphi is a true OOP language; you should do some research before coming up with a gross generalization like that.
Re:You can't (Score:2, Informative)
Incidentally, you can't even theoretically create a perfect disassembler, at least on the x86 instruction set. The nature of the complex instruction set means that an arbitrary string of bytes can be decoded into a wide variety of programs, especially when you throw in the possibility of self-modifying code, and all that other garbage. It's a little better on RISC with fixed, word-aligned instruction sizes. Some minor problems would still exist, but they wouldn't be much of a hindrance to a practical "good-enough" disassembler.
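To make the ambiguity concrete, here is a toy decoder that understands only three 32-bit x86 opcodes: `B8 imm32` (mov eax, imm32, five bytes), `90` (nop), and `C3` (ret). The `decode` function is a hypothetical helper written for this illustration and is nowhere near a real disassembler. It shows the same bytes reading as different programs depending on where decoding starts.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Linear-sweep decode of a tiny x86 subset, starting at byte offset `start`.
// Anything outside the subset is reported as "??" and skipped one byte at a time.
std::vector<std::string> decode(const std::vector<std::uint8_t>& bytes,
                                std::size_t start) {
    std::vector<std::string> out;
    std::size_t i = start;
    while (i < bytes.size()) {
        std::uint8_t op = bytes[i];
        if (op == 0xB8 && i + 4 < bytes.size()) {
            out.push_back("mov eax, imm32");  // opcode plus 4 immediate bytes
            i += 5;
        } else if (op == 0x90) {
            out.push_back("nop");
            i += 1;
        } else if (op == 0xC3) {
            out.push_back("ret");
            i += 1;
        } else {
            out.push_back("??");
            i += 1;
        }
    }
    return out;
}
```

With the bytes `B8 90 90 90 C3`, decoding from offset 0 yields a single `mov eax, imm32`, while decoding from offset 1 yields `nop`, `nop`, `nop`, `ret`. Without knowing where execution actually enters, both readings are equally valid.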
Not to say that creating a workable disassembler is impossible. However, usually more valuable is a debugger with a disassembled output. In this case, you know the program counter's value, so you can deterministically disassemble the program (up to a point). This is generally all you really need to do reverse engineering. Throwing in a decompiler on top of all this generally doesn't help somebody who is fairly experienced reading a disassembly, although I suppose it could be of help to somebody who's more familiar with C++ than assembly mnemonics.
On the other hand, it's not that hard for somebody to pick up just enough assembly to figure out what's going on, especially if they're technically sophisticated enough to be going to all the trouble of stepping through the program to try and figure out how it works.
So just to reiterate, decompilers are generally not all that valuable.
Silly idea and inconsequential material. (Score:2, Informative)
Of course, with C++ being a strongly typed language that resolves so many things at compile time, decompilation is not possible for any non-trivial example (which all the examples in the link were; indeed, they didn't use any C++ features at all). This is even ignoring the effects of compiler optimizations. The C++ language is far more expressive than the output dialects of the compiler, making the whole idea of decompiling silly. C, on the other hand, is basically a platform-independent assembly language, which is why the one-to-one examples of C and asm output seem to imply one can move back and forth between the two at will. Still, this is a mistaken impression.
Now - is compilation from object code to (non-equivalent but functionally similar) C code useful and interesting? Certainly. And all compiler developers and most hard-core debuggers can do this pretty much at will. It's the only way to check the correctness of your compiler and its generated code and, in desperate circumstances, it can give you some clue as to what an existing application, for which you have no source, is doing. This is called reverse engineering, btw, NOT decompilation. Unfortunately the material pointed to here provides absolutely no new insights and is quite rudimentary at best. Anyone intimately familiar with their compiler and environment already has more knowledge than this paper provides. Really doesn't justify a slashdot posting, but I guess whoever posted it simply isn't a C/C++ developer.
Research Company in Kingston, ON (Score:3, Informative)
A friend of mine work(ed) with a company in Kingston, ON that was spun off from Queen's University. Their sole purpose and business model is to take whatever binaries and source a company has available, run them through their cluster of analysis systems, and produce a "clean" update of the system. As per usual, there is about 10-15% of the produced code that needs some hand inspection and tweaking to complete the task.
Their "big" business was the Y2K work, as their software isn't limited to just reverse-engineering, but can also refactor the re-engineered code (e.g. change all "year" values in the system from 2 digit to 4 digit, updating all related I/O formatting functions, overlay structures, etc.)
On the flip side, their stuff involves complex pattern matching and heuristics that put any other system I've heard of to shame. It requires clusters of systems running for days to do the initial code analysis. (OTOH, it probably took years to create the original code.)
I can't provide more specifics on the company because they're having some legal issues with co-investors.
Article is mistitled. (Score:3, Informative)
The title of the first "chapter" is "Why is c++ Decompiling possible?". But immediately he lists "what is totally loss when you compile a program and what stays there".
In the Lost column he puts templates and classes. The remains list has things like function calls and local variables.
Well, guess what? Those things that are "lost" are everything that distinguishes C++ from C. If you don't have classes (meaning no inheritance or virtual functions either) and don't have templates either, then you're really just programming in "a better C", not C++.
So all his approach can hope to "decompile" is C code. Which is something we've seen done in various forms for decades.
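A rough sketch of why the class disappears, assuming a plain non-virtual member function (calling conventions and name mangling vary by compiler, and `Counter_data` and `Counter_add` are names invented here). After compilation, a method is just a function that receives the object's address as an ordinary argument, which is exactly what C code would look like.

```cpp
// Source-level view: a class with a member function.
class Counter {
public:
    void add(int n) { total += n; }
    int total = 0;
};

// Roughly what survives compilation: plain data plus a free function taking
// the object's address. A decompiler would have to invent these names.
struct Counter_data {
    int total;
};

void Counter_add(Counter_data* self, int n) { self->total += n; }

// Same computation written both ways.
int via_class() {
    Counter c;
    c.add(5);
    c.add(7);
    return c.total;
}

int via_plain() {
    Counter_data c{0};
    Counter_add(&c, 5);
    Counter_add(&c, 7);
    return c.total;
}
```

Both versions compute 12. Nothing in the second form says that `total` was a class member or that `Counter_add` was ever a method.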
Why not indeed. (Score:3, Informative)
There's a huge difference between disassembling and decompiling. With assembly, you generally have a 1 to 1 correspondence between machine language instructions and assembly instructions. That is, one specific instruction you feed to the assembler becomes one specific assembled instruction. Sometimes it's more complicated than this, but only slightly.
Now look at C, where one line of code could be arbitrarily many opcodes, depending on the complexity of the logic within that line (and the length of the line). Now suddenly, instead of looking at one instruction and translating it back to its equivalent, your decompiler has to look at possibly hundreds of instructions, parse them logically and figure out where each line starts and ends, and what the logical purpose of each set of instructions is. Then there's dealing with structures (or in C++, objects), where you have to come up with a definition for how data is laid out based solely on the instructions for dealing with that data.
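The structure problem can be sketched like this, assuming a simple two-field struct (`Point` and both helper functions are made up for the illustration). The source says `p->y`; the machine code only ever loads an int from "base address plus some offset", and the decompiler has to infer the struct from that.

```cpp
#include <cstddef>

struct Point {
    int x;
    int y;
};

// What the programmer wrote.
int get_y_source(const Point* p) { return p->y; }

// What the machine code effectively does: read an int at base + offset.
// The field name and the struct type are not part of this operation.
int get_y_lowlevel(const void* base) {
    const char* bytes = static_cast<const char*>(base);
    return *reinterpret_cast<const int*>(bytes + offsetof(Point, y));
}
```

Both functions return the same value for the same object. A decompiler that only sees the second form has to guess that an int lives at that offset, and has no way to know it was called `y`.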
That's quite a bit more complicated. I sure as hell couldn't do it. I know I could write an assembler or disassembler, I might be able to write a simple compiler, but there's no way in hell I could write a functional decompiler.
Oh there's more (Score:1, Informative)
For a laugh, look at his other tutorials. Surprisingly, his "book" here is among some of the better material. Most have to do with C++, and some assembly, and some even cover the same material in this lengthy and pointless article. I especially like his tutorial on using Macros in C++, a concept so backwards and wrong it shouldn't even have to be mentioned. Sure, macros have uses, but with C++, you have real inline functions and constant variables, so why use them for anything besides #include? Anyway, his other works can be found on pscode.com.
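A small illustration of the macro complaint (the names `SQUARE_MACRO`, `square`, and `kLimit` are invented here, not taken from the tutorial). A function-like macro pastes its argument in as text, so operator precedence can bite; an inline function evaluates its argument first.

```cpp
// Classic macro pitfall: the argument is substituted as raw text.
#define SQUARE_MACRO(x) x * x

// The C++ alternatives: a real inline function and a typed constant.
inline int square(int x) { return x * x; }
const int kLimit = 10;  // instead of #define LIMIT 10

// Expands to 1 + 2 * 1 + 2, which is 5 rather than 9.
int macro_result() { return SQUARE_MACRO(1 + 2); }

// The argument is evaluated to 3 first, so this is 9.
int inline_result() { return square(1 + 2); }
```

Extra parentheses in the macro would fix this particular case, but the inline function gets it right without anyone having to remember them.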
What all this boils down to here, is that nothing new is said here. Not only that, but what is said is presented and worded so poorly that anyone reading it is either going to die of laughter or confusion. If you want to read something on reverse engineering, pick up the dragon book, an assembly book, a good disassembler, and some of the very nice documents on cracking software. Many of these are written by people who will be years ahead of you no matter how hard you work, people who actually know what they're talking about.
- Mik Mifflin
Re:Decompilation = halting problem == boloney (Score:5, Informative)
Stop trying to use logic... actually do it.
Re:Why not? (Score:3, Informative)
With C++ or any high-level language, there are zillions of ways a compiler might interpret the code - just as long as the machine code effectively does what the C code says. Even identifying what compiler was used will not help - there are just so many ways to say the same thing in C. for, while, goto, case, it's all syntactic sugar that disappears when you compile.
You can make a decompiler which identifies various code structures and converts them to high-level representations, but it can't EVER know what the original source looked like.
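As a small illustration of that last point, here are three spellings of the same computation (the function names are made up for this example). An optimizing compiler will typically emit very similar, often identical, machine code for all three, so a decompiler can pick one spelling but cannot tell which one the author used.

```cpp
// Sum 1 + 2 + ... + n, written with a for loop.
int sum_for(int n) {
    int s = 0;
    for (int i = 1; i <= n; ++i) s += i;
    return s;
}

// The same sum with a while loop.
int sum_while(int n) {
    int s = 0;
    int i = 1;
    while (i <= n) {
        s += i;
        ++i;
    }
    return s;
}

// The same sum with explicit gotos, close to what the machine code does.
int sum_goto(int n) {
    int s = 0;
    int i = 1;
top:
    if (i > n) goto done;
    s += i;
    ++i;
    goto top;
done:
    return s;
}
```

All three return the same value for any `n`, for example 55 for `n = 10`.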