Famous Last Words: You can't decompile a C++ program
The Great Jack Schitt writes "I've always heard that you couldn't decompile a program written with C++. This article describes how to do it. It's a bit lengthy and it doesn't seem like the author usually writes in English, but it might just work (haven't tried it, but will when I have time)."
You can't (Score:5, Insightful)
Like turning hamburgers into cows...
Re:You can't (Score:5, Funny)
I'm going to use that line.
Re:You can't (Score:5, Funny)
Clearly you haven't tried Domino's.
Re:You can't (Score:4, Funny)
Here's how:
Flush shit down toilet -> let shit mellow at sewage plant -> strain shit residue out of bottom of sewage vat -> haul to field -> spread on grass -> grass grows -> cow eats grass -> pull cow's udder, direct milk into bucket -> ferment milk to cheese -> shred cheese -> spread on dough -> Pizza!
Re:You can't (Score:2, Informative)
Take a sum within a program, for example (a+b)=1000
Re:You can't (Score:5, Insightful)
I'll RTFA when it comes back to life :).
Re:You can't (Score:2)
As opposed to code by authors from the school of copy & paste, who don't include comments, and are generally confused as to what they are trying to do, I'll take the decompiled code that actually works but needs commenting.
Re:You can't (Score:5, Funny)
Re:You can't (Score:5, Funny)
Compiled C++ code can't be decompiled into anything approximating the readability of the original; compiled perl code can.
Re:You can't (Score:3, Funny)
Yep; compiled Perl already approximates the readability of the original pretty well anyway. :-p
Re:You can't (Score:4, Funny)
Python: Executable Pseudocode
Perl: Executable line-noise
Re:You can't (Score:5, Informative)
Of course a lot of naive people think decompiling would allow you to take an application and start writing patches for it; in that case you are right, it's going to be pretty useless. However, it's not entirely useless for all situations. I'm sure the WINE guys might get some use out of it.
Re:You can't (Score:2)
The article is slashdotted, so I couldn't read it, but I would think that C++ would be extremely difficult to decompile because of the use of inlined functions, and what would you do with templates? Also, I don't see how a class could be recreated from binary. I would be more likely to believe that a C++ binary could be decompiled into (ugly) C code, but not necessarily C++ code.
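To make the inlining/template worry concrete, here is a rough sketch. The names `clamp_to`, `sum_clamped` and `sub_401000` are made up for illustration, and the second function is only a guess at the kind of flat, C-like code a decompiler might hand back once an optimizer has inlined the template helper. Compilers are not obliged to inline anything, so treat it as an assumption rather than real tool output.

```cpp
#include <cstddef>

// Source as written: a template helper that an optimizer will usually inline.
template <typename T>
inline T clamp_to(T value, T lo, T hi) {
    return value < lo ? lo : (value > hi ? hi : value);
}

int sum_clamped(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i < n; ++i)
        total += clamp_to(data[i], 0, 100);
    return total;
}

// Hypothetical decompiler output: no template, no helper, no meaningful names.
// The behavior is the same, but the class/template structure is gone.
int sub_401000(const int* a1, unsigned long a2) {
    int v = 0;
    unsigned long i = 0;
    while (i < a2) {
        int t = a1[i];
        if (t < 0) t = 0;
        else if (t > 100) t = 100;
        v += t;
        ++i;
    }
    return v;
}
```

Both functions compute the same result for the same input, which is about all a decompiler can promise: equivalent code, not the original code.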
Re:You can't (Score:3, Insightful)
Re:You can't (Score:5, Insightful)
So what? Doing reasonable interpolations in context is what brains are for. Example: IIRC, when the Morris Worm appeared in 1988, Gene Spafford examined the binary and reverse-engineered the C code, sprinkling it with meaningful comments and good variable and function names. When the original source became available, his turned out to be a cleaner program than the original. That is, he not only recreated the original in every way that counts, he overshot and did better than the original
Oop (Score:5, Funny)
Surely he now understands the English infinitive "to be Slashdotted".
Why not? (Score:5, Insightful)
Well, you can decompile every binary program at least to assembler code, so why shouldn't it be possible with C++?
Maybe he meant "you can't decipher the source of a C++ program"
Re:Why not? (Score:3, Insightful)
No. Assuming we're talking about software disassemblers here, not every program can be reliably disassembled. Disassemblers mainly work by following the execution paths of already disassembled code, so that they know exactly where a subroutine begins. In many instruction sets, instructions have variable length, and not starting your decoding on the right byte is a big mistake that cascades on to the next instructions. Now, knowing thi
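A minimal sketch of the "wrong starting byte" problem, using a made-up three-opcode instruction set (not any real CPU). The function name `linear_sweep` and the opcode values are assumptions chosen only to show how decoding the same bytes from offset 0 and from offset 1 gives two different instruction streams.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy ISA: the opcode byte determines the instruction length.
//   0x01        NOP          (1 byte)
//   0x02 ii     PUSH imm8    (2 bytes)
//   0x03 lo hi  JMP  imm16   (3 bytes)
// Decodes straight through the buffer from `start`, never following jumps.
std::vector<std::string> linear_sweep(const std::vector<std::uint8_t>& code,
                                      std::size_t start) {
    std::vector<std::string> out;
    std::size_t pc = start;
    while (pc < code.size()) {
        std::uint8_t op = code[pc];
        if (op == 0x01) {
            out.push_back("NOP");
            pc += 1;
        } else if (op == 0x02 && pc + 1 < code.size()) {
            out.push_back("PUSH");
            pc += 2;  // skip the immediate byte
        } else if (op == 0x03 && pc + 2 < code.size()) {
            out.push_back("JMP");
            pc += 3;  // skip the two operand bytes
        } else {
            out.push_back("??");  // unknown or truncated instruction
            pc += 1;
        }
    }
    return out;
}
```

With the bytes `02 03 01 02 01`, starting at offset 0 decodes as PUSH, NOP, PUSH, while starting at offset 1 treats the old operand `03` as an opcode and decodes JMP, NOP instead.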
Re:Why not? (Score:3, Interesting)
Why not indeed. (Score:3, Informative)
There's a huge difference between disassembling and decompiling. With assembly, you generally have a 1 to 1 correspondence between machine language instructions and assembly instructions. That is, one specific instruction you feed to the assembler becomes one specific assembled instruction. Sometimes it's more complicated than this, but only slightly.
Now look at c, where one line of code cou
Re:Why not? (Score:3, Informative)
With C++ or any high-level language, there are zillions of ways a compiler might interpret the code - just as long as the machine code effectively does what the
Re:Why not? (Score:2)
Data and code do not usually end up in the same segment.
Re:Why not? (Score:2, Interesting)
So, he's sort of right - you can decompile any binary program to assembler. It's usually called disassembly rather than decompilation, though.
Re:Why not? (Score:3, Informative)
Re:Why not? (Score:2)
No, that's not right. While a compiler could produce assembly as its final stage (as e.g. lcc [princeton.edu]), gcc and most other compilers do not. Just because gcc and most other compilers are able to produce assembly code in the same way they produce object code, does not mean that that is what they usu
Re:Why not? (Score:2)
No, it's not the final form, it's the compiled (but unassembled and unlinked) form. These are all discrete stages, as much as some higher-level tools may hide that fact (or even do multiple steps at once).
hmm (Score:5, Informative)
Nevertheless there are tools out there that attempt to decompile programs; I think of them more as ways of making assembly more readable.
Note, a lot of them wouldn't work on hand-written assembly, because they rely on knowledge of how certain compilers compile various things; e.g. there was a Delphi decompiler available.
graspee
Re:hmm (Score:3, Insightful)
The barrier to entry is definitely raised, but it is always possible to figure out what the compiled code is doing given enough time and effort. In fact, I've even heard of people who patch operating system kernel code with
Re:hmm (Score:5, Interesting)
Create a program that performs / understands the opcodes for the processor and its addressing, and that follows both sides of a branch.
Now "run" the program; that maps out all the opcode and data areas.
Once done, look at the assembler equivalent and map out common subroutines and function calls. Data storage becomes very clear. Lastly, common storage will show external and internal common structures, so fields can be named and visualized.
It is straightforward, though it can be time-consuming, and very helpful in understanding hidden or lost information.
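The "follow both sides of a branch" idea can be sketched as a worklist traversal. Everything here is an assumption for illustration: the four-opcode instruction set is invented, and `mark_code` is a hypothetical name. The point is only that bytes never reached from the entry point stay unmarked, which is how the code and data areas get mapped out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy ISA (not a real CPU):
//   0x00     HALT            (1 byte, ends the path)
//   0x01     NOP             (1 byte)
//   0x10 tt  JMP  target     (2 bytes, absolute target)
//   0x11 tt  BRANCH target   (2 bytes, may go to target or fall through)
// Returns a map where true means "this byte belongs to a reachable instruction".
std::vector<bool> mark_code(const std::vector<std::uint8_t>& image,
                            std::size_t entry) {
    std::vector<bool> is_code(image.size(), false);
    std::vector<std::size_t> work{entry};
    while (!work.empty()) {
        std::size_t pc = work.back();
        work.pop_back();
        // Follow one path until it halts, leaves the image, or rejoins known code.
        while (pc < image.size() && !is_code[pc]) {
            std::uint8_t op = image[pc];
            if (op == 0x00) {
                is_code[pc] = true;
                break;
            }
            if (op == 0x01) {
                is_code[pc] = true;
                pc += 1;
                continue;
            }
            if ((op == 0x10 || op == 0x11) && pc + 1 < image.size()) {
                is_code[pc] = true;
                is_code[pc + 1] = true;
                std::size_t target = image[pc + 1];
                if (op == 0x11) work.push_back(pc + 2);  // fall-through side
                pc = target;                             // taken side
                continue;
            }
            break;  // unknown opcode: stop this path, leave the byte unmarked
        }
    }
    return is_code;
}
```

For the image `11 05 00 AA BB 01 00` with entry 0, the bytes `AA BB` are never reached from either side of the branch, so they are left as probable data.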
Research Company in Kingston, ON (Score:3, Informative)
A friend of mine work(ed) with a company in Kingston, ON that was spun off from Queen's University. Their sole purpose and business model is to take whatever binaries and source a company has available, run them through their cluster of analysis systems, and produce a "clean" update of the system. As per usual, there is about 10-15% of the produced code that needs some hand inspection and tweaking to complete the task.
Their "big" business was the Y2K work, as their software isn't limited to just reverse-e
Software industry, fear no more... (Score:2)
On the other hand, did anyone get to mirror it?
sure you can go from asm - c++ (Score:5, Informative)
class a
{
public:
    // Placeholder bodies (not in the original post) so the listing links.
    void b(int c) { g = c; }
    void d(int e) { h = e; }
private:
    int g;
    int h;
};
int main()
{
    a f;
    f.b(23);
    int x; x=0; x++;
    if(x > 3) goto j;
    f.d(x); x++;
    if(x > 3) goto j;
    f.d(x); x++;
    if(x > 3) goto j;
    f.d(x);
j:  f.b(42);
    return 0;
}
Re:sure you can go from asm - c++ (Score:4, Funny)
Decompile this! SlashDot Effect! (Score:2, Funny)
Inline functions, templates and decompilation (Score:5, Insightful)
let's get back to basics (Score:5, Funny)
It's the other way around (Score:2)
When you think about it, the higher level the language is, the easier it should be to "decompile". The closer the original source was to asm, the more the individual coder's style will be reflected in the asm - the higher level it is, the more the obvious patterns the compiler uses every time for given constructs will be present. Reverse engineering a program written in asm to human readable source is a nightmare, but if you knew for instance that the source was C++ and it was compiled by gcc 3.2 (easy eno
eh... (Score:2)
You cannot really see a programmer's style as a result. When you decompile, you'll get it returned as whatever the compiler shifted the code around as.
Re:eh... (Score:2)
Exactly, you backed up my point while trying not to
The fact that you cannot see the programmer's style, only the compiler's style, is what makes decompiling source much easier. It's easier to learn the thinking patterns of the compiler by observing its output in various cases than it is to write software that can guess random human patterns.
Re:It's the other way around (Score:2)
have you ever taken a compilers course?
having written a compiler for a toy language (tiger) [google for princeton professor appel's "tiger" language and his collaboration with z. shao, who implemented the heap-activation in SML-NJ....] i can assure y
Re:It's the other way around (Score:2)
Where did I say C was a high level language? I used C++ as a reference because like it or not, it is high enough level to have its own structure. I didn't use C because I well understand that C is basically portable assembler.
Beyond the current OOP language like C++ and Java, the only things higher level are the toy languages for braindead programmers (think VB, Delphi, FoxPro, etc) - and the various real attempts at 4GL, which never seem to work right for general cases, but can be useful in application
Re:It's the other way around (Score:3, Insightful)
No, no, no. This is both empirically untrue, (Do you see many ML or even C++ decompilers out there?) and theoretically insensible.
The higher level a language is, the more changes there will be between the original source code and the assembly. Thus the more source data that will have been discarded by the original compiler, which is data the decompiler cannot reconstruct.
The reason Java decompilers work
Speculation Code (Score:5, Informative)
Here is some code [planet-source-code.com] that supposedly decompiles... not that I've tried it.
Quote from the FAQ [cs.uu.nl]:
I would have posted AC but they have me blocked out for some reason...
Davak
thanks for nothing. (Score:4, Interesting)
Shame on you Davak, you should go find honest code. There's nothing wrong with trying to understand how things work. Some people are stuck with legacy equipment or code they can't replace easily and this is their only option for improvement or even fixing it. Those people would be better off if free code were available. Sometimes the only way to make that free code is to understand the original code. There's nothing wrong with reverse engineering software, ever. Republishing someone else's binary is not legal, but it's not immoral. If the code were honest to begin with, the reverse engineering part would not be required. These days, it's cheaper to throw out the dishonest code and hardware and buy some hardware that's well understood. If you make hardware or software, I hope you understand the implications for your product - I'm not buying it.
Slight misunderstanding here . . . (Score:3, Informative)
If you read carefully, you'll note that the "honest work" sentence is NOT Davak's. It is still indented as part of the blockquote, and therefore is the final section of the passage he was quoting from that C++ FAQ. The last sentence that is actually Davak's is his comment about wishing to post as an anonymous coward, presumably to avoid situations like this o
To all those, who think it's useless... (Score:4, Interesting)
Re:To all those, who think it's useless... (Score:4, Insightful)
A hidden API call - which can be easily found via ASM listings
A nice little algorithm - which can be found in comp sci books
An elegant piece of code - which can *not* be decompiled from ASM
So no, I disagree with you.
Re:To all those, who think it's useless... (Score:2)
Quoteth the original poster
And exposes you to possible trade secret and copyright infringement claims.
Really, if you know somebody else can take input "a", do "something magical" with it, and get output "b", are you really willing to admit that they are smarter than you?
Re:To all those, who think it's useless... (Score:2)
-m
Why stop there? Copy someone's whole life! (Score:2)
Code or pseudocode is available free for many thousands of tough algorithmic problems which have been studied and published in the literature (e.g. Knuth et al) which is to be found in most good university libraries and/or the Internet.
Re:Why stop there? Copy someone's whole life! (Score:2)
Reverse-engineering programs written in C/C++ (Score:2, Interesting)
I've done some reverse-engineering on programs written in C/C++ (Intel x86). After a while you learn how to recognize different things like virtual function calls, while/for-loops, switch and stuff like that. However, it's a totally different thing to decompile to C++. It may be possible to decompile compiled code to C, but don't expect that it will look much like the original source, especially if the code was optimized by the compiler :)
Templates (Score:5, Informative)
[insert joke about it being hideously ugly with templates here.]
(I did not read the article itself because it is, of course, slashdotted)
Re:Templates (Score:2)
Java Decompiler? (Score:3, Interesting)
Something that will literally give me code I can re-compile immediately?
Re:Java Decompiler? (Score:2)
And for everyone that whines about "Oh, the decompiled code doesn't have pretty names...!" Who cares? You can puzzle through. Say some method in your app server is throwing a NullPointerException... "well, where in the method could that be happening... decompile, put some debug here, and here... ah, that's weird, it needs this obscure session variable, how did that go missing?" Now isn't that better than screaming "GODDAMN IT WHY DOES THIS CRAP KEEP BREAKING!!" and distressing your co-worke
Re:Java Decompiler? (Score:3, Interesting)
Re:Java Decompiler? (Score:2)
Reverse engineering has its uses... (Score:5, Insightful)
A good decompiler shows you what was written (Score:5, Interesting)
Losing source code and var names (name spaced globals aka statics and scoped locals) allows the cracker (these are rarely hacking tools; they're mostly cracking tools) to focus on what the machine actually was told to do instead of smothering it with shades of meaning which interfere with understanding the code.
C++ or Java or Smalltalk, or almost any highly structured language using machine code libraries or virtual machines result in structured blocks of code and heap and stack allocation.
A good decompiler can take the machine code, peel away the name spaces and code calls, extract the patterns in the code and the hacker/cracker can read the patterns instead of wasting time on the code.
Forensic analysis work is extremely useful at telling you what happened when something dies but it is no good at telling you how something worked. For that you need code traces.
Map those code traces onto the structure the decompiler reveals and you understand the program better than the authors/coders.
I couldn't help it (Score:5, Funny)
Cypher: Well you have to. The compilers work for the construct program. But there's way too much information to decode the Matrix. You get used to it. I...I don't even see the code. All I see is an array, function pointer, integer. Hey, you uh... want a drink?
Neo: Sure.
Cypher: You know, I know what you're thinking, because right now I'm thinking the same thing. Actually, I've been thinking it ever since I got here. Why, oh why didn't I sell my VA Linux stock?... Good shit, huh? Cowboy Neal makes it. It's good for two things, degreasing Perl code and killing brain cells.
Anyone want to decompile SCO? (Score:5, Funny)
misleading... (Score:4, Informative)
Decompiling to C++ is like... (Score:4, Insightful)
Thank you, thank you. I'm Mr. Metaphor and I'll be here all week.
Re:Decompiling to C++ is like... (Score:3, Funny)
Calling yourself Mr. Metaphor is like using metaphor instead of analogy, which, in your case, is as incorrect as a cow marking its territory with cow pies and instituting an elaborate cow-tipping territory defense program.
Useful for compatibility reasons (Score:2, Interesting)
If you do the reverse engineering in Europe, you could develop compatible software and then export it to the US. So it may be great news for us. In fact, it is becoming really complicated to develop software for/in the US: patents, legislation, compatibility. It seems that more lawyers than programmers are needed to write anything more complicated than HelloWorld.
Decompilation = halting problem (Score:4, Insightful)
The long version: In a compiled computer program there is no distinction between code and data. Every byte in memory can be data, but it can also be executed as valid computer code.
Now, the catch is that during compilation, data and code are mixed in the resulting binary. For instance take the compilation of a 'case' statement. There are several ways of compiling a case:
- you can write it as a list of IFs, which is perfectly decompilable
- you can write it as a jump, based on the case expression.
The fun part about the second possibility is that it's far more efficient, but it poses a problem: when decompiling this you have to know where the bounds of the case lie. What's the furthest jump that can be made? It's a jump based on a calculated value, so you should know which values are possible. But for that, you need to run the program, and more specifically, you must run all possible execution paths.
This can be rewritten as an instance of the halting problem: can a computer find out for any program whether or not it will halt? It is proven that a computer program cannot be written to do this task. Neither can a computer program decompile any other computer program.
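For anyone who wants to see the two ways of compiling a 'case' side by side, here is a hand-written sketch. The function names are invented, and the table version is only an approximation of what a compiler emits. One hedge on the argument above: compilers typically put a bounds check in front of the indexed jump, and in practice decompilers lean on that check as a heuristic to size the table. That does not settle the general undecidability claim either way.

```cpp
// Form 1: the case statement lowered to a chain of comparisons.
int dispatch_if(int op, int x) {
    if (op == 0) return x + 1;
    if (op == 1) return x * 2;
    if (op == 2) return -x;
    return 0;  // default
}

// Form 2: the same case statement lowered to a jump table.
static int op_inc(int x) { return x + 1; }
static int op_dbl(int x) { return x * 2; }
static int op_neg(int x) { return -x; }

int dispatch_table(int op, int x) {
    static int (*const table[])(int) = {op_inc, op_dbl, op_neg};
    const unsigned count = sizeof(table) / sizeof(table[0]);
    // Range check before the computed jump; anything outside goes to default.
    if (static_cast<unsigned>(op) >= count) return 0;
    return table[op](x);  // jump based on the case expression
}
```

Both forms give the same answers for every `op`, including out-of-range values, which is why the choice between them is invisible at the source level.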
Re:Decompilation = halting problem (Score:3, Insightful)
Furthermore, there is nothing saying that it has to do a 100
Re:Decompilation = halting problem == boloney (Score:5, Informative)
Stop trying to use logic... actually do it.
Of course you can decompile C++ (Score:3, Informative)
All you need to do is read in the bytes of any binary program, interpret the bytes as their machine language equivalents for whatever platform you are using, and then convert your MOV statements to assignment operators, JMP statements to higher level loop structures, etc..
Of course, you won't retain the names of identifiers, which are referred to only by memory locations in a compiled program; and some control structures might be rearranged due to compiler optimization and the lack of machine language equivalents, but the meat and potatoes of it is all right there.
It's by no means easy to accomplish, especially with higher and higher level programming languages, but impossible? humbug! =)
Decompiling is possible, but hard (Score:5, Interesting)
C decompilers exist; here's one. [backerstreet.com] There are others. Most aren't very good. It's a hard problem.
Without debugging information, decompilation tends to result in code with arbitrary variable and function names, of course. But you get names when a DLL or .so is entered, so at least you get the program's major interfaces.
Minimal C++ decompilation could be done by adding vtable recognition to a C decompiler.
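A rough sketch of what "vtable recognition" would have to spot. The top half is ordinary C++; the bottom half is a hand-written C-style equivalent with invented names (`RawObject`, `RawVTable`, `area_via_table`). It assumes the common layout where the object's first word points at a table of function pointers. That layout is ABI-specific, so treat it as an assumption rather than a guarantee.

```cpp
// C++ as written: a virtual call through a base reference.
struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};

int area_via_virtual(const Shape& s) { return s.area(); }

// Roughly what a plain C decompiler sees: a struct whose first member
// points at a table of function pointers, plus an indirect call.
struct RawObject;
struct RawVTable {
    int (*area)(const RawObject*);
};
struct RawObject {
    const RawVTable* vptr;
    int side;
};

static int raw_square_area(const RawObject* self) {
    return self->side * self->side;
}
static const RawVTable raw_square_vtable = {raw_square_area};

int area_via_table(const RawObject* obj) {
    return obj->vptr->area(obj);  // load vptr, index the table, call indirectly
}
```

Recognizing the load/index/indirect-call pattern and folding it back into a class with virtual methods is the step a C decompiler would need to add.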
A more difficult problem is recognition of idioms. Things like "for" statements tend to decompile as lower level constructs. That's OK as a first step. You need some internal representation. Initial decompilation might represent all transfers of control with "goto"; higher level recognition then deals with that.
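As an illustration of the goto-first approach, here is the same loop in both shapes. The function names are made up; the first is what a first-pass decompilation might look like, the second is what idiom recognition would ideally rewrite it into.

```cpp
// First pass: every transfer of control is an explicit goto.
int sum_goto(const int* a, int n) {
    int total = 0;
    int i = 0;
loop_head:
    if (i >= n) goto loop_exit;  // loop test, inverted as a compiler would
    total += a[i];
    i = i + 1;
    goto loop_head;              // back edge
loop_exit:
    return total;
}

// After idiom recognition: init, test and increment folded into a "for".
int sum_for(const int* a, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += a[i];
    return total;
}
```

The rewrite is safe here because the only jump into `loop_head` from below is the single back edge, and the only exit is the test at the top.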
The key to doing a good job is "optimization", finding more concise source code that will generate the object code. The key to this problem is defining an internal representation that can represent any valid machine-language program, and which can be modified as higher level information about the program is discovered. The first step is usually to start at the starting address and build a code tree by following calls, like a good debugger does. Then you start to improve on the code tree, doing things like this:
Decompilation won't always succeed. But you should find all the places where the code is doing something the compiler doesn't understand, and get code back for everything else.
It's a big job, and somebody ought to do it. Among other things, it would be a valuable tool for finding compiler bugs.
Article is mistitled. (Score:3, Informative)
The title of the first "chapter" is "Why is c++ Decompiling possible?". But immediately he lists "what is totally loss when you compile a program and what stays there".
In the Lost column he puts templates and classes. The remains list has things like function calls and local variables.
Well, guess what? Those things that are "lost" are everything that distinguishes C++ from C. If you don't have classes (meaning no inheritance or virtual functions either) and don't have templates either, then you're really just programming in "a better C", not C++.
So all his approach can hope to "decompile" is C code. Which is something we've seen done in various forms for decades.
From the author (Score:5, Interesting)
Anyway, I've seen a lot of people saying decompiling is impossible or at least not practical. Well, that is not true. Decompiling C++ is very practical because of high-level keywords (if, while, for), local variables, and parameters. All of these generate certain instructions that are similar on every platform and just about every processor.
I am also extending the article to 92 pages in total, which will cover OOP, the CRT, and a whole bunch of other stuff.
The author can't take honest criticism. (Score:4, Interesting)
Not only does the author completely fail to realize that the technique he is describing doesn't remotely qualify as decompilation, and is nothing but normal reverse engineering, but he figures that the appropriate response to negative criticism is to remove evidence of it rather than attempt to intelligently respond. I noticed that my vote of 1 of 5 was still intact on his voting page, though.
I was originally surprised when I first read the article that someone would think it had merit enough to write about, but having some insight into the mindset of the author that I did not have before (offered by his rapid censorship of my remarks), my surprise has waned completely.
Re:Why (Score:2)
Re:Why (Score:2)
Re:Why (Score:3, Informative)
Re:Why (Score:2)
Re:Why (Score:5, Insightful)
1) Finding backdoors
2) Testing security
3) Fixing bugs
4) Adding features
5) Discovering copyright violations
6) Interfacing to non-supported clients
Pretty much anything and everything you would do if you had the source.
Re:Why (Score:4, Interesting)
I have always felt the greatest problem with closed source was it forced you to trust someone who you were fairly certain had only one skill and that was salesmanship.
It of course raises the interesting question: if you find a copyright violation in commercial software, is your evidence void because the license agreement usually excludes all reverse engineering?
Re:Why (Score:5, Interesting)
Reverse engineering is taking a black box and figuring out what it contains by giving it test inputs and watching the outputs. There are a few other things considered reverse engineering, but that describes most of it.
Of course, all of this ignores the fact that EULAs have never been tested in court. They could be proven invalid as contracts fairly easily since the exchange of goods occurs before you ever see the EULA and most stores don't accept returns of opened software. Therefore, if you don't agree to the EULA, you still have the right to use what you purchased.
On an interesting side note, various free trade laws specifically protect reverse engineering.
Re:Why (Score:3, Interesting)
It may be a violation of the license agreement, which would be a violation of a civil contract. The enforceability and applicability of said agreements have been a point of contention for nearly 30 years now.
Re:Why (Score:5, Insightful)
Re:Why (Score:2, Insightful)
Re:Why (Score:2, Insightful)
What you need is a decent source control/backup system, not a decompiler.
Re:Why (Score:3, Insightful)
Re:Why (Score:5, Informative)
A library we were basing a major portion of our code on had a bug in it (a Listener class failed to implement EventListener if I remember correctly) which kept our code from working. Removed offending classes from archive, decompiled, fixed, and recompiled.
It's educational...the ol' "how'd they do that?". I've never taken code and used it but I found it instructional to look at how someone made a Swing text area from scratch, e.g.
The challenge...one program I installed had an "enter registration key" prompt and I was curious how that was handled (turned out to be a static string). Then there was this applet that was the core of a company's business. Free, or pay and get more features. As it turns out the control of the features all resided in the applet, so change a couple of switch and if/then statements and voila, administrative privileges. Didn't use it for evil, much... :) They've since come out with a new version and I've been too busy using my mad java skillz on contract work to take a look at their code.
Looking at security was instructional too, though, for when I was project lead on a commercial Java app I knew what worked and what didn't (we ended up using the Wibu key [wibu.com]).
Re:Why (Score:3, Insightful)
Re:Why (Score:5, Insightful)
nice try.
You must be either Bill Gates, Steve Ballmer or someone who works for the BSA.
How am I to tell if your closed source program isn't full of my GPL code that you blatantly stole and are trying to rob me blind by STEALING my IP? Being a closed source advocate as you seem to be, you are for me trying to detect IP theft and the illegal STEALING of my code by PIRATES, right?
Ok, I'm going overboard to make my point... I have EVERY right to use tools in a good and legal way. Why not outlaw hammers, as anyone can perform a very grisly and horrible murder with one... Or better yet only allow licensed contractors to have hammers! As we know, the unlicensed public is only going to do very evil things with tools!
See my point now? A tool is exactly what it looks like.... a tool. It can be used for good and evil, and I don't have any respect for the self-righteous like you condemning what I do before I even do it.
people with attitudes like you are what cause all the pain and suffering in this world...... STOP IT!
Re:Why? Java-style reflection (Score:2)
If this technique works (haven't read it, page is slashdotted), maybe it could be used to implement Java-style runtime reflection for C++, which would be extremely cool and useful. Get a pointer to a method, decompile it to find out the expected arguments and return type, and dynamically invoke it.
Re:Why (Score:4, Insightful)
BTW: Nice job getting all those responses with two lines...
Re:Why (Score:2)
"If you steal from one person it's plagiarism. Stealing from many is research."
Re:Intresting (Score:3, Funny)
Incorrect! Spelling Nazi may have been the answer you're looking for.
Re:Intresting (Score:2, Funny)
Its write hear in my Oxbrige Enlish Dictionairy. What are you on about?
Re:Text of the article (Score:2, Insightful)
please stop breathing and kill any offspring you may have inexplicably fathered for the sake of our gene pool. Thanks.
Respectfully,
-- Human Race
You're right, that is nonsense. (Score:5, Funny)
I can scratch a superscalar CPU out of silicon with a pocket knife. I even have friends who can write major programs in binary code (yes, just 1s and 0s)... even though writing a simple "hello world" program can amount to 92,752 bits. I fail to realize that this ability does not a good computer scientist make. Things like intelligent design and research make a CS good.
The parent post is fluff. It's stupid, the man is flamboyant and exaggerating. He clearly has no real education in computer engineering and does not recognize that any executable code can be reverse-engineered or decompiled. Especially since every language (save interpreted languages like Java) is compiled to machine code -- specific, unambiguous, structured code. "Decompiling" this is only really a matter of translating it into your language of choice.
So, Mr. Proud American, please get off your imaginary high horse. You're not fooling anyone.
Re:This is nonsense (Score:2, Funny)
Re:Smart Compilers (Score:2)
I hope not. If I want it to be recursive, I'll code it that way. If I want it to be iterative, I'll code it that way.
This is not to say that compilers shouldn't optimize things, such as dead code, register optimization, and stuff like that, but I know what I want, not the compiler writer(s).
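For context on the recursive-versus-iterative point, a small sketch with made-up function names. Many optimizing compilers can turn a tail call like the one in `fact_rec` into a loop much like `fact_iter`, but the C++ standard does not require it, so whether it happens for a given build is an assumption. Either way, a decompiler looking at the optimized binary may not be able to tell which form you wrote.

```cpp
// Tail-recursive form: the recursive call is the last thing the function does.
unsigned long long fact_rec(unsigned n, unsigned long long acc = 1) {
    return n <= 1 ? acc : fact_rec(n - 1, acc * n);
}

// Iterative form: what tail-call elimination would effectively produce.
unsigned long long fact_iter(unsigned n) {
    unsigned long long acc = 1;
    while (n > 1) {
        acc *= n;
        --n;
    }
    return acc;
}
```

Both return the same value for every `n` small enough not to overflow, so the observable behavior is preserved whichever shape ends up in the binary.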
Re:Smart Compilers (Score:2)
Re:New evolution (Score:2)
No, that's not a new evolution. It has been this way forever...