Morphing Code to Prevent Reverse Engineering? 507
ptolemu writes "Cringely's latest article discusses a new obfuscation technique currently being researched called PSCP (Program State Code Protection). An informative read that concludes with some interesting insight on the software giants that heavily depend on this kind of technology."
Morphing code eh (Score:2, Interesting)
Re:Reverse engineering is not the problem (Score:5, Interesting)
For instance, consider Quake. Quake is a great deal of fun, so long as everybody is playing fair. However, when somebody cracks the game and develops an aimbot (they're real), it's not fun anymore. Even if Quake were open source, some kind of run-time obfuscation would be great just to help prevent cheaters.
I recall reading about an exploit for Age of Empires (or was it Age of Kings...) where in a networked game, you could run a monitor program that would let you see what resources your opponent had. Then, by watching changes in their resource supply, you could guess what units they were building. That was automated for you, of course. "Ah, they keep spending 45 wood and 25 gold, they must be building archers! I should build cavalry."
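The guessing step described above can be sketched in a few lines of Python. This is only an illustration: the archer cost (45 wood, 25 gold) comes from the post, while the other unit cost, the snapshot format, and the name `guess_unit` are made up for the example.

```python
# Hypothetical cost table; only the archer entry is taken from the post.
UNIT_COSTS = {
    "archer": {"wood": 45, "gold": 25},
    "spearman": {"food": 35, "wood": 25},  # assumed values, for illustration only
}

def guess_unit(before, after):
    """Return the units whose cost equals the drop between two resource snapshots."""
    # Keep only the resources that changed between the two snapshots.
    spent = {res: before[res] - after[res] for res in before if before[res] != after[res]}
    return [name for name, cost in UNIT_COSTS.items() if cost == spent]

print(guess_unit({"wood": 200, "gold": 100, "food": 50},
                 {"wood": 155, "gold": 75, "food": 50}))
```

A real monitor would also have to cope with resources being gathered between snapshots, which this sketch ignores.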
Anyway, even when we're not talking about greedy corporations protecting their intellectual property rights, there are still good reasons for keeping what's going on in your program hidden from prying eyes.
the dark side (Score:5, Interesting)
For example: I write YourDoom.A, and I write it using this new code-morphing obfuscator. How exactly are anti-virus programs supposed to (1) remove this and (2) identify this?
Given the numerous VB/Outlook bugs and considering that
Couldn't this be applied to P2P? (Score:4, Interesting)
And maybe this is already being done -- or maybe this is just pure stupidity on my part for asking the question -- but couldn't this sort of "morph-as-you-go" theory be used to obfuscate -- and essentially hide -- a network path used to get (or put) a piece of data? Kinda like BitTorrent -- but in a much more severe, much more shifty way? You're getting the data -- eventually -- and you're both downloading and uploading as you go -- but the paths through which your current bit of data is being retrieved are both unknown until you visit them and obscured once you leave them?
Re:Are folks really using obfuscation for Java? (Score:3, Interesting)
But then again, our software isn't on 90% of all computers or whatever, so I guess we're less worried about exploits.
Re:Are folks really using obfuscation for Java? (Score:5, Interesting)
If you need a tamper-resistant client-side binary, don't use Java. It's that simple. A good engineer understands many different tools and selects the best one for the job.
Anti-virus software and heuristics (Score:5, Interesting)
"Objective review?" (Score:2, Interesting)
It is, of course, very reassuring to know that:
Being a bit cynical, I find the article more like a sales plug than a journalistic piece.
Reproducing Production Bugs... (Score:2, Interesting)
Bah (Score:1, Interesting)
Second off, code obfuscators aren't magic. You can always still tell what's happening. It just takes longer and more effort.
Re:Won't work (Score:1, Interesting)
Re:Isn't this just self-modifying code? (Score:5, Interesting)
Re:Reverse Engineering is Good (Score:4, Interesting)
Re:Won't work (Score:5, Interesting)
There was a section of code hidden by about forty layers of byte-by-byte XORing against bytes looked up in a table. At each level, it would intercept the Debug and Single Step interrupts, XOR the next layer, and jump into it. In those floppy-only days, it had to be reverse engineered a layer at a time, each step producing a disk with one less layer. Approximately the 40th disk had the actual copy-test code...which turned out to be pirated code!
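As a rough illustration of why stacked XOR layers only cost the attacker time, here is a small Python sketch. The post does not say how the table was indexed, so the `(i + layer) % len(table)` scheme and the function names are assumptions made for the example, not a reconstruction of the actual protection.

```python
def xor_layer(data: bytes, table: bytes, layer: int) -> bytes:
    """XOR each byte against a table entry (the indexing scheme is an assumption)."""
    return bytes(b ^ table[(i + layer) % len(table)] for i, b in enumerate(data))

def wrap(payload: bytes, table: bytes, layers: int = 40) -> bytes:
    """Apply the layers one after another, innermost first."""
    for layer in range(layers):
        payload = xor_layer(payload, table, layer)
    return payload

def unwrap(blob: bytes, table: bytes, layers: int = 40) -> bytes:
    """Peel the layers off one at a time, outermost first."""
    for layer in reversed(range(layers)):
        blob = xor_layer(blob, table, layer)
    return blob

table = bytes(range(256))
secret = b"copy-test code"
print(unwrap(wrap(secret, table), table) == secret)
```

Because XOR is its own inverse, anyone holding the table can undo every layer. The interrupt hooking described in the post slows down single-stepping, not the arithmetic.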
This was also before BIOS shadowing in RAM, and the BIOS executed straight from ROM. The test for the laser burn required hooking into it, which of course they couldn't do in ROM. Instead of working out their own shadowing routine they copied some 700 bytes of the IBM Fixed Disk BIOS, inserted their hooks, and then made a weaselly attempt to cover their tracks by interchanging logical-shift with arithmetic-shift instructions wherever it was guaranteed that nothing would go through the carry bit.
And all that meshugass was there only to hide the publisher's own piracy...the copycrack consisted of a two-byte change elsewhere on the disk.
rj
Re:Won't work (Score:5, Interesting)
One forms logical boxes around things. For instance, a good cracker knows how to identify the boundary between the JIT and the bytecode, where the security check call is made, and which threads are monitoring the heap and garbage collector.
When cracking, you initially "freeze" the code, the machine, the stack, and the registers. You're working at such a low level, it begins with a step-by-step of understanding how everything fits together.
For example: Imagine the
Also, you look at the program file itself. THIS is what the article seems to be saying: the bytecode is obfuscated...without context clues you're not going to discern how it works. But you can snap up context many times with a cracking tool. In this article, they seem to imply that each snapshot will be different by scrambling the variable names, or program locations. Once you see how all the names have been crammed, a pattern develops.
Also, I take issue that
Lastly, what makes these tools immune from reverse-engineering themselves? If I know the patterns this DASH-stuff uses, I can begin to reverse them. Unless there's one-way hashing or hardware/networked keys flying around, everything to solve the puzzle is right there, for me and my friends to examine at our leisure. This is done today by virus writers to try to avoid detection by checkers; they know how they work.
If this tool becomes actually as valuable as he claims, then I expect its own design (stolen or RE'd) to appear in the cracker circles like any other.
But perhaps I'm missing something?
Re:Are folks really using obfuscation for Java? (Score:3, Interesting)
I think on top of that, Java does so much stuff, like garbage collection, that programmers don't need to worry about it. But the Java optimizations are always implemented conservatively. If I did my own garbage collection, I could free the memory as soon as I'm done with it, but under Java GC is done only periodically, and only sweeps items that fulfill certain qualities (so it might not get everything as soon as it should).
But you're right, a big part of it is that under Java many developers might be able to code sloppily and not worry too much because they have the defense that "it's the Java that's making it slow".
How does this protect your .NET assembly? (Score:5, Interesting)
We'll ignore the obvious problem presented by the fact that your
We'll ignore the fact that instructions which modify other code are generally very easy to spot -- because they must refer to regions of a program's address space where code resides -- and it should be easy to find these code morphing instructions and turn them into no-ops.
We'll set both of those tricky issues aside and focus on the crux of the matter. How does this PSCP protect the program *before* it starts running? When the cracker gets his hands on my juicy
So, the school of crackers that likes to use a debugger to deduce program behavior may find themselves having trouble. But in the worst case, all I need to do is run the morphing code in a debugger, record the location of the program counter at the point in the program's execution in which I'm interested, and then consult the corresponding section of the program code that resides in the original
Think about it: if PSCP wanted to change the location of a jump target, for example, it would need to track down every other instruction in the program that jumped to that instruction, and modify the jump to point to the new location of the jump target.
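To make the bookkeeping concrete, here is a toy sketch in Python. The instruction format (tuples with absolute jump targets) and the name `move_instruction` are invented for the example. It says nothing about how PSCP actually works, and it ignores fall-through order; it only shows that moving one instruction forces a patch to every jump that points at it.

```python
def move_instruction(program, old_index, new_index):
    """Move one instruction and retarget every jump (absolute targets assumed)."""
    order = list(range(len(program)))
    order.insert(new_index, order.pop(old_index))
    # Map each old address to the address it ends up at after the move.
    new_pos = {old: new for new, old in enumerate(order)}
    moved = []
    for old in order:
        op, arg = program[old]
        if op == "jmp":
            arg = new_pos[arg]  # every jump in the program has to be patched
        moved.append((op, arg))
    return moved

program = [("jmp", 2), ("nop", None), ("print", "hi"), ("jmp", 2)]
print(move_instruction(program, 2, 0))
```

Both jumps had to be found and rewritten to follow a single moved instruction, which is the tracking cost the paragraph above is pointing at.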
Re:do what i do (Score:3, Interesting)
The scope tag is probably more useful than the data type.
Re:Resource Waste (Score:2, Interesting)
I could see protecting any code that was a trade secret. But protecting your user interface code??? We all know what it looks like already.
Re:do what i do (Score:2, Interesting)
Well, the idea is that, with strongly typed languages and a powerful IDE, you can quickly see the scope and type of any variable with a mouse over.
I was a big fan of Hungarian, but the more I think about it, most variables (so long as they're not named Tmp, i, j, etc.) are pretty obviously typed. InstanceCount sounds like an integer, DateReceived sounds like a time, FirstName, a string. Now that I've switched to descriptive names, I haven't yet run into a problem of not knowing the scope/type.
If it can be compiled, it can be cracked (Score:5, Interesting)
Examples of Stupid Obfuscator Tricks include:
There are two outs that I know of. One is to only use interpreted code and morph it on the fly (still seems vulnerable to an observant interpreter, but perhaps the amount of necessary observations can be made extravagantly large), the other is to require use of a "trusted" compiler (which, in turn, requires use of a "trusted" OS to prevent substitution of an untrusted compiler, which in turn requires "trusted" hardware to prevent substitution of an untrusted OS).
Certified Applications. (Score:3, Interesting)
The FDA come a knocking and start asking about the checks in place that ensure that the code that you write and document is the code that actually gets performed.
FDA Auditor: So, this code specified in this document. Can you please show me how you ensure that this code is actually performed when you run the program here that you say is the one that this document references.
IT Guy: Sorry, can't.
FDA Auditor: Why isn't it?
IT Guy: Because the code that gets run is different every time it is run, and indeed during a single run it changes.
FDA Auditor: So, what you're saying is that you cannot guarantee that the application specified in all these documents is the application code that actually runs.
IT Guy: Yep, that's about it...
Oh, now at this point in the discussion it gets serious.
Who on this list actually thinks that dynamic code obfuscation like they propose is actually worth a damn?
What happens when this mutating mess gets it wrong?
Who is to blame?
Come on now, this is stupid, this is the worst form of pandering to corporate paranoia.
This is true snakeoil.
These are all just Turing machines.
This argument is so stupid. (Score:5, Interesting)
Like Ultra-Wide-Band networking and enterprise XML integration, this column fits a Cringely mold of writing an entire article about the business plan of one small company most people haven't heard of, and passing it off as an important insight about the IT industry as a whole. It works for the most part because there are a lot of neat-sounding business plans out there. Every start-up company in the world has a story about how their vision, fully realized, would shake up the entire industry. It makes for great column-fodder, but provides poor analyses.
If you read the whole column here twice, you immediately become aware of the fact that Cringely's entire "argument" turns on the idea that security rests on keeping source code secret. Because "interpreted" code "always" discloses code secrets, "interpreted" platforms like .NET will require the intellectual property wrapped up in schemes like PSCP. Therefore, the "inventors" of PSCP hold an important position on the chess-board of the entire IT industry. Microsoft and Sun will launch bidding wars to ensure they control the PSCP IP.
Of course this is just crazy-talk. Just for a moment, leave aside the argument that something like PSCP can really prevent reverse engineering. In the post-PSCP world, all security rests in a distributed repository of millions of lines of source code "locked up" in an organization that spans 45 buildings and untold tens of thousands of people in Redmond. You can't keep source code secret. Closed source is a speed-bump to dedicated attackers, who will break into networks, find corrupt insiders, or even get janitor temp positions in order to get the code.
Nobody working in security seriously believes that the source code for Windows 2000 wasn't floating around the computer underground years before the most recent disclosure. 'Twas ever thus: most of the SunOS and Solaris exploits that powered attackers in the mid-90's were derived from stolen Sun source code. Stolen source trees have always been the most stable currency in the computer underground for exactly that reason. What you do with the compiled product of that code makes no difference if the blueprints are already in enemy hands.
I'm not sure it's even worth confronting Cringely's argument (that PSCP is a strategic technology that is crucial to .NET security) head-on, but I think I can make a decent response simply by evoking video game copy protection. Companies went through all sorts of contortion to devise copy-protection schemes. Kids with the Microsoft Macro Assembler bible thwarted them, because, just like in the DRM/Media battle, when you control the entire player architecture, it is impossible to completely secure the content. Regardless of whether PSCP makes it harder to grep out the cookie cutter exploit from the .NET IR, the payoff in the "battle" between code-obfuscation and exploit generation is much higher than the payoff to defeat copy protection, and nobody has ever won the copy protection battle.
Cringely is right every once in a while (business plans occasionally do pan out!), like with Eolas and Burst. I normally wouldn't care enough to comment, but this time he's inadvertently promoting a damaging and popular misconception in his article.
I Should be paid to read articles like that... (Score:4, Interesting)
First of all, anyone that intends to write an article about a "new" software engineering theory or theoretical application needs to make sure they not only understand what they are talking about, but they also choose to collect quotes from people who know what they are talking about.
Here's a hint, if the person says "leverages" in a serious tone of voice they are either a sales-person or only received information from the sales team.
Now, beyond the other comments I could add, such as bad definitions of the framework, and the author's inability to name more than 2 examples of languages available to interact with that framework, there seems to be a large problem with the research content. There isn't any.
I could likely spend 20 to 30 minutes researching background information on the internet and still have a more solid article, simply because I would have real information.
The information provided in this article appears to be the result of carefully skimming sales brochures. There is no real information on the processes involved, reverse engineering, or numbers involved in terms of performance.
We find out that there are "...billions of paths..." but this is just marketing talk, obvious for its lack of detail. Reverse engineering is detailed as something used by hackers (in the newer, negative sense) to find holes in code. There is no mention of the other side, i.e., reverse engineering old software when the original developers are not available and no one felt documentation or up-to-date source code was necessary, among many other valid and legal reasons for reverse engineering. There is a brief comment about the extra resource usage, but it is considered negligible (in comparison to...?) and in fact this process is also mentioned as having no negative impact. TANSTAAFL.
All in all this sounds like something that will be overhyped, overused, and in the end more of a pain than anything else. Clueless managers everywhere will demand all of the code use this new and impervious format when there are many easier ways to prevent security loss without the so far unknown problems with this new method (not to mention security holes in the obfuscation method itself).
Now when people try to reverse engineer code to look for security holes they won't find them because the holes were swept under the carpet. I may stand up for MS more often than deride them, but the kindest way I can say this is that this new method of obscurity is a little less than bright. Just as I wouldn't use anything beta from MS, you can bet that I won't be using this technology either. I prefer solid code, testing, and a solid license. By the time they have finished reverse engineering version 1, the next version will be underway, leaving them just as far behind as before.
Snake Oil (Score:3, Interesting)
sorry, I am bitter
On the other hand, byte code obfuscators will not stop anyone who wants to disassemble. I remember about 8 or 9 years ago I was disassembling a simple DOS com file (anyone remember inclogo.com, with an INCLOGO word being printed in a 320x200 graphics mode with some simple 256 byte coloring and shading that changed in a loop? It looked cool, so I wanted to find out how it worked.) Couldn't figure out the machine code for some reason, so loaded it into the Turbo Debugger
Ta da!
done.
How is this going to be any different? Code cannot be obscured to the hardware and a cracker works at the hardware level.
Doesn't this sound to you like the commercials for the 'new and improved 1024 bit encryption' that sometime ago was put out by one Israeli company (there was an article on
This is snake oil.
It's not about learning, it's about plagiarism (Score:5, Interesting)
Solution in search for a problem (Score:3, Interesting)
Re:do what i do (Score:4, Interesting)
Re:Security by obscurity (Score:3, Interesting)
Unless you want to eat your cake and have it, too. Suppose you want to build on some OSS stuff, but don't want to have others be able to do the same with your own code. So, you follow the letter of the GPL and release the source, but make it so damn hard to decipher that it might just as well be binary.
Yep, this is just self-modifying code. (Score:1, Interesting)
This article has several glaring errors:
1. Most of MS problems are due to buffer overflow problems in compiled code, not MSIL.
2. He is really talking about protecting code from theft, not security from hackers. Must be that MS kool aid going to his head.
3. There has to be non-trivial overhead with the PSCP process. If you are scrambling things at run-time, there has to be a run-time monitor doing the scrambling and something doing the unscrambling or translation for the processor. These scramblings have to have deterministic patterns, otherwise a computer wouldn't be able to execute instructions, load memory, etc. These patterns can be hacked. Also, these patterns are based on pseudorandom number generators, which can be hacked. The only thing this whole process will do is make core dumps harder to parse for the Indian technical help desk in Hyderabad.
4. Setting all variable names to a single name (e.g., 'a') in a program is the dumbest description of an obfuscator I've ever read. I've studied them in depth and have never heard of this.
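Point 3 above (a scrambler driven by a pseudorandom generator is deterministic, so its pattern can be replayed) can be shown with a short Python sketch. The byte shuffle below is a stand-in picked for the example; it is not a description of PSCP, and the seed and function names are hypothetical.

```python
import random

def scramble(code: bytes, seed: int) -> bytes:
    """Permute the bytes using a PRNG seeded with a known value."""
    order = list(range(len(code)))
    random.Random(seed).shuffle(order)
    return bytes(code[i] for i in order)

def unscramble(scrambled: bytes, seed: int) -> bytes:
    """Replay the same PRNG sequence to put every byte back where it was."""
    order = list(range(len(scrambled)))
    random.Random(seed).shuffle(order)
    out = bytearray(len(scrambled))
    for pos, i in enumerate(order):
        out[i] = scrambled[pos]
    return bytes(out)

code = b"mov ax, 1; int 21h"
print(unscramble(scramble(code, seed=1234), seed=1234) == code)
```

The same seed always yields the same permutation, so whoever recovers the seed (or the generator state) can reverse the scrambling.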
Re:Won't work (Score:2, Interesting)
Ah, ProLock, now there's a blast from the past.
In a previous life, I worked in a group that tested third-party software, but my manager was too cheap to buy a separate copy for each tester. (The company will remain nameless to protect the guilty.) One of my "unofficial" job duties was breaking copy protection of new apps (mostly games) so we could all test them. One day, we were visited by a group of folks from ProLock. Of course, they claimed their protection scheme was completely unbreakable. As if.
Their basic technique was pretty solid. By burning a hole (with frigin' lasers) in the floppy, they introduced physical changes in the media that absolutely could not be duplicated with software.
As Deadstick mentioned, the crack consisted of a two-byte change somewhere on the disk. Basically, at some point the program called a routine to see if the special laser-blasted sector was present. If so, the program continued. The patch was just to pretend the sector was always there.
As an analogy, the ProLock scheme was like putting a bank vault lock on a rice paper door. Instead of wasting time picking the lock, it's much easier to just rip another hole in the door.
Re:do what i do (Score:3, Interesting)
The company I work for has some legacy code still in use... written in Fortran and originally developed on some old unix system. The compiler limited variables, function names, program names, etc. to 5 characters.
5 characters.
Combine that with the lovely syntax of Fortran 77, tons of gotos, pages of variable declarations, sparse comments, NO whitespace whatsoever, and then picture yourself debugging that for a living.
And if that weren't enough, our sourcebase was purchased from a German company. Up until a few years ago, German comments could still be found.
Yes. Laugh it up.
Re:do what i do (Score:1, Interesting)
And the poor results are hardly limited to programmers. I feel certain that the management did not consist of programmers at the restaurant I saw with a sign proclaiming that they were under "new management", complete with the scare quotes. I presume that means the same people, but they received some sort of training, or perhaps lobotomies.
Re:Are folks really using obfuscation for Java? (Score:2, Interesting)
After you get through all this silliness you find out that particular implementations and programming techniques were the root cause of any real or perceived slowdown.
Move forward a few years and Object Pascal is one of the most efficient, although less popular, implementation tools with faster compilation. Java is moving in that direction too.
The bottom line is how well things are working for you now. JIT and AOT compilers are being used with Java, C# and other OO languages with great results.
I usually like Cringely's stuff but... (Score:2, Interesting)
Microsoft's security bugs are not due to
He claims
The real kicker is when he advocates using an obfuscator to watermark opensource code. Um, yeah. That'll work.
Re:do what i do (Score:1, Interesting)
Re:Are folks really using obfuscation for Java? (Score:4, Interesting)
I enjoy Java and program in Java and will confess that the stuff they include is usually useful (our software would probably be fscked if we didn't have GC or any of these other features); they just degrade performance (and I believe they have to). I would love to hear your response.
When I describe the mark and sweep method, it is the most common, and will likely be the most frequently used. However check here [javaperfor...tuning.com] for an analysis of the other types. If garbage collection were a lightweight, trivial process, then why would Java need to implement 6 different schemes?
Incidentally, we tried testing the various different schemes here and it was a mess trying to get anything out of it.
Yeah, all you have to do is null the object and it'll be collected. Keep in mind, though, that in C++ you just do a delete (or a dealloc) and it's gone, you don't need to scan the whole environment doing reference counts and then doing the corresponding deallocs.
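For readers who haven't looked inside a collector, here is a minimal mark-and-sweep sketch in Python. It is a toy, not how any particular JVM implements it: the heap is modelled as a dict from object name to the names it references, and `mark_and_sweep` is a name made up for the example. Note that mark-and-sweep traces reachability from the roots rather than keeping reference counts.

```python
def mark_and_sweep(heap, roots):
    """heap maps object id -> list of referenced ids; returns the set of ids freed."""
    marked = set()
    stack = list(roots)
    # Mark phase: walk everything reachable from the roots.
    while stack:
        obj = stack.pop()
        if obj not in marked:
            marked.add(obj)
            stack.extend(heap[obj])
    # Sweep phase: anything left unmarked is garbage.
    garbage = set(heap) - marked
    for obj in garbage:
        del heap[obj]
    return garbage

heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"]}
print(sorted(mark_and_sweep(heap, roots=["a"])))
```

The trade-off the post describes is visible here: the collector has to walk the live objects to find the dead ones, whereas an explicit delete frees one object immediately. In exchange, the cycle between "c" and "d" is reclaimed without the programmer doing anything.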
I agree that Java is fine, and it's sturdy, and it's a delight to use; it's just that (all the way up to the great-grandparent) I think he got it right that Java programmers are (rightfully so) more concerned about all these optimizations (why do you think they're necessary?) than about any sort of run-time security.
Again, just my opinion.
Re:do what i do (Score:3, Interesting)
Many years ago, I inherited control of an R&D team. A year or so earlier, they'd hired a contractor for a couple of months; this contractor had created a ~1000 line Perl module that had acquired "sacred code" status since he'd left.
Any attempt to alter, extend or tweak the sacred code in any way resulted in it failing. It was totally bereft of comments, frequently had multiple commands within a single line of code, lots of "magic" regular expressions, and variable names that didn't make sense to any of us. Most variables had global scope, variables were frequently reused for no apparent reason and it didn't even have blank lines between functions.
I know Perl is frequently a write-only language, but this was something else. I can't believe a human could actually produce code like this on purpose; he must have used some sort of Perl obfuscator.
Everyone in my team was terrified of the sacred code, as they knew working on it was doomed to failure.
One of the most enjoyable times I had on that job was when the guy who created the code came back and asked for more work...
Re:Hungarian? Forget about it. Use Finnish. (Score:1, Interesting)
Finnish and Hungarian are probably related, but not closely. Not like Danish and Norwegian, which are basically as different as British and American; not even as close as French and Spanish, which are recognisably cousins even though they're mutually incomprehensible. More like Japanese and Korean, in fact: some structural similarities that suggest a relationship, but basically no shared vocabulary.
Re:I can see a market for this. (Score:1, Interesting)
Interesting. I think the same. Games and music, yes, they rave about copy protection and obfuscation, but in defence, well, at least not on the projects I worked on in Ada, the issue was never there. Surely that's a more serious application?
Once there was a conversation about whether a missile could run a 'run-once' program that could not be recovered if it was a duff and didn't explode. I think the answer turned out to be yes, but if and only if the trajectory is sequence(s) computed on launch - that is, if the program isn't going to encounter any unknowns that don't have a branch pre-run all the way to a successful hit. This takes up shitloads of memory but you can load it in RAM and it will erase itself as it goes.... a run-once branched script that cleans up behind itself.
The fatal flaw if you attempt to apply this to media copy protection is that in the missile the enemy has no opportunity to read the code while it is in flight, but with deliverable media you have to present the whole code in its original entirety.
This is not a subtle aspect of code. It's the deep implications of simple logic.
Re:Article shockingly wrong (Score:2, Interesting)
His article seems to be primarily about byte code obfuscators. An interesting topic, but he muddles it completely with factual errors. Hmm... enough ranting, methinks. It's probably not worth it.
Re:Hungarian? Forget about it. Use Finnish. (Score:5, Interesting)
Umm no. You're way off, sorry.
Finnish and Hungarian are related, but not very closely. They're both Finno-Ugric languages, but the relation is roughly as distant as that between, say, German and Greek for instance. And probably less apparent, since German has quite a few Greek loan-words, particularly in scientific fields, but Hungarian and Finnish don't borrow from each other noticeably.
Other Finno-Ugric languages include Mansi, Khanty, Udmurt, and Mordvin; the Balto-Finnic languages (or dialects, depending on who you ask), which include Finnish, Estonian, Karelian, Izhora, Veps, Vod, and Liv; and the closely related Saami languages spoken in the far north of Sweden, Norway, Finland, and northwest Russia.
This group is in turn more distantly related to the Samoyedic languages spoken in parts of Siberia.
Basque isn't closely enough related to any of these for linguists to have established any relationship, although many have suspected there was one and put a lot of time and energy into trying to find evidence of one.