Mystery of Duqu Programming Language Solved
wiredmikey writes "Earlier this month, researchers from Kaspersky Lab reached out to the security and programming community in an effort to help solve a mystery related to 'Duqu,' the Trojan often referred to as 'Son of Stuxnet,' which surfaced in October 2011. The mystery rested in a section of code written in an unknown programming language and used in the Duqu Framework, a portion of the Payload DLL used by the Trojan to interact with Command & Control (C&C) servers after the malware infects a system. Less than two weeks later, Kaspersky Lab experts now say with a high degree of certainty that the Duqu framework was written using a custom object-oriented extension to C, generally called 'OO C', and compiled with Microsoft Visual Studio Compiler 2008 (MSVC 2008) with special options for optimizing code size and inline expansion."
Re:I don't understand the fuss about the language (Score:4, Informative)
They are trying to do the forensics. If you know the tools used, you have a much better idea where to look for the people who did it. It was almost certainly NOT a matter of determining what it was doing; they wanted to figure out something that would help them track it back to the source.
Re:Source Code? (Score:5, Informative)
They did open the lines up for suggestions, and some community members suggested that it looked like OO C. How did they know? They probably had experience using and debugging OO C, if I had to guess. There were also plenty of people who said that it definitely wasn't compiler X or language Y from their own experiences. The article links to this discussion: http://www.securelist.com/en/blog/677/The_mystery_of_Duqu_Framework_solved [securelist.com]
But about discovering the specifics of the truth? It's probably like you alluded to in your comment - fingerprinting the machine code. It would take a while, but you could come up with fingerprints for a great many compilers and features. You could do that for Common Lisp, too. (In fact, someone DID suggest that they look at various LISP dialects.) It has taken long enough that such a scenario - having a good library of fingerprints - is believable. Given a scanner with a dictionary of fingerprints, one could reasonably say that you either have hand-assembled machine code made to mimic another language, or that you have code generated by a very specific language and compiler. If nothing in your library of fingerprints matched, assuming you had a good handle on hand-assembling machine code, you could look and see if it smells like hand-written assembly. It would be tremendously laborious to hand-assemble code to make it look like a specific compiler generated it, and why would you do that in the first place? I fail to see the benefit when you could just use that compiler. If you were trying to throw off the analysts with a false positive match, there would still be a ton of mysterious data left to examine.
Think about DNA analysis. We can look at our DNA and determine that some chunks of it came from viruses, and that some of it is "junk" that serves no purpose.
Also think about image analysis like OCR or various captcha-breaking software. You can map images to characters with a program, and detect anomalies and known signatures.
Then there is heuristic antivirus scanning. It knows enough to find some previously unthought-of malicious code, even if it does sometimes generate false positives.
So why not apply those techniques to machine code, and see what you get? If multiple methods give you similar results, you would be onto something, I imagine.
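To make the "scanner with a dictionary of fingerprints" idea concrete, here is a toy sketch in C. Everything in it is an assumption for illustration: the names (Fingerprint, count_matches, best_match, toy_dict) are made up, and the two byte patterns are just the two common x86 encodings of "push ebp; mov ebp, esp", which different toolchains tend to prefer. A real tool would need far more signatures and would look at much more than function prologues.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One dictionary entry: a label plus the byte pattern to look for. */
typedef struct {
    const char *label;
    const unsigned char *bytes;
    size_t len;
} Fingerprint;

/* Toy signatures (illustrative only): both encode push ebp; mov ebp, esp. */
const unsigned char sig_a[] = {0x55, 0x8B, 0xEC};
const unsigned char sig_b[] = {0x55, 0x89, 0xE5};

const Fingerprint toy_dict[] = {
    {"msvc-like", sig_a, sizeof sig_a},
    {"gcc-like",  sig_b, sizeof sig_b},
};

/* Count how many times one fingerprint occurs in the buffer. */
size_t count_matches(const unsigned char *buf, size_t buf_len,
                     const Fingerprint *fp)
{
    assert(buf != NULL && fp != NULL);
    size_t hits = 0;
    if (fp->len == 0 || fp->len > buf_len)
        return 0;
    for (size_t i = 0; i + fp->len <= buf_len; i++) {
        if (memcmp(buf + i, fp->bytes, fp->len) == 0)
            hits++;
    }
    return hits;
}

/* Return the label with the most hits, or NULL if nothing matched. */
const char *best_match(const unsigned char *buf, size_t buf_len,
                       const Fingerprint *dict, size_t n)
{
    const char *best = NULL;
    size_t best_hits = 0;
    for (size_t k = 0; k < n; k++) {
        size_t hits = count_matches(buf, buf_len, &dict[k]);
        if (hits > best_hits) {
            best_hits = hits;
            best = dict[k].label;
        }
    }
    return best;
}
```

A NULL result is the "nothing in your library matched" case, which is where you would start asking whether the code was hand-assembled or came from a toolchain you have no fingerprint for.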
Re:Old-school or new-school? (Score:2, Informative)
OO C is not Objective-C.
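For anyone unsure what the difference looks like, here is a rough sketch of the general "objects in plain C" style: structs that carry a pointer to a table of function pointers, with constructors written by hand. This is NOT Duqu's code or any particular OO C framework; Shape, ShapeVtbl, rect_new and the rest are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Shape Shape;

/* Hand-rolled method table: what a C++ compiler would generate for you. */
typedef struct {
    int  (*area)(const Shape *self);
    void (*destroy)(Shape *self);
} ShapeVtbl;

/* The "object": data members plus a pointer to its method table. */
struct Shape {
    const ShapeVtbl *vtbl;
    int width;
    int height;
};

static int rect_area(const Shape *self)
{
    assert(self != NULL);
    return self->width * self->height;
}

static void rect_destroy(Shape *self)
{
    free(self);
}

static const ShapeVtbl rect_vtbl = { rect_area, rect_destroy };

/* "Constructor": allocate the object and wire up its methods by hand. */
Shape *rect_new(int w, int h)
{
    Shape *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->vtbl = &rect_vtbl;
    s->width = w;
    s->height = h;
    return s;
}
```

Calls go through the table explicitly, e.g. s->vtbl->area(s), and it all builds with an ordinary C compiler. Objective-C, by contrast, is a separate language with its own syntax and runtime.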
Re:Source Code? (Score:4, Informative)
It's only a clue, not an answer. But it's one data point more than they had before. And they need somewhere to start looking for the author.
OO C is very interesting. C++ developers are a dime a dozen (OK, it's 2012, we're four for a quarter.) And you can't swing a dead cat around here without hitting a C coder. But OO C developers are a subset of a subset of people. Nobody who sets out to write a virus for the first time says "I should download a four-year-old compiler for a language I know nothing about and start writing my virus." They don't read in their copy of "Virus Creation Lab for Dummies" where it says to torrent a copy of Visual Studio 2008, then download some GNU OO C framework for it. This is a tool that a limited set of experts uses for their day jobs. Possibly it's something a laid-off software engineer would still have on his home machine. It might be code generated by a custom library that some gaming house wrote for their own internal stuff, and by pattern matching against commercial software products they might be able to find the company of origin. They can go back and figure out who they fired in the last three years, and who now is driving the Ferrari. Maybe there's an OO C Google Group this guy participates in. Maybe he published a bogus "please help me with my homework" question on stackoverflow, and they can match some source code to some object code.
Or maybe it doesn't help find the guy today, but tomorrow if they haul a potential perpetrator before a judge, they can offer the jury corroborating evidence that the person who wrote this code was very specialized in his knowledge of this esoteric tool, and the defendant worked with this tool every day.
Whatever that clue might be, it could be useful knowledge to someone hunting down the author. Either way, it certainly has value.