
Reverse Engineering Win32 Trojans on Linux

Posted by michael
from the clean-room dept.
slackrootcyc writes "A post (and previous article) give a detailed examination of the reversing process, using a trojan found in the wild. Later on in the story it discusses some techniques for reversing Windows-native code entirely under Linux."

  • by JessLeah (625838) on Saturday November 16, 2002 @04:38PM (#4686922)
    ...the condoms that bluescreen.

    Where do you want to Put It Today?(TM)
  • by Anonymous Coward on Saturday November 16, 2002 @04:42PM (#4686942)
    They're completely unbiased. New IIS hole? Here's the story. New Apache hole? Here's the story. All objective, no "M$ suX0rs!!!1".
    • by PhysicsScholar (617526) on Saturday November 16, 2002 @04:44PM (#4686951) Homepage Journal
      The not-so-great thing about Security Focus is that their Web servers can't handle 10,000 hits in 10 minutes.

      So, here's the text of the article just in case:

      Reverse Engineering Hostile Code
      by Jon Stewart
      last updated October 23, 2002

      Computer criminals are always ready and waiting to compromise a weakness in a system. When they do, they usually leave programs on the system to maintain their control. We refer to these programs as "Trojans" after the story of the ancient Greek Trojan horse. Often these programs are custom compiled and not widely distributed. Because of this, anti-virus software will not often detect their presence. It also means information about what any particular custom Trojan does is also not generally available, so a custom analysis of the code is necessary to determine the extent of the threat and to pinpoint the origin of the attack if possible.

      This article outlines the process of reverse engineering hostile code. By "hostile code", we mean any process running on a system that is not authorized by the system administrator, such as Trojans, viruses, or spyware. This article is not intended to be an in-depth tutorial, but rather a description of the tools and steps involved. Armed with this knowledge, even someone who is not an expert at assembly language programming should be able to look at the internals of a hostile program and determine what it is doing, at least on a surface level.

      Tools Required

      As with most types of engineering, you'll need some tools. We'll cover tools native to both Unix and Windows. While Unix is the ideal platform to perform the initial reverse engineering process, you can still make do on Windows, especially if you install tools such as Cygwin, a Unix environment that runs on Win32 platforms. Most of these commands are also available for Windows when running Cygwin. However, when you get to the decompile/disassemble/debug steps ahead, going the Windows route will cost a lot of money, whereas the Unix solutions are all free. Be sure to weigh the costs of working on Windows versus the benefits before making it your reverse-engineering platform of choice.

      Some useful commands are:

      dd - byte-for-byte copying of raw devices. Useful to perform analysis on a compromised system's hard drive without affecting the integrity of evidence of the intrusion.
      file - tries to identify the type of a file based on content
      strings - outputs the readable strings from an executable program.
      hexedit - allows you to read and edit binary files
      md5sum - creates a unique checksum for a file for comparison
      diff - outputs differences between files
      lsof - shows all open files and sockets by process
      tcpdump - network packet sniffer
      grep - search for strings within a file
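      Two of the commands above, strings and md5sum, are simple enough to sketch in a few lines of Python, which can help when the usual tools are not installed on the analysis box. This is only an approximation under stated assumptions: the function names are made up for illustration, and the 4-character minimum is an assumed default rather than a claim about any particular strings implementation.

```python
import hashlib
import re


def extract_strings(data, min_length=4):
    """Return runs of printable ASCII that are at least min_length bytes long."""
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def md5_hex(data):
    """Return the MD5 digest of data as a hex string, for comparing two copies."""
    return hashlib.md5(data).hexdigest()
```

      To use it on a suspect file, read the file in binary mode (open(path, "rb").read()) and pass the bytes to either function.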

      Compressed Executables

      Trojans are often compressed with an executable packer. This not only makes the code more compact, it also prevents much of the internal string data from being viewed by the strings or hexedit commands. The most commonly used executable packer is UPX, which can compress Linux or Windows binaries. There are several other packers available, but they are typically Windows-only. Fortunately, UPX is one of the few that also provides a manual decompression option to restore the original file. This saves us from having to use advanced techniques to decompress the file into its original format.

      In an ordinary executable, running the "strings" command or examining the Trojan with hexedit should show many readable and complete strings in the file. If you only see random binary characters or mostly truncated and scattered pieces of text, the executable has likely been packed. Using grep or hexedit, you should be able to find the string "UPX" somewhere in the file if it was packed by UPX. Otherwise you may be dealing with one of the many other executable packers. Dealing with these other formats is beyond the scope of this article, but you can find resources to help work with these files.
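      The "random binary characters" symptom can be put into rough numbers. The sketch below combines a search for the "UPX" marker with a byte-entropy estimate, since compressed data tends toward 8 bits of entropy per byte while ordinary code and text sit lower. The function names and the 7.0 threshold are assumptions chosen for illustration; this is a heuristic, not a reliable packer detector, and a marker string can of course be edited out.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(data, threshold=7.0):
    """Heuristic: True if the UPX marker is present or the entropy is high.

    The threshold is an assumed cut-off, not a documented constant.
    """
    return b"UPX" in data or shannon_entropy(data) > threshold
```

      A high score only says the file is compressed or encrypted in some way; it does not say which packer was used.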


      Occasionally you will get lucky and find that the Trojan was written in an interpreted or semi-interpreted language such as Visual Basic, Java or even compiled Perl. There are tools available to decompile these languages to varying degrees.

      Visual Basic - There is a decompiler floating around the Net for VB version 3. For newer versions, there are no decompilers known, but you can use a tool such as Compuware's SmartCheck to trace calls in the program. While its output is not a source code listing, you can see just about everything the program is doing internally.
      Java - There is the excellent decompiler jad, which decompiles to a complete source code listing which can be recompiled again. Several other java decompilers are also known to exist.
      Perl - Perl programs compiled into Windows executables can be reduced to their bare script using exe2perl.

      If the Trojan was written in a true compiled language, you'll have to bite the bullet and disassemble the code into assembly language. For Unix executables, objdump is the way to go. For Windows executables, you'll want IDA Pro or W32dasm. There is a free version of IDA that is just as powerful as IDA Pro but has a console-based interface. These programs will disassemble your code, then match up strings in the data segment to where they are used in the program, as well as show you separation between subroutines. They will attempt to show you Windows API calls by name instead of by offset. This kind of output is known as a deadlisting, and can give you a good idea of what the program is doing internally. The GNU objdump program does not provide such useful features, but there is a Perl-based wrapper for objdump called dasm, which will give you much of the same functionality as the Windows disassemblers.
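      The idea of matching strings in the data segment to the places that use them can be illustrated with a very crude scan: find each printable string, compute the address it would have once loaded, and look for that address as a 32-bit little-endian value elsewhere in the image. This is not how IDA, W32dasm or dasm work internally (they decode instructions properly); it is a sketch that assumes a flat image loaded at a known base address, and the function name is hypothetical.

```python
import re
import struct


def find_string_references(image, base_address, min_length=4):
    """Map each printable string to the addresses where its own address appears.

    Assumes the image is mapped contiguously starting at base_address and
    that references are stored as 32-bit little-endian absolute addresses.
    """
    references = {}
    for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_length, image):
        string_address = base_address + match.start()
        needle = struct.pack("<I", string_address)
        hits = []
        position = image.find(needle)
        while position != -1:
            hits.append(base_address + position)
            position = image.find(needle, position + 1)
        if hits:
            references[match.group().decode("ascii")] = hits
    return references
```

      A raw byte search like this will produce false positives whenever some unrelated data happens to match an address, which is one reason a real disassembler is worth the trouble.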


      While a deadlisting can be quite valuable, you will still want to use a debugger to step through the program code, especially if the Trojan is communicating via network sockets. This gives you access to the memory and temporary variables stored in the program, as well as all data it is sending and receiving from socket communications. On Unix, gdb is the debugger of choice. It has a long history on Unix, is well documented, and best of all, is available free of charge. Under Windows, the choices are far more varied, but most tutorials on reverse engineering under Win32 assume you are using SoftICE. It does cost a fair amount of money, but is worth getting if you can afford it.

      Preparing to Debug

      You must take precautions when running hostile code, even under a debugger. You should never debug a Trojan on a production network. Ideally, you should set up a lab network, as shown in figure 1.

      Figure 1: A typical debugging network

      The debug system should have a clean install of whatever OS the Trojan is intended for, with a second box acting as a firewall. A third system on the network allows you to emulate services and capture the network traffic generated by the Trojan. Capturing this traffic can be invaluable in tracing the source of the infection. Ensure that you firewall all outbound connections, allowing only the Trojan's control connection through. If you don't want the master controller to know your lab network is running the Trojan, you can set up services to mimic the resources the Trojan needs, such as an IRC or FTP/TFTP server.
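      A stand-in service for the third lab machine does not need to implement a real protocol to be useful; simply accepting the connection and recording what arrives already shows what the Trojan says first. The following is a minimal sketch using Python's standard socketserver module. The class and function names are illustrative, the 5-second read timeout is an arbitrary assumption, and the listener never replies, so a Trojan that waits for a server banner would need something more elaborate.

```python
import socket
import socketserver
import threading


class CaptureHandler(socketserver.BaseRequestHandler):
    """Record whatever a client sends, without ever replying."""

    def handle(self):
        self.request.settimeout(5.0)  # assumed idle limit per connection
        chunks = []
        try:
            while True:
                data = self.request.recv(4096)
                if not data:
                    break
                chunks.append(data)
        except socket.timeout:
            pass
        self.server.captured.append((self.client_address[0], b"".join(chunks)))


class CaptureServer(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address):
        super().__init__(address, CaptureHandler)
        self.captured = []  # list of (client_ip, bytes_received)


def start_capture_server(host="127.0.0.1", port=0):
    """Start the listener in a background thread and return the server object."""
    server = CaptureServer((host, port))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

      In a lab setup you would bind it to the address and port the Trojan expects (for example an IRC port) and inspect server.captured after the sample has run; call server.shutdown() when finished.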

      Stepping Through the Code

      Now that we have constructed a proper quarantined lab environment, we can begin debugging the code. Using the deadlisting, we look for key functions in the program, such as Winsock and file I/O calls. The debugger allows us to set breakpoints in the program based on offset values, so we can interrupt the flow of the program and examine the program memory and CPU registers at that point. The remainder of this article will look at an example of how such a debugging session might look on an x86 Linux platform.

      Running the Debugger

      We want to know how the Trojan communicates with its controller. Often, sniffing the network traffic will be sufficient. However, many newer Trojans are incorporating encryption into their network traffic, making network sniffing a lost cause. With some cleverness, though, we can grab the messages from memory before they are encrypted. By setting a breakpoint on the "send" socket library call, we can interrupt the code just prior to the packet being sent. Then, by getting a stack trace, we can see where we are in the program. For example, the Trojan source code might look something like:

      /* encrypt output to master */
      elen = encrypt(crypted,buf,len);
      /* write crypted output to socket */
      send(s, crypted, elen, 0);

      Examining the compiled Trojan in gdb might give us the following output [note that the bracketed statements are the author's comments on the output]:

      [test@debugger test]$ gdb ./Trojan
      GNU gdb 5.2.1-2mdk (Mandrake Linux)
      Copyright 2002 Free Software Foundation, Inc.
      GDB is free software, covered by the GNU General Public License,
      and you are welcome to change it and/or distribute copies of
      it under certain conditions.
      Type "show copying" to see the conditions.
      There is absolutely no warranty for GDB. Type "show warranty"
      for details.
      This GDB was configured as "i586-mandrake-linux-gnu"...
      (no debugging symbols found)...
      (gdb) set disassembly-flavor intel [Switch syntax output from AT&T]
      (gdb) b send [Set a breakpoint on the "send" library call]
      Breakpoint 1 at 0x400f5c10
      (gdb) run
      Starting program: /home/test/Trojan

      Breakpoint 1, 0x400f5c10 in send () [We hit a breakpoint]
      (gdb) where [Do a stack trace to see where we are at in the program]
      #0 0x400f5c10 in send () from /lib/i686/
      #1 0x080487fa in socket ()
      #2 0x40040082 in __libc_start_main () from /lib/i686/

      The above output from the "where" command in gdb shows us the offset each subroutine will return to after execution. Since we know that the "send" call was right after our encrypt call, we need only to examine the previous subroutine, which encompasses the return offset 0x080487fa. We are interested in the assembly language code just prior to this offset. Using gdb, we can disassemble the code at this point.

      (gdb) disas 0x080487d2 0x080487fa
      Dump of assembler code from 0x80487d2 to 0x80487fa:
      0x80487d2 : call 0x8048804
      0x80487d7 : add esp,0x10
      0x80487da : mov DWORD PTR [ebp-836],eax
      0x80487e0 : push 0x0
      0x80487e2 : push DWORD PTR [ebp-836]
      0x80487e8 : lea eax,[ebp-824]
      0x80487ee : push eax
      0x80487ef : push DWORD PTR [ebp-828]
      0x80487f5 : call 0x8048534
      End of assembler dump.

      We see that just prior to the call to "send", there was a call to 0x8048804. In reality, this is our "encrypt" subroutine. When programs are stripped of their symbols, gdb is often confused about where subroutines begin and end, so it continues the name of the last one it recognizes for all following subroutines, often the previous dynamic library call. In this case, it is mislabeled as being part of the "socket" function.
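      The relationship between a return offset in the stack trace and the call instruction that produced it can be checked with a little arithmetic. On x86, a direct near call is encoded as the opcode 0xE8 followed by a 4-byte signed displacement, so it is 5 bytes long and its return address is the call's own address plus 5. The sketch below assumes the call is of that form (indirect calls through a register or memory operand have other lengths), and the helper names are illustrative rather than part of gdb.

```python
import struct

CALL_REL32_OPCODE = 0xE8
CALL_REL32_LENGTH = 5  # one opcode byte plus a 4-byte signed displacement


def call_site_from_return(return_address):
    """Address of a 5-byte direct call that would return to return_address."""
    return return_address - CALL_REL32_LENGTH


def encode_call(call_address, target):
    """Bytes of a direct near call located at call_address that jumps to target."""
    displacement = target - (call_address + CALL_REL32_LENGTH)
    return bytes([CALL_REL32_OPCODE]) + struct.pack("<i", displacement)


def decode_call(call_address, raw):
    """Target address of the direct near call whose bytes start at raw[0]."""
    if len(raw) < CALL_REL32_LENGTH or raw[0] != CALL_REL32_OPCODE:
        raise ValueError("not a direct near call")
    (displacement,) = struct.unpack("<i", raw[1:5])
    return (call_address + CALL_REL32_LENGTH + displacement) & 0xFFFFFFFF
```

      With the addresses from the listing above, call_site_from_return(0x080487fa) gives 0x80487f5, which is where the call to "send" sits, and the call at 0x80487d2 to 0x8048804 works out to a displacement of 0x2d.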

      To examine the contents of the unencrypted packet, we need only know how the "call" instruction works. The arguments to our subroutine were pushed onto the "stack", a place where temporary data and return offsets are stored. We can access the contents of the variables by setting a breakpoint on the call and then using an offset from an internal CPU register known as the stack pointer, ESP. ESP+4 will be a pointer to the first argument, ESP+8 will be a pointer to the second argument, ESP+12 will be a pointer to the third argument, and so forth. Just keep poking at the stack until something useful comes up. In this case, the useful information (the plaintext data) is in the second argument to "encrypt". Let's set a breakpoint at the encrypt call and examine the stack. [Again, the bracketed statements are the author's comments on the output.]

      (gdb) b * 0x80487d2 [Set a breakpoint on the "encrypt" call]
      Breakpoint 2 at 0x80487d2
      (gdb) run
      The program being debugged has been started already.
      Start it from the beginning? (y or n) y

      Starting program: /home/test/Trojan
      (no debugging symbols found)...
      Breakpoint 2, 0x080487d2 in socket ()
      (gdb) x/x $esp+8 [Get the offset of the second argument ESP+8]
      0xbffff5e4: 0x0806fe20
      (gdb) x/fs 0x0806fe20 [Examine the contents of the memory at 0x0806fe20]
      0x806fe20: "root pts/0 Oct 11 14:22\n"

      From this output we can see that the Trojan is reporting back on who is currently logged on to the system. Of course, it could send any kind of data: network packet captures, keystroke logs, etc. Fortunately, we have our network set up so this traffic will be redirected to the sniffer host instead.


      The Trojan above is not real. Had it been an actual Trojan, we might have followed additional courses of action. Often a Trojan will use established channels such as IRC to reach its master. We can take advantage of this fact, and use it to track down the source of the attack, even gaining control of the entire network of Trojaned hosts if the Trojan writer has been careless. If the Trojan uses FTP to update itself, you might find additional code on the FTP server and possibly clues to the identity of the Trojan writer.

      Although we've only scratched the surface of reverse engineering, you should be able to take the basic information above and put it to work. Read the documentation for your debugger; you'll be surprised at how powerful it can be, and how much it can tell you, even if you're not the best at reading assembly code. If it seems overwhelming at first, don't give up hope. The payoff can be quite gratifying. During one reverse-engineering session the author of this article found the real name of the Trojan author unintentionally embedded in the program's source code (hint: don't write Trojans in VB when logged in to your NT workstation at work). With a quick trip to Google the author's email address and picture were available, posted to a VB discussion site. One "whois" later and his home address and phone number were found. Somewhere in Brazil, a Trojan writer slaps his forehead and says (in Portuguese), Doh!
      • Using grep or hexedit, you should be able to find the string "UPX" somewhere in the file if it was packed by UPX.
        This is unreliable. The "UPX" signature can be changed to anything; perhaps "DLL", and UPX will refuse to unpack it. Furthermore, it may be difficult to identify that the executable was packed with UPX, therefore hindering decompression once more. Security through obscurity does not work!
  • by SuperDuG (134989) <(be) (at) (> on Saturday November 16, 2002 @04:44PM (#4686949) Homepage Journal
    hehehehe wonder if Symantec and Network Associates will sue for having their code reverse engineered ...

    wait a minute anti-virus software makers don't make virii, what was I thinking

    • by Anonymous Coward
      They actually do make some of the viruses. (Which is plural of virus.) But they don't make the trojans. The trojans are made so that people can gain remote access to your computer for a few reasons. Either they want your hdd space or they want personal information about you. Even something as benign as VNC or Radmin can be turned to the "dark side."
  • Uh Oh... (Score:3, Funny)

    by nothing safe (626252) on Saturday November 16, 2002 @04:44PM (#4686952)
    *GASP* Does this mean that the cat is out of the bag with that top secret trojan known as 'Sub7'?
  • On Mac OS-X (Score:3, Interesting)

    by Anonymous Coward on Saturday November 16, 2002 @04:45PM (#4686958)
    I know a Windows underground group which is converting M$ Windows trojans to Mac OS-X. They just think it's cool - that's their motivation. I don't see what's so cool in it..
  • by Slashdotess (605550) <gchurch@hotma i l .com> on Saturday November 16, 2002 @04:46PM (#4686960)
    This is why we should be coding everything in Open Source. The fact is, in today's highly dynamic internet society, Trojans can hide their code to prevent security professionals from doing their job. When we finally open source these trojans, our software will become more secure because programmers from around the world can work on making the trojans and the programs they affect faster, better, and more secure.

    Currently, trojans are badly written because of their inherent proprietary nature. Using something like sourceforge a multitude of coders can be simultaneously working on different parts of a trojan while the open source community can review, debug and test the code for infectioness effectiveness.

    Only when we make Trojans open source will we realize that our computer controlled Oil tankers across the world will be safe from Da Vinci.
  • by SexyKellyOsbourne (606860) on Saturday November 16, 2002 @04:46PM (#4686962) Journal
    This is some pretty neat stuff: the author details how to find a needle in a haystack for a virus establishing a TCP connection from nothing more than raw disassembly, and then how to use breakpoints in the WINE program to get gdb to work with it.

    Though you can do that with a simple netstat, it opens up ways to find everything else about the trojan, too, without the risk of raping your native environment Windows system.

    Too bad most nu-geek slashdotters would rather hear about someone putting a neon rope light inside their computer case.
  • Magic Patch (Score:5, Informative)

    by taviso (566920) on Saturday November 16, 2002 @04:47PM (#4686966) Homepage
    I made this little patch a few days ago to /etc/magic, it can detect when an executable has been packed with upx (works against latest 1.90 release)

    --- magic.orig 2002-11-16 20:43:02.000000000 +0000
    +++ magic 2002-11-13 12:54:09.000000000 +0000
    @@ -1793,6 +1793,7 @@
    >>16 leshort 1 relocatable,
    >>16 leshort 2 executable,
    >>16 leshort 3 shared object,
    +>>0x79 string UPX UPX compressed,
    # Core handling from Peter Tobias <>
    # corrections by Christian 'Dr. Disk' Hechelmann <>
    >>16 leshort 4 core file

    example output:
    $ file ./counter
    ./counter: ELF 32-bit LSB executable, UPX compressed, Intel 80386, version 1 (Linux), statically linked, stripped
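    For anyone who cannot patch /etc/magic, the same test is easy to express directly. The sketch below mirrors the rule in the patch: the file must be an ELF binary and the bytes at offset 0x79 must read "UPX". The offset comes from the patch above and is an assumption tied to that UPX release; other versions or a modified stub may place the marker elsewhere. The function name is illustrative.

```python
ELF_MAGIC = b"\x7fELF"
UPX_MARKER = b"UPX"
UPX_MARKER_OFFSET = 0x79  # taken from the magic patch above; version-specific


def elf_has_upx_marker(data, offset=UPX_MARKER_OFFSET):
    """True if data starts with the ELF magic and has "UPX" at the given offset."""
    if data[:4] != ELF_MAGIC:
        return False
    return data[offset:offset + len(UPX_MARKER)] == UPX_MARKER
```

    Read the first 128 bytes or so of the file in binary mode and pass them in; a False result does not prove the file is unpacked.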
  • by Dakisha (526733) on Saturday November 16, 2002 @04:48PM (#4686970)
    And in further news, trojan writers worldwide file a DMCA suit against Linux users for circumventing their security and reverse compiling their intellectual property ;)
  • by jeroenb (125404) on Saturday November 16, 2002 @04:52PM (#4686994) Homepage
    I've used WINE quite extensively and I would say if you want to reverse engineer a piece of Win32 code WINE might be the best way to do it on Linux. On the other hand, so much is either not implemented or only implemented halfway, I wouldn't really consider my WINE-based findings to be an objective assessment of what a piece of code would do once actually run on a system based on an original version of Windows.

    I don't really see why you'd go through all the trouble of using Linux to reverse a Win32-trojan. The only argument the author of the two linked articles gives is that all related development tools on Linux/Unix are free. However, if you just want to poke around some code without producing optimized binaries, you can get cheap versions of MS Developer Studio (so-called "Learning Editions") as well.

    I mean, this kind of stuff is complicated enough without the possible hassle of having your environment messed up because of some incomplete emulator.
  • by Anonymous Coward
    With any luck, the anti-virus companies will soon start to figure out how to write linux viri...

    They've done a darn good job on win32! Just imagine the amount of work they've put in... Especially when all you need is the following options:

    o Remove .Exe attachments
    o Remove .Com attachments
    o Remove embedded (inline) e-mail files.

    But wait, that'd be too easy!
  • Doing assembly dumps on object code isn't terribly exciting. Doing this on trojans is perhaps even less so, even on Linux.

    But, referring to doing this on native Windows code is not a good idea at all. Remember the EULA, simply having the Windows code on your disk constitutes acceptance of the EULA and reverse engineering by assembly dumps is explicitly defined as a violation of the EULA. In other words you are setting yourself in a position for major legal problems.

    The only legitimate way to reverse engineer software is the method used by the Samba team. You must look at the input and look at the output and then determine your OWN method of achieving the same result.

    This is the only legal way to do it. If you even glance at an assembly dump of the actual software, you are no longer virgin. Thus ANYTHING that you produce afterwards that even vaguely resembles the operation of the original software will place you in a losing position, legally.

    Avoid assembly dumps of MS code!
    • And how many trojans come with EULAS? I don't think your argument applies here...
    • Reverse engineering is protected indirectly by laws in other countries that override the EULAs, since those clauses are not valid under the state laws.

      Russian crackers would happily tell you all about this, just like they happily tell the owners of the software they've cracked when they're slapped with Cease and Desists.

    • by g4dget (579145) on Saturday November 16, 2002 @06:13PM (#4687400)
      But, referring to doing this on native Windows code is not a good idea at all. Remember the EULA, simply having the Windows code on your disk constitutes acceptance of the EULA and reverse engineering by assembly dumps is explicitly defined as a violation of the EULA. In other words you are setting yourself in a position for major legal problems.

      Don't believe everything you read. Just because Bill Gates writes into the EULA that you'll work as his towel boy if you open the box doesn't mean you are actually legally obligated to.

      The only legitimate way to reverse engineer software is the method used by the Samba team. You must look at the input and look at the output and then determine your OWN method of achieving the same result.

      Sorry, but you don't know what you are talking about. That is not "the only legitimate way".

      Thus ANYTHING that you produce afterwards that even vaguely resembles the operation of the original software will place you in a losing position, legally

      Oh, please, stop the hysteria. These things need to be judged on a case-by-case basis. I frankly doubt that reverse engineering a trojan/virus will get you into hot water with Microsoft's EULA.

      • Just because Bill Gates writes into the EULA that you'll work as his towel boy if you open the box doesn't mean you are actually legally obligated to.

        "Piss Boy, wait for the shake... [splash]... [ploink] Your tip is in the bucket."
  • by Anonymous Coward on Saturday November 16, 2002 @05:07PM (#4687073)
    Those wishing to learn more about Reverse Engineering software may find the following pages useful:

    Fravia's pages - A huge, sprawling resource of RE information. Chances are, any info you need is in here somewhere. It's just a matter of finding it...

    The Art of Assembly and other essential ASM programming links. If you want to learn RE, sooner or later you're going to have to learn assembly. Get to it.

    Mammon's Tales to his Grandson and other useful RE classics by a G.O.M. of the genre. Oh, and an older mirror, possibly with extra/different stuff on it.

    Google's directory listing for Disassemblers, which you'll be wanting at least one of...
    ...and the listing for Testing tools, which may come in handy.

    Finally, Compuware's SoftIce page - SoftIce being the single most popular RE tool for Win32 software... Not that you're likely to be paying for it, you warez monkey, you.

    Have fun, kids, and release Open Source.

    (Posting Anon because I don't need the Karma or the implication of knowledge =)...
    • SoftIce is (or at least was, and I presume still is) truly amazing. The version I used, awhile back, loaded *before* windows, allowing it to breakpoint on anything, about as low-level as you can get.

      Too bad VMWare doesn't support debugging in its PC emulation; it would be even better than the Wine approach (a real copy of Windows running). Still, there are some good tools out there to trace programs. Very cool stuff.
      • A coworker was able to successfully debug in vmware by looping a serial cable out one port and back in the other, giving one port to vmware and using softice's remote serial debugging to debug from the vmware host computer.
  • I've seen this phrase a couple of times on /., but I'm not sure entirely what it means. Can someone provide a link or a concise explanation if no link?

    • RE is the process of looking at how software or hardware works, and trying to replicate it, without looking at the source code, i.e., trying to build a car by looking at a car rather than at blueprints.
    • "Engineering" refers to starting with a goal (desired functionality) and arranging materials in a way (determined by a possibly involved design process) that reaches that goal (by delivering a finished product that works).

      "Reverse Engineering" is the exact opposite: one has a finished product that does something, or at least would if it were in proper working order. (Did you break it?) There might be some documentation. One might have some idea of the goal, perhaps only a vague one. (E.g., an automaker might reverse engineer a competitor's automobile, which has an obvious goal. For a Windows virus, you have a vague idea that the program is supposed to cause damage and/or replicate itself.) The missing part is insight into the design process that happened. Figuring out that design process (by whatever means) is the goal of "reverse engineering."

      Why do you want to know about the design? You might discover the secret behind some unique functionality. You might expose some flaw or weakness. You might be able to reconstruct enough of the internal protocols to be able to develop compatible products. (E.g. understand enough about IBM's PC BIOS to document its behavior well enough for a programming team to construct a compatible BIOS without simply duplicating the ROM contents, to break into the market for making PC clones in the 1980s.) You might just enjoy tearing things apart.

      Hope this helps!
