
How Your Compiler Can Compromise Application Security

Posted by Soulskill
from the my-compiler-levels-me-out dept.
jfruh writes "Most day-to-day programmers have only a general idea of how compilers transform human-readable code into the machine language that actually powers computers. In an attempt to streamline applications, many compilers actually remove code that they perceive to be undefined or unstable, and, as a research group at MIT has found, in doing so can make applications less secure. The good news is the researchers have developed a model and a static checker for identifying unstable code. Their checker is called STACK, and it currently works for checking C/C++ code. The idea is that it will warn programmers about unstable code in their applications, so they can fix it, rather than have the compiler simply leave it out. They also hope it will encourage compiler writers to rethink how they can optimize code in more secure ways. STACK was run against a number of systems written in C/C++ and it found 160 new bugs in the systems tested, including the Linux kernel (32 bugs found), Mozilla (3), Postgres (9) and Python (5). They also found that, of the 8,575 packages in the Debian Wheezy archive that contained C/C++ code, STACK detected at least one instance of unstable code in 3,471 of them, which, as the researchers write (PDF), 'suggests that unstable code is a widespread problem.'"
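To make "unstable code" concrete, here is a typical pattern. This is a minimal sketch: the function name `fits` is hypothetical and the code is not taken from the paper. A bounds check written as `buf + len < buf` relies on pointer overflow. Forming a pointer past the end of an object is undefined, so a compiler may fold that test to false and delete it. A stable version compares sizes instead:

```c
#include <stdbool.h>
#include <stddef.h>

/* Unstable version, shown only as a comment:
 *
 *     if (buf + len < buf) return false;   // "overflow" check
 *
 * Forming buf + len past the end of the object is undefined, so an
 * optimizer may assume it never wraps and remove the check entirely. */

/* Stable version: subtract two pointers into the same buffer (well
   defined), then compare the remaining space with len as sizes. */
static bool fits(const char *buf, const char *buf_end, size_t len)
{
    return len <= (size_t)(buf_end - buf);
}
```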

  • by Todd Knarr (15451) on Tuesday October 29, 2013 @07:42PM (#45274709)

    I haven't heard of any compiler that removes code just because it contains undefined behavior. All compilers I know of leave it in, and whether it misbehaves at run time or not is... well, undefined. It may work just fine; e.g. dereferencing a null pointer may just give you a block of zeroed-out read-only memory, and what happens next depends on what you try to do with the dereferenced object. It may immediately crash with a memory access exception. Or it may cause all mounted filesystems to wipe and reformat themselves. But the code's still in the executable. I know compilers remove code that they've determined can't be executed, or where they've determined that the end state doesn't depend on the execution of the code, and that can cause program malfunctions (or sometimes cause programs to fail to malfunction, e.g. an infinite loop in the code that didn't go into an infinite loop when the program ran, because the compiler had determined the code had no side effects and so elided the entire loop).
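One pattern where an optimizer *does* drop code because of undefined behavior is a null check placed after a dereference. This is a hypothetical sketch; `struct device` and `get_flags` are illustrative names, not from any particular project. Once `dev->flags` has been read, the compiler may infer that `dev` is non-null and delete the later `dev == NULL` test. The safe fix is to check first:

```c
#include <stddef.h>

struct device {            /* hypothetical type for illustration */
    int flags;
};

/* Risky order (as a comment):
 *     int flags = dev->flags;   // dereference first...
 *     if (dev == NULL)          // ...so this check may be optimized away
 *         return -1;
 */

/* Safe order: test the pointer before any dereference. */
static int get_flags(const struct device *dev)
{
    if (dev == NULL)
        return -1;
    return dev->flags;
}
```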

    I'd also note that I don't know any software developers who use the term "unstable code" as a technical term. That's a term used for plain old buggy code that doesn't behave consistently. And compilers are just fine with that kind of code, otherwise I wouldn't spend so much time tracking down and eradicating those bugs.

  • by PolygamousRanchKid (1290638) on Tuesday October 29, 2013 @07:55PM (#45274823)

    It's a pretty cool critter, but I don't know if they actually sell it as a product. It might be something that they only use internally:

    http://www.research.ibm.com/da/beam.html

    http://www.research.ibm.com/da/publications/beam_data_flow.pdf

  • by istartedi (132515) on Tuesday October 29, 2013 @09:03PM (#45275347)

    My statement is contradictory. I recommended a course of action for undefined behavior, while maintaining that Clang is wrong for documenting a course of action for undefined behavior.

    My understanding of "undefined behavior" in the C spec is that it means "anything can happen and the programmer shouldn't rely on what the compiler currently does". Of course, in the real world *something* must happen. If a 3rd party documents what that something is, the compiler is still compliant. It's the programmer's fault for relying on it.

    OTOH, if the behavior was "implementation defined" then the compiler authors can define it. If they change their definition from one rev to another without documenting the change, then it's the compiler author's fault for not documenting it.

    In other words:

    undefined -- programmer's fault for relying on it.
    implementation defined -- compiler's fault for not documenting it.

  • by CODiNE (27417) on Tuesday October 29, 2013 @09:35PM (#45275605)

    That reminds me of this gem: Overflow in sorting algorithms

    That little bug just sat around for a few decades before anyone noticed it.

    Quick summary: computing the midpoint as (low + high) / 2 can overflow, which for signed ints is undefined behavior. Really, every time we add ints it's possible; our values just usually don't exceed the maximum.
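The usual fix is to compute the midpoint from the distance between the bounds, which cannot overflow when `0 <= low <= high`. Below is a minimal sketch of a binary search written that way. The names `midpoint` and `binary_search` are hypothetical, not from the linked post:

```c
/* (low + high) / 2 can overflow int when both bounds are large;
   low + (high - low) / 2 stays in range whenever 0 <= low <= high. */
static int midpoint(int low, int high)
{
    return low + (high - low) / 2;
}

/* Binary search over a sorted array using the overflow-safe midpoint.
   Returns the index of key, or -1 if it is absent. */
static int binary_search(const int *a, int n, int key)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = midpoint(low, high);
        if (a[mid] < key)
            low = mid + 1;
        else if (a[mid] > key)
            high = mid - 1;
        else
            return mid;
    }
    return -1;
}
```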

  • by Myria (562655) on Tuesday October 29, 2013 @10:25PM (#45275979)

    The first mistake was using signed integers.

    The problem is C's promotion rules. In C, when promoting integers to the next size up, typically to the minimum of "int", the rule is to use signed int if all values of the source type fit, even if the source type is unsigned. This can cause code that seems to use unsigned integers everywhere to break, because C says signed integer overflow is undefined. Take the following code, for example, which I saw on a blog recently:

    #include <stdint.h>

    uint64_t MultiplyWords(uint16_t x, uint16_t y)
    {
        uint32_t product = x * y;  /* x and y are promoted to (signed) int first */
        return product;
    }

    MultiplyWords(0xFFFF, 0xFFFF) on GCC for x86-64 was returning 0xFFFFFFFFFFFE0001, and yet this is not a compiler bug. Under the promotion rules, uint16_t (unsigned short) gets promoted to int, because unsigned short fits in int completely without loss or overflow. So the multiplication became ((int) 0xFFFF) * ((int) 0xFFFF). That multiplication overflows in a signed sense, which is undefined. The compiler can do whatever it feels like, including generating code that crashes if it wants.

    GCC in this case assumes that overflow cannot happen, so x * y must be non-negative (when at run time it really isn't). This means the conversion to uint32_t does nothing, so the optimizer omits it. Now the code generator sees an int converted to uint64_t, which means sign extension. This time the optimizer isn't smart enough to reuse its assumption that the value is non-negative, which would let it skip sign extension and use "mov eax, ecx" to clear the high 32 bits; instead it emits a sign-extending instruction (such as cdqe or movsxd).

    So no, avoiding signed integers does not always save you.
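One way to get the intended result is to convert the operands to uint32_t before multiplying, so the arithmetic happens in an unsigned type where wraparound is defined. This is a sketch: the name `MultiplyWordsFixed` is ours, and it assumes a 32-bit int, as on common x86-64 ABIs. If the full 32-bit product should instead be computed in 64 bits, cast to uint64_t instead.

```c
#include <stdint.h>

/* Convert to uint32_t before multiplying. With a 32-bit int the
   multiplication is then done in unsigned int, so no signed overflow
   (and no undefined behavior) can occur. */
uint64_t MultiplyWordsFixed(uint16_t x, uint16_t y)
{
    uint32_t product = (uint32_t)x * (uint32_t)y;
    return product;
}
```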

  • Re:News flash (Score:5, Interesting)

    by gweihir (88907) on Tuesday October 29, 2013 @10:44PM (#45276109)

    That is not "unstable" or "undefined" code. There is already a word for it: dead code. In addition, any programmer worth his/her salt will make sure to declare things like that "volatile", i.e. tell the compiler that they might be accessed at any time from places the compiler does not see. Which is exactly the security problem here. Don't blame compilers for programmer incompetence...
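A common illustration of the "use volatile" advice follows. This is a minimal sketch, and `secure_wipe` is a hypothetical name. A plain `memset` of a password buffer that is never read again is a dead store the optimizer may remove. Writes through a volatile-qualified pointer, however, are treated as observable and kept. Where available, `explicit_bzero` (glibc, the BSDs) or the optional C11 Annex K `memset_s` serve the same purpose.

```c
#include <stddef.h>

/* Overwrite a buffer through a volatile-qualified pointer so each store
   counts as a side effect and is not eliminated as a dead store. */
static void secure_wipe(void *buf, size_t len)
{
    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--)
        *p++ = 0;
}
```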

  • by Animats (122034) on Wednesday October 30, 2013 @02:01AM (#45277165)

    The problem is C's promotion rules. In C, when promoting integers to the next size up, typically to the minimum of "int", the rule is to use signed integers if the source type fits, even if the source type is unsigned.

    I know. C's handling of signed integer overflow is "undefined". In Pascal, integer overflow was a detected error. DEC VAX computers could be set to raise a hardware exception on integer overflow, and about thirty years ago, I rebuilt the UNIX command line tools with that checking enabled. Most of them broke.

    In the first release of 4.3BSD, TCP would fail to work with non-BSD systems during alternate 4-hour periods. The sequence number arithmetic had been botched due to incorrect casts involving signed and unsigned integers. I found that bug. It wasn't fun.
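For context, TCP sequence numbers are 32-bit values that wrap around. Deciding whether "a comes before b" therefore depends on the sign of their difference modulo 2^32, not on a plain `<`. The sketch below does the comparison entirely in unsigned arithmetic, in the spirit of serial-number arithmetic. It is not the actual 4.3BSD code, and `seq_lt` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a precedes b, modulo 2^32.
   The unsigned subtraction wraps (well defined); a difference in the
   upper half of the 32-bit range means a is "behind" b. */
static bool seq_lt(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) >= UINT32_C(0x80000000);
}
```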

    C's casual attitude towards integer overflow is why today's machines don't have the hardware to interrupt on it. Ada checks for overflow, and Java at least defines it (as two's-complement wraparound), but the predominance of C sloppiness influenced hardware design too much.
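In C, you can approximate the checked arithmetic of Ada or Pascal by testing the operands before the operation. Below is a portable sketch for `int` addition. `checked_add` is a hypothetical name. GCC 5+ and Clang also provide `__builtin_add_overflow` for the same purpose.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *result and returns true, or returns false without
   performing the addition if it would overflow int. The comparisons
   themselves cannot overflow, so no undefined behavior is involved. */
static bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *result = a + b;
    return true;
}
```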

    I once wrote a paper, "Type Integer Considered Harmful", on this topic. One of my points was that unsigned arithmetic should not "wrap around" by default. If you want modular arithmetic, you should write something like n = (n + 1) % 65536;. The compiler can optimize that into machine instructions that exploit word lengths when the hardware allows, and you'll get the same result on all platforms.
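Here is a small sketch of that explicit-modulus style. `next_counter` is a hypothetical name. With an unsigned operand, `% 65536u` is the same as keeping the low 16 bits, so a compiler can reduce it to a mask.

```c
/* Explicit modular increment: the wrap point is visible in the source
   rather than implied by the width of the type. Using unsigned avoids
   any signed overflow in n + 1. */
static unsigned next_counter(unsigned n)
{
    return (n + 1) % 65536u;
}
```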

  • Re:News flash (Score:4, Interesting)

    by maxwell demon (590494) on Wednesday October 30, 2013 @02:48AM (#45277343)

    While what you say is true, I think it's not what they mean. Instead what they mean is compilers taking advantage of undefined behaviour you didn't notice. The compiler is allowed to assume that undefined behaviour never happens, and optimize accordingly. The important point is that this can even affect code before the undefined behaviour would occur. For example, consider the following code, where undefined() is some code that causes undefined behaviour:

    if (a>4)
    {
      a=4;
      big=true;
      undefined();
    }
    else
    {
      big=false;
    }
    assert(a<=4);

    Now if a>4, the code inevitably runs into undefined behaviour, and therefore the compiler may assume that a is not larger than 4 right from the start. Therefore it is allowed to compile the complete block to simply

    big=false;

    Note that even the assert doesn't help because the compiler "knows" it cannot trigger anyway, and therefore optimizes it out.

    I think it is not hard to imagine how this can lead to security problems.

    Another nice example (which I read on the gcc mailing list quite some time ago; not an exact quote though):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    char const* ask_passwd(void);               /* helpers assumed to exist */
    char const* get_password(char const* user);

    bool validate_passwd(char const* user)
    {
      int tries = 0;
      char const* given_passwd = ask_passwd();
      char const* user_passwd = get_password(user);
      while (strcmp(given_passwd, user_passwd))
      {
        tries = tries++; /* undefined behaviour! */
        if (tries > 3)
          return false; /* allow only to try three times */
        printf("password not valid. Please try again.\n");
        given_passwd = ask_passwd();
      }
      return true;
    }

    Now if strcmp returns anything but 0, the code inevitably runs into undefined behaviour, therefore the compiler is allowed to assume that never happens, and therefore is allowed to optimize the code to simply

    bool validate_passwd(char const* user)
    {
      char const* given_passwd = ask_passwd();
      char const* user_passwd = get_password(user);
      return true;
    }

    So there goes your password security.
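For contrast, here is a version of the retry loop with no undefined behavior. This is a sketch. It replaces the hypothetical `ask_passwd()`/`get_password()` helpers with a stored password and an array of attempts, so that it can run on its own. `validate_attempts` and `MAX_TRIES` are illustrative names.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_TRIES 3

/* Returns true if one of the first MAX_TRIES attempts matches stored.
   tries is updated with a single ++, so every step is well defined. */
static bool validate_attempts(const char *stored,
                              const char *const attempts[], int n_attempts)
{
    for (int tries = 0; tries < MAX_TRIES && tries < n_attempts; tries++) {
        if (strcmp(attempts[tries], stored) == 0)
            return true;
    }
    return false;
}
```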

  • Re:News flash (Score:4, Interesting)

    by Alioth (221270) on Wednesday October 30, 2013 @08:23AM (#45278655)

    In that vein, I tried:

    while(1) {
          bar=bar++;
          if(bar > 3) {
                printf("bar = %d\n", bar);
                break;
          }
    }

    Under gcc (trying -O0 to -O3 and -Os), this code printed "bar = 4". Compiling the same code with clang resulted in an infinite loop.
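`bar = bar++` modifies `bar` twice without any sequencing between the two writes, so it is undefined and both compilers' results are permitted. Writing the increment once makes the loop well defined on any compiler. This sketch assumes `bar` starts at 0, which the snippet above does not show. `count_past_three` is a hypothetical name.

```c
#include <stdio.h>

/* Increment once per iteration; returns 4 when started from 0. */
static int count_past_three(int bar)
{
    while (1) {
        bar++;
        if (bar > 3) {
            printf("bar = %d\n", bar);
            break;
        }
    }
    return bar;
}
```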
