
How Your Compiler Can Compromise Application Security

Posted by Soulskill
from the my-compiler-levels-me-out dept.
jfruh writes "Most day-to-day programmers have only a general idea of how compilers transform human-readable code into the machine language that actually powers computers. In an attempt to streamline applications, many compilers actually remove code that they perceive to be undefined or unstable, and, as a research group at MIT has found, in doing so can make applications less secure. The good news is that the researchers have developed a model and a static checker for identifying unstable code. Their checker is called STACK, and it currently works for checking C/C++ code. The idea is that it will warn programmers about unstable code in their applications so they can fix it, rather than have the compiler silently leave it out. They also hope it will encourage compiler writers to rethink how they can optimize code in more secure ways. STACK was run against a number of systems written in C/C++ and found 160 new bugs in the systems tested, including the Linux kernel (32 bugs found), Mozilla (3), Postgres (9) and Python (5). They also found that, of the 8,575 packages in the Debian Wheezy archive that contained C/C++ code, STACK detected at least one instance of unstable code in 3,471 of them, which, as the researchers write (PDF), 'suggests that unstable code is a widespread problem.'"

  • by istartedi (132515) on Tuesday October 29, 2013 @07:25PM (#45274579) Journal

    If my C code contains *foo = 2;, the compiler can't just leave that out. If my code contains if (foo) { *foo = 2; } else { return EDUFUS; } it can verify that my code is checking for NULL pointers. That's nice; but the questions remain:

    What is "unstable code" and how can a compiler leave it out? If the compiler can leave it out, it's unreachable code and/or code that is devoid of semantics. No sane compiler can alter the semantics of your code, at least no compiler I would want to use. I'd rather set -Wall and get a warning.

    • I'd rather set -Wall and get a warning.

      There are some undefined behaviors that can't be detected so easily at compile time, at least not without a big pile of extensions to the C language. For example, if a pointer is passed to a function, is the function allowed to dereference it without first checking it for NULL? The Rust language doesn't allow assignment of NULL to a pointer variable unless it's declared as an "option type" (Rust's term for a value that can be a pointer or None).

      • by Zero__Kelvin (151819) on Tuesday October 29, 2013 @07:46PM (#45274757) Homepage

        "For example, if a pointer is passed to a function, is the function allowed to dereference it without first checking it for NULL?"

        Of course it is, and it is supposed to be able to do so. If you were an embedded systems programmer you would know that, and also know why. Next you'll be complaining that languages allow infinite loops (again, a very useful thing to be able to do). C doesn't protect the programmer from himself, and that's by design. Compilers have switches for a reason. If they don't know how it is being built or what the purpose of the code is then they can't possibly determine with another program if the code is "unstable".

        • by EvanED (569694)

          Of course it is, and it is supposed to be able to do so.

          Actually no, you're not, or you're programming in Some-C-Like-Language and not C. In C, dereferencing a NULL pointer is always undefined behavior, and compilers are allowed (though presumably very unlikely to on embedded platforms) to make transformations based on that assumption, such as the following:

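          #include <stddef.h>

          void g(void);

          void f(int *p) {
              int x = *p;       /* dereference happens first...           */
              if (p == NULL) {  /* ...so the compiler may assume p != NULL */
                  g();
              }
          }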

          C compilers are allowed to optimize away the null check and s

    • by Anonymous Coward on Tuesday October 29, 2013 @07:38PM (#45274679)

      An example of "unstable code":

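      #include <stdlib.h>

      void launchMissiles(void);

      void demo(void) {
          char *a = malloc(sizeof(char));
          *a = 5;
          char *b = realloc(a, sizeof(char));
          *b = 2;
          /* deliberately undefined: a is used after being passed to realloc */
          if (a == b && *a != *b)
          {
              launchMissiles();
          }
      }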

      A cursory glance at this code suggests missiles will not be launched. With gcc, that's probably true at the moment. With clang, as I understand it, it is not: missiles will be launched. The reason is that the spec says the pointer passed to realloc becomes invalid after the call, so any use of that pointer has undefined behaviour. Clang takes advantage of this and assumes that *a does not change after that point, so it optimises if (a == b && *a != *b) into if (a == b && 5 != *b). That condition then passes, and missiles get launched.

      The truth here is that your compiler is not compromising application security – the code that relies on undefined behaviours is.

      • by dgatwood (11270) on Tuesday October 29, 2013 @07:56PM (#45274827) Journal

        Another, more common example of code optimizations causing security problems is this pattern:

        int a = [some value obtained externally];
        int b = a + 2;
        if (b < a) {
            // integer overflow occurred ...
        }

        The C spec says that signed integer overflow is undefined. Without optimization, this check usually works in practice. However, it is perfectly legal for the compiler to conclude that two more than any number is always larger than that number, and to optimize out the entire "if" statement and everything inside it.

        For proper safety, you must write this as:

        #include <limits.h>

        int a = [some value obtained externally];
        if (a > INT_MAX - 2) {
            // integer overflow would occur ...
        }
        int b = a + 2;
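dgatwood's corrected check handles adding a known positive constant. Below is a minimal sketch that generalizes it to a signed addend of either sign, assuming plain `int` operands. The names `add_would_overflow` and `checked_add` are hypothetical and do not come from the thread. The idea is the same as above: compare against a bound that can itself be computed without overflowing, and only then add.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true if a + n would overflow an int. No out-of-range value is
 * ever computed, so the check itself has no undefined behavior. */
static bool add_would_overflow(int a, int n)
{
    if (n > 0)
        return a > INT_MAX - n;   /* INT_MAX - n cannot overflow when n > 0 */
    else
        return a < INT_MIN - n;   /* INT_MIN - n cannot overflow when n <= 0 */
}

/* Stores a + n in *out and returns true on success; returns false
 * (leaving *out untouched) if the addition would overflow. */
static bool checked_add(int a, int n, int *out)
{
    if (add_would_overflow(a, n))
        return false;
    *out = a + n;
    return true;
}
```

For example, `checked_add(INT_MAX, 1, &r)` returns false and leaves `r` alone. GCC 5 and later, as well as Clang, also offer the non-standard `__builtin_add_overflow`. C23 adds `ckd_add` in `<stdckdint.h>` for the same job.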

        • by CODiNE (27417) on Tuesday October 29, 2013 @09:35PM (#45275605) Homepage

          That reminds me of this gem: Overflow in sorting algorithms [blogspot.com]

          That little bug just sat around for a few decades before anyone noticed it.

          Quick summary: (low + high) / 2
          may overflow, which is undefined behavior for signed ints. Really, every time we add ints it's possible; our values just usually don't exceed the max.
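A common fix is to compute the midpoint from the difference of the indices rather than their sum. Below is a minimal sketch, assuming non-negative indices with `low <= high`. The function names are hypothetical.

```c
/* (low + high) / 2 can overflow int, which is undefined behavior.
 * Assuming 0 <= low <= high, high - low cannot overflow, and
 * low + (high - low) / 2 always lies in [low, high]. */
static int midpoint(int low, int high)
{
    return low + (high - low) / 2;
}

/* Returns the index of key in the sorted array a[0..n-1], or -1. */
static int binary_search(const int *a, int n, int key)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = midpoint(low, high);
        if (a[mid] < key)
            low = mid + 1;        /* mid <= n - 1, so this cannot overflow */
        else if (a[mid] > key)
            high = mid - 1;
        else
            return mid;
    }
    return -1;
}
```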

          • by russotto (537200)

            Quick summary: (low + high) / 2

            I like this one, because it shows a very common weakness in high level languages.

            In most machine languages, getting the average of two unsigned numbers up to UINT_MAX is absolutely trivial -- add the two, then shift right including the carry. The average of two signed numbers rounding to zero is a little more difficult (x86 makes it harder than it should be by not setting flags in a convenient manner), but still a few instructions.

            In C? Assuming low and high are unsigned
            (low
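russotto's comment is cut off, so the C expression he had in mind is unknown. One well-known portable way to average two unsigned values without needing the carry bit splits the sum into shared and differing bits, using the identity a + b == 2*(a & b) + (a ^ b). Below is a sketch; the name `average_floor` is hypothetical.

```c
/* Average of two unsigned values, rounded down, with no intermediate
 * overflow: a & b holds the bits both operands share (each counted twice
 * in the sum, so they appear once in the average), and a ^ b holds the
 * bits set in only one operand, which are halved by the shift. */
static unsigned average_floor(unsigned a, unsigned b)
{
    return (a & b) + ((a ^ b) >> 1);
}
```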

      • by Cryacin (657549)
        YOU SUNK MY BATTLESHIP!
      • by mveloso (325617)

        If the runtime moved memory around during a realloc, this code wouldn't work. However, you'd never notice if you use the same runtime all the time. This is why it's a good thing to compile/target different platforms and compilers, and to do a -Wall (or the equivalent) at every optimization level. You have to do it at every optimization level because some compilers only do checks like this during their optimization phase (gcc?).

        This type of thing wouldn't get caught by any automated tools when I was doing C

      • by Old Wolf (56093)

        The behaviour is also undefined if realloc returns NULL. Also, sizeof(char) is 1 by definition.
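Following Old Wolf's point, the usual idiom is to keep the result of `realloc` in a separate variable and check it for NULL. After that, stop using the old pointer value entirely. Below is a minimal sketch; the helper name `grow_buffer` is hypothetical.

```c
#include <stdlib.h>

/* Grows *buf to new_size bytes. Returns 0 on success, -1 on failure.
 * On failure realloc does not free the original block, so *buf is left
 * pointing at it and the caller can still use or free it. */
static int grow_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return -1;   /* *buf is still valid */
    *buf = tmp;      /* the old pointer value must not be used again */
    return 0;
}
```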

    • by Nanoda (591299) on Tuesday October 29, 2013 @07:43PM (#45274729)

      What is "unstable code" and how can a compiler leave it out?

      The article is actually using that as an abbreviation for what they're calling "optimization-unstable code": code that is included at some compiler optimization levels but discarded at higher levels. Basically, they call it unstable because whether it is kept depends on the optimization level, not because the code itself necessarily behaves randomly.

    • by Spikeles (972972) on Tuesday October 29, 2013 @07:59PM (#45274849)

      TFA links to the actual paper. Maybe you should read that.

      Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior [mit.edu]

      struct tun_struct *tun = ...;
      struct sock *sk = tun->sk;
      if (!tun)
        return POLLERR;
      /* write to address based on tun */

      For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined [24:6.5.3]. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be.
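The straightforward fix for the kernel snippet is to test the pointer before the first dereference. Below is a compilable sketch with stand-in types. `struct sock`, `struct tun_struct`, `tun_poll_sketch`, and the `-1` standing in for `POLLERR` are hypothetical simplifications, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types in the paper's snippet. */
struct sock { int state; };
struct tun_struct { struct sock *sk; };

/* Check first, dereference second. Because no dereference precedes the
 * test, the compiler cannot infer that tun is non-null and has no
 * license to delete the check. */
static int tun_poll_sketch(struct tun_struct *tun)
{
    if (!tun)
        return -1;                 /* stands in for POLLERR */
    struct sock *sk = tun->sk;
    return sk != NULL ? sk->state : 0;
}
```

As another comment in this thread notes, the kernel also builds with gcc's `-fno-delete-null-pointer-checks`. That flag keeps such checks even when they follow a dereference.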

      • by Mateorabi (108522)
        This makes no sense. The dereference is undefined, and therefore sk may be undefined iff tun IS null but not tun.

        I.e. by the time execution reaches the if statement one of the two is true:
        tun != null && sk == {something valid} -or-
        tun == null && sk == {undefined}

        sk being undefined is possible but that undefined-ness can't be used as a way to infer tun != null--the only thing that causes it is tun == null! It's illogical for the compiler to do what you say and remove the if check. The
        • by Old Wolf (56093) on Tuesday October 29, 2013 @10:06PM (#45275821)

          >The dereference is undefined, and therefore

          Stop right here. Once undefined behaviour occurs, "all bets are off" as they say; the remaining code may have any behaviour whatsoever. C works like this on purpose, and it's something I agree with. It means the compiler doesn't have to insert screeds of extra checks, both at compile-time and run-time.

          There are plenty of other languages you can use if you want a different language definition :)

    • "What every C programmer should know about undefined behaviour" (part 3 [llvm.org], see links for first 2 parts).

      For example, overflows of unsigned values is undefined behaviour in the C standard. Compilers can make decisions like using an instruction that traps on overflow if it would execute faster, or if that is the only operator available. Since overflowing might trap, and thus cause undefined behaviour, the compiler may assume that the programmer didn't intend for that to ever happen. Therefore this test will always evaluate to true, this code block is dead and can be eliminated.

      This is why there are a number of compilation optimisations that gcc can perform but which are disabled when building the Linux kernel. With those optimisations, almost every memory address overflow test would be eliminated.

      • by istartedi (132515)

        For example, overflows of unsigned values is undefined behaviour in the C standard.

        I'm glad I didn't know that when I used to play with software 3d engines back in the 90s. 16-bit unsigned integer "wrap around" was what made my textures tile. I do seem to vaguely recall that there was a compiler flag for disabling integer traps and that I disabled it. It was Microsoft's C compiler, and it's been a loooooong time.

        OK, I'm looking through the options on the 2005 free Visual Studio... I can find a flag to

      • Overflows of unsigned values are well-defined in C (they wrap). (Technically, the standard says unsigned arithmetic can't overflow, because results are reduced modulo 2^N.)
        Overflows of signed values are undefined.
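Below is a small sketch of the defined wraparound istartedi relied on, assuming a platform that provides `uint16_t`. The function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* unsigned int arithmetic is defined to wrap modulo UINT_MAX + 1,
 * so next_u(UINT_MAX) is 0, not undefined behavior. */
static unsigned next_u(unsigned x)
{
    return x + 1u;
}

/* uint16_t operands are promoted before the addition (to int where int is
 * wider than 16 bits), so the sum itself does not overflow; converting it
 * back to uint16_t is defined to reduce it modulo 65536, which is what
 * makes a 16-bit texture coordinate tile. */
static uint16_t wrap_u16(uint16_t coord, uint16_t step)
{
    return (uint16_t)(coord + step);
}
```

The same promotion rule has a catch. Multiplying two large `uint16_t` values, for example 65535 * 65535, is done in `int` on typical 32-bit-`int` platforms. That result overflows, which is undefined, so cast one operand to `uint32_t` first.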

        • by istartedi (132515)

          OK, that explains why I've been getting away with assuming they wrap since the Clinton administration. I don't know if anybody ever explained it to me in C terms. I always assumed that behavior was baked in at the CPU level and just percolated up to C. I never felt inclined to do any "bit twiddling" with int or even fixed-width signed integers because on an intuitive level it "felt wrong". What's that four-letter personality type thing? I'm pretty sure I had the I for "intuitive" there...

      • by Old Wolf (56093)

        "Overflows of unsigned values" is NOT undefined. You can assign out-of-range values to unsigned types, and also perform arithmetic operations which exceed the bounds of the type; and the value is adjusted using modular arithmetic.

        Some would be facetious and say that "unsigned types cannot overflow", meaning that they always have well-defined behaviour on operations that would generate an out-of-range value, but that's just an issue of pedantry with English.

    • by mysidia (191772)

      I'd rather set -Wall and get a warning.

      I see your -Wall, and raise you a -Werror -pedantic

    • by tricorn (199664)

      I once had some code that confused me when the compiler optimized some stuff out.

      I had a macro that expanded to a parenthesized expression with several sub-expressions separated by commas that used a temp variable, e.g.:

      #define m(a) (tmp = a, f(tmp) + g(tmp))

      because the argument (a) could be an expression with side effects.

      Now, I knew that the order of evaluation of function arguments wasn't defined, but I never read that as meaning that a compiler could optimize away parts of a function call such as: x(m(

      • by Old Wolf (56093)

        I think you must be misremembering the details slightly. The comma operator introduces a sequence point, so "tmp" must be assigned the value of "a", and f() and g() must both be called with a value that is the value of "a" converted to the type of "tmp". The two functions can be called in either order (or in parallel), but there is no issue there.

        Of course, the compiler can do anything it likes so long as the program's output is equivalent to what I just described. So, for example, it might not allocate a
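tricorn's comment is cut off, so this is only a guess at what went wrong. If `m` is expanded twice in one argument list, as in `x(m(p), m(q))`, both expansions assign the shared `tmp`. Evaluations of different function arguments are unsequenced relative to each other, which makes that undefined behaviour. A static inline function avoids both the shared temporary and the double-evaluation problem the macro was working around. Below is a sketch; `f` and `g` are hypothetical placeholders.

```c
/* Hypothetical stand-ins for the functions used in the macro. */
static int f(int x) { return x * 2; }
static int g(int x) { return x + 1; }

/* The argument is evaluated exactly once (it becomes the parameter a),
 * and each call has its own copy, so no shared global tmp is needed. */
static inline int m(int a)
{
    return f(a) + g(a);
}
```

With this version, `m(i++)` increments `i` once, and `x(m(p), m(q))` is well defined.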

    • by Old Wolf (56093)

      >If my C code contains *foo=2, the compiler can't just leave that out

      Well, it could if the program produces no further output before exiting, or if "foo" is unassigned.

  • Inflammatory Subject (Score:5, Informative)

    by Imagix (695350) on Tuesday October 29, 2013 @07:29PM (#45274613)
    This is complaining because code which is already broken is broken more by the compiler? The programmer is already causing unpredictable things to happen, so even "leaving the code in" still provides no assurances of correct behaviour. An example of how the article is skewed:

    Since C/C++ is fairly liberal about allowing undefined behavior

    No, it's not. The language forbids undefined behavior. If your program invokes undefined behavior, it is no longer well-formed C or C++.

    • by Murdoch5 (1563847)
      You're right, there should never be undefined behavior or clueless development. If things are getting compiled out of the code then you clearly don't know enough about the compiler and language. I love when developers blame things like pointers and memory faults instead of the misuse of these by bad programming.
      • by HiThere (15173)

        That's nice. But when a language invites such things, that *is* a flaw in the language. I basically distrust pointers, but especially any pointers on which the user does arithmetic. Some people think that's a snazzy way to move through an array. I consider it recklessly dangerous stupidity, which is leaving you wide open to an undetected error with a simple typo.

        • by Murdoch5 (1563847)
          You can't blame a language for flaws when you decide to use the features you consider dangerous. Pointers are one of the most powerful features of C, and if you know how to use them correctly and safely they are very, very powerful. Just because a pointer can completely garble memory and corrupt your stack and heap doesn't mean it will. C and ASM assume the programmer is smart enough to take memory management into their own hands, and personally I completely agree. I hate all forms of auto
  • by Todd Knarr (15451) on Tuesday October 29, 2013 @07:42PM (#45274709) Homepage

    I haven't heard of any compiler that removes code just because it contains undefined behavior. All compilers I know of leave it in, and whether it misbehaves at run-time or not is... well, undefined. It may work just fine, e.g. dereferencing a null pointer may just give you a block of zeroed-out read-only memory, and what happens next depends on what you try to do with the dereferenced object. It may immediately crash with a memory access exception. Or it may cause all mounted filesystems to wipe and reformat themselves. But the code's still in the executable. I know compilers remove code that they've determined can't be executed, or where they've determined that the end state doesn't depend on the execution of the code, and that can cause program malfunctions (or sometimes cause programs to fail to malfunction, e.g. an infinite loop in the code that didn't go into an infinite loop when the program ran because the compiler had determined the code had no side effects, so it elided the entire loop).

    I'd also note that I don't know any software developers who use the term "unstable code" as a technical term. That's a term used for plain old buggy code that doesn't behave consistently. And compilers are just fine with that kind of code, otherwise I wouldn't spend so much time tracking down and eradicating those bugs.

    • 'I haven't heard of any compiler that removes code just because it contains undefined behavior.'
      Then your code may not be doing what you think it is.
      GCC, Clang, acc, armcc, icc, msvc, open64, pathcc, suncc, ti, windriver, xlc all do this.

      Click on the PDF, and scroll to page 4 for a nice table of optimisations vs compiler and optimisation level.

      _All_ modern compilers do this as part of optimisation.

      GCC 4.2.1, for example, with -O0 (least optimisation) will eliminate if (p + 100 < p).

      C however says that an overflowed

    • by Anonymous Coward

      Yes it leads to real bugs - Brad Spengler uncovered one of these issues in the Linux kernel in 2009 [lwn.net] and it led to the kernel using the -fno-delete-null-pointer-checks gcc flag to disable the spec correct "optimisation".

    • The compiler doesn't leave out code with undefined behaviour - it assumes that there is no undefined behaviour, and draws conclusions from this.

      Example: Some people assume that if you add to a very large integer value, it will eventually wrap around and produce a negative value, which is what happens with many non-optimising compilers. So if you ask yourself "will adding i + 100 overflow?" you might check "if (i + 100 < i)".
      But integer overflow is undefined behaviour. The compiler assumes that your code d
      • by Todd Knarr (15451)

        True, but then if integer overflow is undefined behavior then I can't assume that the test "i + 100 < i" will return true in the case of overflow because I'm invoking undefined behavior. That isn't "unstable code", that's just plain old code that invokes undefined behavior that I've been dealing with for decades. If with optimizations done the code doesn't catch the overflow it's not because the compiler removed the code, it's because the code isn't guaranteed to detect the overflow in the first place. N

    • Clang includes a number of compilation flags [llvm.org] that can be used to make sure, or at least as sure as it can, that your code never hits any undefined behaviour at run time.

      But normally, yes the compiler may change the behaviour of your application if you are depending on undefined behaviour.

    • by seebs (15766)

      gcc's been doing this for ages. We had a new compiler "break" the ARM kernel once. Turns out that something had a test for whether a pointer was null or not after a dereference of that pointer, and gcc threw out the test because it couldn't possibly apply.

  • -Wall (Score:3, Insightful)

    by Spazmania (174582) on Tuesday October 29, 2013 @07:50PM (#45274785) Homepage

    If I set -Wall and the compiler fails to warn me that it optimized out a piece of my code then the compiler is wrong. Period. Full stop.

    I don't care what "unstable" justification its authors gleaned from the standard, don't mess with my code without telling me you did so.

    • by mysidia (191772)

      I don't care what "unstable" justification its authors gleaned from the standard, don't mess with my code without telling me you did so.

      That's not what's happening..... they are talking about unstable optimizations; as in..... optimizations that aren't predictable, and while they don't change the semantics of the code according to the programming language ---- the optimization may affect what happens, if the code contains an error or operation that is runtime-undefined, such as a buffer overflow co

      • by Spazmania (174582)

        One of the examples from the paper was this snippet from the Linux kernel:

        struct sock *sk = tun->sk;
        if (!tun) return POLLERR;

        gcc's optimizer deleted "if (!tun) return POLLERR;" because the dereference tun->sk implies that tun != NULL.

        Okay, I buy that. But if gcc did so without a warning with -Wall set, then gcc is broken. The author obviously expects it to be possible for tun==NULL, so if gcc decides it can't be, that's a warning! Duh!

  • by Tablizer (95088) on Tuesday October 29, 2013 @07:52PM (#45274797) Homepage Journal

    many compilers actually remove code that it perceives to be undefined or unstable

    No wonder my app came out with 0 bytes.

  • PC Lint anyone? (Score:4, Informative)

    by ArcadeNut (85398) on Tuesday October 29, 2013 @07:53PM (#45274807) Homepage

    Back in the day when I was doing C++ work, I used a product called PC Lint (http://www.gimpel.com/html/pcl.htm) that did basically the same thing STACK does: static analysis of code to find errors such as dereferencing NULL pointers, buffer overflows, etc. Maybe they should teach history at MIT first...

    • Re:PC Lint anyone? (Score:4, Insightful)

      by EvanED (569694) <evaned@ g m a i l.com> on Tuesday October 29, 2013 @08:34PM (#45275113)

      Don't worry, the authors know what they're doing.

      Just because PC Lint could find a small number of potential bugs doesn't mean it's a solved problem by any means. Program analysis is still pretty crappy in general, and they made another improvement, just like tons of people before them, PC Lint before them, and tons of people before PC Lint.

  • by PolygamousRanchKid (1290638) on Tuesday October 29, 2013 @07:55PM (#45274823)

    It's a pretty cool critter, but I don't know if they actually sell it as a product. It might be something that they only use internally:

    http://www.research.ibm.com/da/beam.html [ibm.com]

    http://www.research.ibm.com/da/publications/beam_data_flow.pdf [ibm.com]

  • The C standard needs to meet with some realities to fix this issue. The C committee wants their language to be usable on the most esoteric of architectures, and this is the result.

    The reason that the result of signed integer overflow and underflow is not defined is that the C standard does not require the machine to be two's complement. The same goes for 1 << 31 and the negation of INT_MIN being undefined. When was the last time that you used a machine whose integer format was one's complement?

    Here are the th

    • by seebs (15766) on Tuesday October 29, 2013 @08:36PM (#45275125) Homepage

      Pretty sure the embedded systems guys wouldn't be super supportive of this, and they're by far the largest market for C.

      And I just don't think these are big sources of trouble most of the time. If people would just go read Spencer's 10 Commandments for C Programmers, this would be pretty much solved.

    • by mysidia (191772)

      * Fixation of two's complement as the integer format.

      Are you trying to make C less portable, or what?

      Not all platforms work exactly the same, and these additional constraints on datatypes would be a problem on platforms where two's complement is not the signed integer format.

      Of course you're free to define your own augmented rules on top of C, as long as they're not the formal language standard --- and if you write compilers, you're free to constrain yourself into making your impleme

      • by Myria (562655)

        * Fixation of two's complement as the integer format.

        Are you trying to make C less portable, or what?

        The "broken" code is already nonportable to non-two's-complement machines, and much of this code is things critical to the computing and device world as a whole, such as the Linux kernel.

  • by AdamHaun (43173) on Tuesday October 29, 2013 @08:04PM (#45274879) Journal

    The article doesn't summarize this very well, but the paper (second link) provides a couple examples. First up:

    char *buf = ...;
    char *buf_end = ...;
    unsigned int len = ...;
    if (buf + len >= buf_end)
      return; /* len too large */

    if (buf + len < buf)
      return; /* overflow, buf+len wrapped around */
    /* write to buf[0..len-1] */

    To understand unstable code, consider the pointer overflow check buf + len < buf shown [above], where buf is a pointer and len is a positive integer. The programmer's intention is to catch the case when len is so large that buf + len wraps around and bypasses the first check ... We have found similar checks in a number of systems, including the Chromium browser, the Linux kernel, and the Python interpreter.

    While this check appears to work on a flat address space, it fails on a segmented architecture. Therefore, the C standard states that an overflowed pointer is undefined, which allows gcc to simply assume that no pointer overflow ever occurs on any architecture. Under this assumption, buf + len must be larger than buf, and thus the "overflow" check always evaluates to false. Consequently, gcc removes the check, paving the way for an attack to the system.
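A check that never forms an out-of-range pointer compares `len` against the space left in the buffer. That space is computed by subtracting two pointers into the same buffer. Below is a minimal sketch, assuming `buf <= buf_end` with both pointing into the same array. The name `fits_in_buffer` is hypothetical, and the sketch takes `size_t` rather than the snippet's `unsigned int`.

```c
#include <stddef.h>

/* Returns 1 if len is acceptable under the snippet's first check
 * (buf + len < buf_end), 0 otherwise. Only buf_end - buf is computed,
 * which is well defined for two pointers into the same array, so no
 * overflowed pointer is ever formed and there is nothing for the
 * optimizer to "prove" away. */
static int fits_in_buffer(const char *buf, const char *buf_end, size_t len)
{
    size_t room = (size_t)(buf_end - buf);
    return len < room;
}
```

A huge `len` such as `(size_t)-1` is simply rejected. This makes the second, overflow-based check unnecessary.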

    They then give another example, this time from the Linux kernel:

    struct tun_struct *tun = ...;
    struct sock *sk = tun->sk;
    if (!tun)
      return POLLERR;
    /* write to address based on tun */

    In addition to introducing new vulnerabilities, unstable code can amplify existing weakness in the system. [The above] shows a mild defect in the Linux kernel, where the programmer incorrectly placed the dereference tun->sk before the null pointer check !tun. Normally, the kernel forbids access to page zero; a null tun pointing to page zero causes a kernel oops at tun->sk and terminates the current process. Even if page zero is made accessible (e.g. via mmap or some other exploits), the check !tun would catch a null tun and prevent any further exploits. In either case, an adversary should not be able to go beyond the null pointer check.

    Unfortunately, unstable code can turn this simple bug into an exploitable vulnerability. For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be.

    The basic issue here is that optimizers are making aggressive inferences from the code based on the assumption of standards-compliance. Programmers, meanwhile, are writing code that sometimes violates the C standard, particularly in corner cases. Many of these seem to be attempts at machine-specific optimization, such as this "clever" trick from Postgres for checking whether an integer is the most negative number possible:

    int64_t arg1 = ...;
    if (arg1 != 0 && ((-arg1 < 0) == (arg1 < 0)))
      ereport(ERROR, ...);
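The Postgres test is trying to detect the one value whose negation does not fit. A well-defined version compares against `INT64_MIN` directly instead of negating first. Below is a minimal sketch; the function names are hypothetical, and this is not necessarily how Postgres itself fixed the code.

```c
#include <stdbool.h>
#include <stdint.h>

/* -arg1 is undefined exactly when arg1 == INT64_MIN (its negation does
 * not fit in int64_t), so test for that value without negating. */
static bool negation_overflows(int64_t arg1)
{
    return arg1 == INT64_MIN;
}

/* Stores -arg1 in *out and returns true, or returns false without
 * touching *out when the negation would overflow. */
static bool checked_negate(int64_t arg1, int64_t *out)
{
    if (negation_overflows(arg1))
        return false;
    *out = -arg1;
    return true;
}
```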

    The remainder of the paper goes into the gory Comp Sci details and discusses their model for detecting unstable code, which they implemented in LLVM. Of particular interest is the table on page 9, which lists the number of unstable code fragments found in a variety of software packages, including exciting ones like Kerberos.

    • by Old Wolf (56093)

      While this check appears to work on a flat address space, it fails on a segmented architecture.

      It may not even work on a flat address space, if "buf"'s allocated block is right at the end of the addressable space.

  • by belphegore (66832) on Tuesday October 29, 2013 @08:48PM (#45275207) Homepage

    Checked out their git repo and did a build. They have a couple of sketchy-looking warnings in their own code: a use of a possibly uninitialized variable, and storing a 35-bit value in a 32-bit variable...

    lglib.c:6896:7: warning: variable 'res' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
    lglib.c:6967:10: note: uninitialized use occurs here
    plingeling.c:456:17: warning: signed shift result (0x300000000) requires 35 bits to represent, but 'int' only has 32 bits [-Wshift-overflow]

  • Know your C (Score:4, Informative)

    by gmuslera (3436) on Tuesday October 29, 2013 @08:49PM (#45275223) Homepage Journal
    This somewhat reminded me of that slideshow on Deep C [slideshare.net]. After reading it, all I know is that I know nothing of C.
  • by countach (534280) on Tuesday October 29, 2013 @09:01PM (#45275321)

    It's really time that 99.9% of code stopped being written in languages that have undefined behaviour. We should all be using languages that are fully defined.

    Having said that, if something in code is undefined, and the compiler knows it, then it should generate an error. Very easily solved. If this STACK program is so clever, it should be in the compiler, and it should be an error to do something undefined.

  • Based on the headline, I thought it was going to be about Ken Thompson's self-referencing compiler [bell-labs.com] that not only inserted a back door whenever it saw that it was compiling the UNIX login command, it also inserted the back door insertion code whenever it saw it was compiling the compiler source code.
