How Your Compiler Can Compromise Application Security

Posted by Soulskill
from the my-compiler-levels-me-out dept.
jfruh writes "Most day-to-day programmers have only a general idea of how compilers transform human-readable code into the machine language that actually powers computers. In an attempt to streamline applications, many compilers actually remove code that they perceive to be undefined or unstable, and, as a research group at MIT has found, in doing so they can make applications less secure. The good news is that the researchers have developed a model and a static checker for identifying unstable code. Their checker is called STACK, and it currently works for checking C/C++ code. The idea is that it will warn programmers about unstable code in their applications so they can fix it, rather than have the compiler simply leave it out. They also hope it will encourage compiler writers to rethink how they can optimize code in more secure ways. STACK was run against a number of systems written in C/C++ and found 160 new bugs in the systems tested, including the Linux kernel (32 bugs found), Mozilla (3), Postgres (9) and Python (5). They also found that, of the 8,575 packages in the Debian Wheezy archive that contained C/C++ code, STACK detected at least one instance of unstable code in 3,471 of them, which, as the researchers write (PDF), 'suggests that unstable code is a widespread problem.'"
  • Inflammatory Subject (Score:5, Informative)

    by Imagix (695350) on Tuesday October 29, 2013 @07:29PM (#45274613)
    This is complaining because code which is already broken is broken more by the compiler? The programmer is already causing unpredictable things to happen, so even "leaving the code in" still provides no assurances of correct behaviour. An example of how the article is skewed:

    Since C/C++ is fairly liberal about allowing undefined behavior

    No, it's not. The language forbids undefined behavior: if your program invokes undefined behavior, the standard places no requirements on what it does, so it is no longer a correct C or C++ program.

  • by Anonymous Coward on Tuesday October 29, 2013 @07:38PM (#45274679)

    An example of "unstable code":

    char *a = malloc(sizeof(char));
    *a = 5;
    char *b = realloc(a, sizeof(char));
    *b = 2;
    if (a == b && *a != *b)
    {
            launchMissiles();
    }

    A cursory glance at this code suggests missiles will not be launched. With gcc, that's probably true at the moment. With clang, as I understand it, this is not true: missiles will be launched. The reason is that the spec says the pointer passed as the first argument of realloc becomes invalid after the call, so any use of that pointer has undefined behaviour. Clang takes advantage of this and assumes that *a does not change after that point. It therefore optimises if (a == b && *a != *b) into if (a == b && 5 != *b). That condition passes, and the missiles get launched.
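A minimal sketch of the defined way to use realloc, assuming the goal is simply to grow a buffer: keep the pointer realloc returns, never read, dereference, or even compare the old one again, and handle failure without losing the original block. The helper name resize_buffer is hypothetical, chosen for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper: resize *buf to new_size bytes (new_size must be > 0,
 * because realloc with size 0 is implementation-defined).
 * On success *buf is replaced by realloc's return value and 0 is returned;
 * the old pointer value is dead and must not be used afterwards.
 * On failure realloc leaves the old block allocated, so *buf is unchanged,
 * still valid, and -1 is returned. */
static int resize_buffer(char **buf, size_t new_size)
{
    char *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return -1;      /* original block is still owned by the caller */
    *buf = tmp;         /* from here on, only the new pointer is used */
    return 0;
}
```

realloc preserves the contents up to the smaller of the old and new sizes, so a byte written before the call can be read back through *buf afterwards, without any comparison against the stale pointer.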

    The truth here is that your compiler is not compromising application security – the code that relies on undefined behaviours is.

  • Re:News flash (Score:5, Informative)

    by war4peace (1628283) on Tuesday October 29, 2013 @07:40PM (#45274687)

    I would also like to understand what's the definition of "unstable code".

  • by Nanoda (591299) on Tuesday October 29, 2013 @07:43PM (#45274729)

    What is "unstable code" and how can a compiler leave it out?

    The article is actually using that as an abbreviation for what they call "optimization-unstable code": code that is kept at some compiler optimization levels but discarded at higher ones. They call it unstable because whether it survives compilation depends on the optimization level, not because the code itself necessarily behaves randomly.

  • PC Lint anyone? (Score:4, Informative)

    by ArcadeNut (85398) on Tuesday October 29, 2013 @07:53PM (#45274807) Homepage

    Back in the day, when I was doing C++ work, I used a product called PC Lint (http://www.gimpel.com/html/pcl.htm) that did basically the same thing STACK does: static analysis of code to find errors such as null pointer dereferences, buffer overflows, etc. Maybe they should teach history at MIT first...

  • by Anonymous Coward on Tuesday October 29, 2013 @07:56PM (#45274825)

    Yes, it leads to real bugs: Brad Spengler uncovered one of these issues in the Linux kernel in 2009 [lwn.net], and it led to the kernel using the -fno-delete-null-pointer-checks gcc flag to disable the spec-correct "optimisation".

  • by dgatwood (11270) on Tuesday October 29, 2013 @07:56PM (#45274827) Journal

    Another, more common example of code optimizations causing security problems is this pattern:

    int a = [some value obtained externally];
    int b = a + 2;
    if (b < a) {
    // integer overflow occurred ...
    }

    The C spec says that signed integer overflow is undefined. Without optimization, this check usually works, because the addition simply wraps around on two's-complement hardware. However, the compiler may legally conclude that two more than any number is always larger than that number, and optimize out the entire "if" statement and everything inside it.

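    For proper safety, test against the limit before doing the addition, so the overflow can never happen (INT_MAX comes from <limits.h>, and since INT_MAX - 2 is a constant, the test itself cannot overflow):

    int a = [some value obtained externally];
    if (a > INT_MAX - 2) {
        // integer overflow would occur ...
    }
    int b = a + 2;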
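The same idea generalizes to any addend. A minimal sketch, using checked_add as a hypothetical helper name: compare against INT_MAX or INT_MIN before adding, so no signed overflow is ever evaluated and the check contains nothing undefined for the optimizer to reason away.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: store a + b in *out and return true, or return false
 * without touching *out if the signed addition would overflow.
 * INT_MAX - b (for b > 0) and INT_MIN - b (for b < 0) are always in range,
 * so the test itself cannot overflow. */
static bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

GCC (since version 5) and Clang also offer __builtin_add_overflow(a, b, &result), which returns true when the addition overflowed; it is a compiler extension, not standard C.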

  • by Spikeles (972972) on Tuesday October 29, 2013 @07:59PM (#45274849)

    TFA links to the actual paper. Maybe you should read that.

    Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior [mit.edu]

    struct tun_struct *tun = ...;
    struct sock *sk = tun->sk;
    if (!tun)
    return POLLERR;
    /* write to address based on tun */

    For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined [24:6.5.3]. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be.
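The bug disappears if the null check comes before the first dereference, because then there is no earlier tun->sk for the compiler to treat as proof that tun is non-null. A self-contained sketch, with hypothetical minimal versions of the two structs and a hypothetical TUN_ERR value standing in for the kernel's POLLERR:

```c
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the kernel types. */
struct sock { int id; };
struct tun_struct { struct sock *sk; };

#define TUN_ERR (-1) /* hypothetical error value standing in for POLLERR */

/* Check first, dereference second: no dereference precedes the test, so the
 * compiler has no grounds to assume tun is non-null and delete the check. */
static int tun_sock_id(const struct tun_struct *tun)
{
    if (!tun)
        return TUN_ERR;
    struct sock *sk = tun->sk;
    return sk ? sk->id : TUN_ERR;
}
```

Building with -fno-delete-null-pointer-checks, as the kernel does, keeps such checks even when they follow a dereference, but the reordered code does not need to rely on that flag.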

  • "What every C programmer should know about undefined behaviour" (part 3 [llvm.org], see links for first 2 parts).

    For example, overflow of signed values is undefined behaviour in the C standard (unsigned arithmetic, by contrast, is defined to wrap around). Compilers can make decisions like using an instruction that traps on overflow if it would execute faster, or if that is the only operator available. Because overflow is undefined, the compiler may assume the programmer never intended it to happen. Therefore a test such as x + 1 > x will always evaluate to true, and any code guarded by its negation is dead and can be eliminated.

    This is why there are a number of optimisations that gcc can perform but which are disabled when building the Linux kernel. With those optimisations enabled, almost every memory address overflow test would be eliminated.
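A small illustration of the signed-overflow point, as a sketch: a test written as x + 1 > x has undefined behaviour when x is INT_MAX, so a compiler may fold it to true. Comparing against the limit states the same intent with no overflow at all. The function name can_increment is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if x + 1 is representable as an int.
 * Unlike "x + 1 > x", this never evaluates an overflowing expression, so
 * there is no undefined behaviour for the optimizer to exploit. */
static bool can_increment(int x)
{
    return x != INT_MAX;
}
```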

  • by AdamHaun (43173) on Tuesday October 29, 2013 @08:04PM (#45274879) Journal

    The article doesn't summarize this very well, but the paper (second link) provides a couple of examples. First up:

    char *buf = ...;
    char *buf_end = ...;
    unsigned int len = ...;
    if (buf + len >= buf_end)
      return; /* len too large */

    if (buf + len < buf)
      return; /* overflow, buf+len wrapped around */
    /* write to buf[0..len-1] */

    To understand unstable code, consider the pointer overflow check buf + len < buf shown [above], where buf is a pointer and len is a positive integer. The programmer's intention is to catch the case when len is so large that buf + len wraps around and bypasses the first check ... We have found similar checks in a number of systems, including the Chromium browser, the Linux kernel, and the Python interpreter.

    While this check appears to work on a flat address space, it fails on a segmented architecture. Therefore, the C standard states that an overflowed pointer is undefined, which allows gcc to simply assume that no pointer overflow ever occurs on any architecture. Under this assumption, buf + len must be larger than buf, and thus the "overflow" check always evaluates to false. Consequently, gcc removes the check, paving the way for an attack to the system.
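A minimal sketch of an overflow-free version of the length check, assuming buf and buf_end point into (or one past the end of) the same array with buf <= buf_end: compare len against the space actually remaining instead of forming buf + len, which is undefined once it points more than one past the end of the array. The helper name fits_in_buffer is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if buf + len would stay strictly below buf_end
 * (the condition the first check in the example enforces). Only
 * buf_end - buf is computed, which is defined for pointers into the same
 * array, so no out-of-range pointer is ever formed. */
static bool fits_in_buffer(const char *buf, const char *buf_end, size_t len)
{
    return len < (size_t)(buf_end - buf);
}
```

Because no wrapped pointer is ever computed, a huge len is rejected by this single comparison, and the separate "buf + len < buf" test is no longer needed.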

    They then give another example, this time from the Linux kernel:

    struct tun_struct *tun = ...;
    struct sock *sk = tun->sk;
    if (!tun)
      return POLLERR;
    /* write to address based on tun */

    In addition to introducing new vulnerabilities, unstable code can amplify existing weakness in the system. [The above] shows a mild defect in the Linux kernel, where the programmer incorrectly placed the dereference tun->sk before the null pointer check !tun. Normally, the kernel forbids access to page zero; a null tun pointing to page zero causes a kernel oops at tun->sk and terminates the current process. Even if page zero is made accessible (e.g. via mmap or some other exploits), the check !tun would catch a null tun and prevent any further exploits. In either case, an adversary should not be able to go beyond the null pointer check.

    Unfortunately, unstable code can turn this simple bug into an exploitable vulnerability. For example, when gcc first sees the dereference tun->sk, it concludes that the pointer tun must be non-null, because the C standard states that dereferencing a null pointer is undefined. Since tun is non-null, gcc further determines that the null pointer check is unnecessary and eliminates the check, making a privilege escalation exploit possible that would not otherwise be.

    The basic issue here is that optimizers are making aggressive inferences from the code based on the assumption of standards-compliance. Programmers, meanwhile, are writing code that sometimes violates the C standard, particularly in corner cases. Many of these seem to be attempts at machine-specific optimization, such as this "clever" trick from Postgres for checking whether an integer is the most negative number possible:

    int64_t arg1 = ...;
    if (arg1 != 0 && ((-arg1 < 0) == (arg1 < 0)))
      ereport(ERROR, ...);
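The Postgres trick evaluates -arg1 when arg1 is the most negative value, which is exactly the case where negation overflows and the behaviour is undefined. A sketch of a defined alternative, assuming only <stdint.h>: compare against INT64_MIN before negating. The helper name checked_negate is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: store -arg1 in *out and return true, or return false
 * when arg1 == INT64_MIN, whose negation is not representable in int64_t.
 * The comparison happens before any negation, so nothing overflows. */
static bool checked_negate(int64_t arg1, int64_t *out)
{
    if (arg1 == INT64_MIN)
        return false;
    *out = -arg1;
    return true;
}
```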

    The remainder of the paper goes into the gory Comp Sci details and discusses their model for detecting unstable code, which they implemented in LLVM. Of particular interest is the table on page 9, which lists the number of unstable code fragments found in a variety of software packages, including exciting ones like Kerberos.

  • by EvanED (569694) <evaned&gmail,com> on Tuesday October 29, 2013 @08:47PM (#45275199)

    The first mistake was using signed integers. unsigned integers always have well-defined overflow (modulo semantics), which means it's easier to construct safe conditionals

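    Only half right. In C and C++, unsigned arithmetic is indeed defined to wrap modulo 2^N, so the check works on unsigned operands; it is signed overflow that is undefined, and that is where the compiler is allowed to perform the optimization.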
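For comparison, a sketch of the unsigned case, assuming operands of type unsigned int (narrower unsigned types such as unsigned short are usually promoted to int before arithmetic, so they do not get this guarantee): the C standard defines unsigned arithmetic to wrap modulo UINT_MAX + 1, so a wraparound check written on unsigned values is well-defined and the compiler must keep it. The helper name unsigned_add_wraps is hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if a + b wrapped around. unsigned int
 * arithmetic is defined to wrap modulo UINT_MAX + 1, so the sum is smaller
 * than a exactly when the mathematical result did not fit. */
static bool unsigned_add_wraps(unsigned int a, unsigned int b)
{
    return a + b < a;
}
```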

  • Know your C (Score:4, Informative)

    by gmuslera (3436) on Tuesday October 29, 2013 @08:49PM (#45275223) Homepage Journal
    This reminded me of that slideshow on Deep C [slideshare.net]. After reading it, I only know that I know nothing of C.

  • Re:News flash (Score:5, Informative)

    by ShanghaiBill (739463) on Tuesday October 29, 2013 @09:32PM (#45275573)

    Didn't RTFA because this is /., but I'd guess that it's code that works now but is fragile under a change of compiler, compiler version, optimization level, or platform.

    Yes, you didn't RTFA, because your definition actually makes sense. TFA defines "unstable code" as code with undefined behavior. TFA also claims that many compilers simply DELETE such code. I have never seen a compiler that does that, and I seriously doubt it is really common. Does anyone know of a single compiler that does this? Or is TFA just completely full of crap (as I strongly suspect)?

  • Re:News flash (Score:5, Informative)

    by EvanED (569694) <evaned&gmail,com> on Tuesday October 29, 2013 @09:44PM (#45275663)

    Yes, you didn't RTFA, because your definition actually makes sense. TFA defines "unstable code" as code with undefined behavior.

    ...and undefined behavior is exactly what causes the things I listed.

    TFA also claims that many compilers simply DELETE such code. I have never seen a compiler that does that, and I seriously doubt it is really common.

    You probably haven't used any desktop compilers.

    Just a sampling:

    • During Microsoft's security push a decade ago, they discovered that the compiler was optimizing away the memset in code such as memset(password, '\0', len); free(password); which was meant to limit the lifetime of sensitive information. Because the buffer is never read again before it is freed, the stores performed by memset are dead and may be removed. (This is not actually undefined behavior, but it is an example of the compiler deleting code that was there for a purpose.)
    • I linked part 3 of this series to you in another response, but the first example in here [llvm.org] discusses such an optimization that GCC did which removed security checks in the Linux kernel (see also this series [regehr.org] -- look down at "A Fun Case Analysis")
    • The Linux kernel has long been built with -fno-strict-aliasing because optimizations based on the strict aliasing assumption break it (more precisely: kernel code that violates the standard's strict aliasing rules was being "mis-"optimized), though I don't know if that led to security problems.
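A common workaround for the memset case, sketched here with a hypothetical helper secure_zero: write the zeros through a volatile-qualified pointer, so each store is a volatile access that compilers, in practice, do not delete as dead. Platform functions exist for the same purpose (explicit_bzero on glibc 2.25+ and the BSDs, SecureZeroMemory on Windows, memset_s in the optional C11 Annex K), but their availability varies.

```c
#include <stddef.h>

/* Hypothetical helper: zero n bytes at p. Each store goes through a
 * volatile-qualified lvalue, so the compiler treats it as a volatile access
 * and, in practice, does not remove it as a dead store the way it may remove
 * a plain memset() followed by free(). */
static void secure_zero(void *p, size_t n)
{
    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}
```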
  • Re:News flash (Score:4, Informative)

    by TheRaven64 (641858) on Wednesday October 30, 2013 @05:16AM (#45277893) Journal

    I have never seen a compiler that does that, and I seriously doubt it is really common. Does anyone know of a single compiler that does this?

    The only compilers I know of that definitely do this are GCC, LLVM, ICC, Open64, ARMCC, and XLC, but others probably do too. Compilers use undefined behaviour to propagate unreachable state and aggressively trim code paths. There's a fun case in ARM's compiler, where you write something like this:

    int x[5];
    int y;
    ...
    for (int i=0 ; i<10 ; i++)
        y += x[i];

    The entire loop is optimised away to an infinite loop. Why? Because accesses to array elements after the end of the array are undefined. This means that, when you write x[i], either i is in the range 0-4 (inclusive) or you are hitting undefined behaviour. Because the compiler can do anything it wants in cases of undefined behaviour, it is free to assume that they never occur. Therefore, it assumes that, at the end of each iteration, i is always less than 5. Therefore, after i++, i is always less than 10, and so the loop will never terminate. Since the body of the loop has no side effects, it can be elided. Then x and y are never read by anything with side effects, so their declarations can be elided too. The entire function becomes a single branch instruction that just jumps back to itself.
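A corrected version of the loop, as a sketch: derive the bound from the array itself so the index never goes past the last element, and initialise the accumulator (the original also reads y while its value is still indeterminate, which is a separate problem). The helper sum_ints and the macro ARRAY_LEN are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical macro: number of elements in an actual array (not a pointer). */
#define ARRAY_LEN(a) (sizeof (a) / sizeof (a)[0])

/* Hypothetical helper: sum n ints starting at x. */
static int sum_ints(const int *x, size_t n)
{
    int y = 0;                      /* initialised before the first += */
    for (size_t i = 0; i < n; i++)  /* i only ever indexes valid elements */
        y += x[i];
    return y;
}
```

Called as sum_ints(x, ARRAY_LEN(x)) with int x[5], the loop runs exactly five times, so there is no out-of-bounds access for the compiler to build assumptions on.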

    If your code relies on undefined behaviour, then it's broken. A compiler is entirely free to do whatever it wants in the cases where the behaviour is undefined. Checking for undefined behaviour statically is very hard, however (consider trying to check for correct use of the restrict keyword - you need to do accurate alias analysis on the entire program) and so compilers won't warn you in all cases. Often, the undefined behaviour is only apparent after inlining, at which point it's difficult to tell what the source of the problem was.
