
Overeager Compilers Can Open Security Holes In Your Code

Posted by Soulskill
from the i-blame-the-schools dept.
jfruh writes: "Creators of compilers are in an arms race to improve performance. But according to a presentation at this week's annual USENIX conference, those performance boosts can undermine your code's security. For instance, a compiler might find a subroutine that checks a huge bound of memory beyond what's allocated to the program, decide it's an error, and eliminate it from the compiled machine code — even though it's a necessary defense against buffer overflow attacks."
This discussion has been archived. No new comments can be posted.

  • by iggymanz (596061) on Friday June 20, 2014 @04:02PM (#47284407)

    well known for decades that optimizing compilers can produce bugs, security holes, code that doesn't work at all, etc.

    • by NoNonAlphaCharsHere (2201864) on Friday June 20, 2014 @04:13PM (#47284475)
      That's why I always use a pessimizing compiler.
    • by KiloByte (825081) on Friday June 20, 2014 @04:37PM (#47284647)

      Or rather, that optimizing compilers can expose bugs in buggy code that weren't revealed by naive translation.

      • I'm going with old news from decades ago [gnu.org].
        • Re: (Score:3, Interesting)

          by itzly (3699663)
          That's an example of a programmer not understanding the rules of a conforming C/C++ compiler. It should be fixed in the source, not in the compiler.
          • by tepples (727027)
            Perhaps the problem is that standard C isn't expressive enough to express some operations that some programs require, especially with respect to detection of integer arithmetic overflows.
            • I've always preferred inline assembly or linked asm routines for tricky bits, but the problem then is it's not portable.

              • by Euler (31942)

                I don't think any language is portable at that level, so you may as well use asm (my preference is linked instead of inline to ensure a clean and simple abstraction.) Every processor has different math status registers and different math instruction capabilities. I'm not sure how C can express these things.

            • by russotto (537200)

              Perhaps the problem is that standard C isn't expressive enough to express some operations that some programs require, especially with respect to detection of integer arithmetic overflows.

              Indeed; the compiler's even allowed to assume signed integer overflow doesn't happen, which is where you get into trouble. Yet we have this perfectly good mechanism for detecting integer overflow (condition codes) and no way to reach them from high level languages (C isn't unique in this respect)
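As a hedged illustration of reaching the overflow flag from C: GCC (version 5 and later) and Clang offer a non-standard builtin, `__builtin_add_overflow`, which on common targets is typically compiled to an add followed by a test of the hardware overflow condition. The wrapper name `add_checked` is made up for this sketch, and the builtin is a compiler extension, not part of standard C.

```c
#include <limits.h>   /* INT_MAX / INT_MIN, used in the usage note below */
#include <stdbool.h>

/* Hypothetical wrapper around a GCC/Clang extension (not standard C).
   Returns true and stores a + b in *sum when the result fits in an int;
   returns false on overflow, in which case *sum should not be relied on. */
bool add_checked(int a, int b, int *sum) {
    return !__builtin_add_overflow(a, b, sum);
}
```

A caller would write something like `if (!add_checked(INT_MAX, 1, &s)) { /* handle overflow */ }`, so the signed addition itself never overflows in the source program.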

              • That's an OLD problem. When I was using FORTRAN 66, I had to put a sentinel card at the end of the data I was reading. (Actually, I usually used a nonstandard extension.) The OS knew when my data was all read, but there was no standard way to let FORTRAN know.

            • by Darinbob (1142669)

              Which is where a compiler should actually issue a warning. Then the programmer, if deciding the code really is necessary that way, adds some pragmas, attributes, or typecasts, and then double checks the compiler output afterwards. I agree though that such cases are sometimes very useful, however a compiler must never optimize this code away silently.

          • by sjames (1099)

            Arguably, it's a bug in the standard. It defies the principle of least astonishment for a procedural language.

    • by Marillion (33728) <ericbardes@@@gmail...com> on Friday June 20, 2014 @04:45PM (#47284715)
      Right. The other part of the issue is why didn't anyone write a test to verify that the buffer overflow detection code actually detects when you overflow buffers?
      • by AuMatar (183847) on Friday June 20, 2014 @04:51PM (#47284749)

        Because it worked in debug mode (which generally has optimizations off)?
        Because it was tested on a compiler without this bug? The people writing the memory library are usually not the people writing the app that uses it.
        Similarly, it was tested on the same compiler, but with different compiler flags?
        Because that optimization didn't exist in the version of the compiler it was tested on?
        Because the test app had some code that made the compiler decide not to apply the optimization?
        Life is messy. Testing doesn't catch everything.

        • by lgw (121541)

          If you're testing only in debug mode, you're doing it wrong. Do your customers run debug? No? Then all your tests must run retail or you fail.

          Is this a compiler bug? Doubtful. Chances are it's code that isn't standard that was getting away with its non-standard behavior until the compiler started enforcing this bit.

          Compiler flags? Again, test the binaries your users will run.

          Testing can catch a lot, if not half-assed.

          • by AuMatar (183847)

            And if you're writing a library, you don't know what compiler much less what flags the user will use. Are you willing to pay several thousand dollars for seats for obscure compilers?

            Yeah, you're an idiot.

            • by swillden (191260)
              Provide a test suite along with the library. Users may or may not bother to run it, but at least then it's on them.
            • by lgw (121541)

              If you're writing a library that messes around in the dark corners of C, you should know exactly what you're doing WRT the standard. And what oddball compilers are still in use in an environment where people are consuming open source libraries? Obviously, you want to cover GCC, clang/llvm, VS, and maybe Intel yourself. For the rest, as the sibling post says, libraries should come with unit tests in the modern world.

              • by AuMatar (183847)

                Most of the embedded world? The embedded world heavily uses compilers like RVDS. And nobody is going to bother running the unit tests. In addition, it's damn hard to write unit tests that test something that should fail in a stack dump.

      • by gweihir (88907)

        That would be, you know, sane? And competent? Cannot have that in somebody that does software...

        There are also nice compiler directives that are used to switch off optimization for some functions.

    • by Darinbob (1142669)

      Though technically the compiler would be incorrect by removing "unnecessary" code when it actually is necessary. The thing is, if the compiler is good enough to detect that the code is checking out of bounds when it doesn't need to, then the compiler was able to logically infer that it was impossible to write code out of bounds, which means it should be good enough to print out a warning when the code does have the possibility of buffer overflows.

      The examples given in the articles listed are things that I would

  • Unstable Code, again (Score:5, Informative)

    by Anonymous Coward on Friday June 20, 2014 @04:03PM (#47284413)
    This is just as poorly written up as last time [slashdot.org]. These are truly bugs in the programs using undefined parts of the language. It's silly to blame the compiler.
    • by Darinbob (1142669)

      I blame both. When a compiler discovers code with undefined behavior in the language, then the compiler should issue a warning rather than taking it as an opportunity to silently perform some other optimizations. In other words, the compiler KNOWS the code is faulty, and it can't perform the extra optimizations without first knowing this. Thus I absolutely blame the compiler writers here.

      Ie, if the compiler sees "y = 0; z = x / y;" then it should warn about division by zero. Instead if the compiler thi

      • How about when the computer sees "z = x / y;"? Should the compiler warn of possible undefined behavior if it can't prove that y can't be zero?
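When the compiler cannot prove anything about `y`, the defined-behavior option is for the program to test the operands before dividing. A minimal sketch, assuming a hypothetical helper named `div_checked`; it also rejects `INT_MIN / -1`, the other case in which signed division is undefined:

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true and stores x / y in *quot only when
   the division is well defined for int operands. */
bool div_checked(int x, int y, int *quot) {
    if (y == 0) {
        return false;               /* division by zero */
    }
    if (x == INT_MIN && y == -1) {
        return false;               /* quotient would not fit in an int */
    }
    *quot = x / y;
    return true;
}
```

Because both tests run before the division, they do not depend on undefined behavior, and an optimizer has no license to remove them.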

  • by Mdk754 (3014249) on Friday June 20, 2014 @04:04PM (#47284421) Homepage
    Wow, you know you're ready to go home when it's Friday afternoon and you read:

    But according to a presentation at this week's annual UNISEX conference

    • by PPH (736903)

      I was wondering why my compiler was generating warnings about traps. I thought it meant something about catching buffer overflow conditions.

    • by OakDragon (885217)
      So do overeager compilers suffer from premature optimization?
      • by Darinbob (1142669)

        Hang on, this compilation is going to be great! I'm so excited! Let me just start parsing those header files, and... Oh, we're done. Apparently I've optimized away almost all your code. Thank you and good night.

    • Did you just type that on your Unicomp keyboard attached to your Univac?
  • by Anonymous Coward on Friday June 20, 2014 @04:05PM (#47284427)

    Any code removal by the compiler can be prevented by correctly
    coding the code with volatile (in C) or its equivalent.

    • Re: (Score:2, Informative)

      by Anonymous Coward

      Except not, so now we have explicit_bzero()

    • Re: (Score:2, Insightful)

      by vux984 (928602)

      Any code removal by the compiler can be prevented by correctly coding the code with volatile (in C) or its equivalent.

      Knowing that the code will be removed by the compiler is paramount to using the volatile keyword.

      That requires knowing a lot more about what the compiler is going to do than one should presume. The developer should not have to have foreknowledge of what compiler optimizations someone might enable in the future, especially as those optimizations might not even be known in the present.

      The norm

      • by Threni (635302)

        ---
        I can't offhand think of many situations where I could say with any degree of certainty that if something read or wrote to memory externally that it wouldn't matter, and it would rarely be the best use of my time to try and establish it... so really... mark everything volatile all the time.

        Clearly THAT isn't right.
        ---

        Yeah, that would be a poor design. You use volatile when you need to. That's the rule. You just need to work out when you need to.

        > If you do not expect the memory to be written to or rea

        • by vux984 (928602)

          I'm not sure you typed that in right, or maybe you don't understand what volatile means or when to use it.

          No I typed it in right. That's the point. To guard against security flaws, you need to specify volatile in situations where you explicitly expect it not to be volatile; so that any sanity checks you put in place don't get optimized out as redundant/dead code.

          For security you effectively have to assume all memory is volatile, and that it might be changed when it shouldn't be.

          For a contrived example,

          stati

          • by ultranova (717540)

            If I don't mark it volatile then the sanity check gets optimized away, and the software is vulnerable.

            Except it's just as vulnerable either way. Your sanity check never executes even if x and y are volatile, since the buffer overrun in somestuff() already transferred control to pwned() when somestuff() tried to return. All your sanity check does is give you a false sense of security.

            Yes, yes, the real problem is in somestuff() where the buffer got overrun in the first place, but that's someone else's code

            • by vux984 (928602)

              since the buffer overrun in somestuff() already transferred control to pwned() when somestuff() tried to return

              That is only -one- failure mode, not all buffer overruns will let you overwrite to the instruction pointer to transfer control to your code.

              Security is layered.

              Yes, and that problem is not solvable in C, no matter how many sanity checks you litter your code with. As soon as you execute any code from an untrusted source, you don't have any security. Any error puts the system into an undefined state,

      • Except that "volatile" means that the memory might be accessed through methods other than the program, which is the exact sort of thing we want to test for. In C++, "volatile" means that all memory accesses must be performed in the given order, and in proper order with other volatile memory accesses and calls to I/O routines. This removes a lot of optimization possibilities, which is why we'd generally rather not call variables "volatile". For a buffer overflow test, we know that the buffer can't be cha

        • by vux984 (928602)

          Except that "volatile" means that the memory might be accessed through methods other than the program,

          Exactly right.

          The trouble is in the case of a bug or vulnerability, even non-volatile memory might be accessed/updated when it's not supposed to be. And any sanity checks or range checks or bounds checks you might write to try and close the potential holes require the non-volatile memory be marked volatile to prevent the compiler optimizing out the checks.

          See my other reply for a contrived example of what I

          • Yup. I was thinking of sections of memory allocated strictly to stay zeroed out and then scanned, presumably part of a larger allocation and put there by placement new. Range and bounds checks will not be optimized out. If you have code that says "make sure i is a valid subscript, and if it is return the array value, otherwise do something else", that's completely conforming and will be compiled normally. (Code that says "Return the array value, oh, and by the way, check i" is subject to having the che
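The two orderings described in this comment can be sketched as follows. The array `table` and both function names are invented for illustration; the array has a fixed size so the compiler can see that any index of 4 or more is out of bounds.

```c
#include <stddef.h>

static const int table[4] = {10, 20, 30, 40};
#define TABLE_LEN (sizeof table / sizeof table[0])

/* Check first, then access: the test guards the only read of table[i],
   so it is ordinary conforming code and is compiled as written. */
int lookup_checked(size_t i, int fallback) {
    if (i < TABLE_LEN) {
        return table[i];
    }
    return fallback;
}

/* Access first, then check: when i >= TABLE_LEN the read is already
   undefined, so a compiler may assume i is in range and treat the
   later test as dead code. */
int lookup_late_check(size_t i, int fallback) {
    int v = table[i];
    if (i >= TABLE_LEN) {
        return fallback;
    }
    return v;
}
```

Only the first form keeps its check under every conforming compiler; whether a given compiler actually drops the check in the second form depends on the compiler and the optimization level.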

      • by ultranova (717540)

        Most security related vulnerabilities arising from compiler optimization tend to revolve around the idea that you are defending against memory being modified externally that should not normally be modified or read from externally.

        In other words, these vulnerabilities don't rise from compiler optimizations but from programmer errors. They only relate to compiler optimizations insofar that those optimizations interfere with ad hoc attempts to get managed language behaviour in an unmanaged language. Which is

        • by vux984 (928602)

          either write code that can be checked to be secure by static analysis tools

          Which just catches a limited subset of vulnerabilities.

          or switch to a real managed language.

          Because you can't write exploitable code in a managed language?

      • by Darinbob (1142669)

        Knowing that the code is being removed by the compiler also means that you know that your compiler is broken. The compiler should have issued a warning pointing out that the code is using undefined behavior, not take that undefined behavior as a lame excuse to try some more optimizations. Of course the C/C++ standards do not dictate that warnings must be issued in these cases.

        The article itself from what I saw, said nothing about security related vulnerabilities, and nothing about buffer overflow checking

        • by vux984 (928602)

          The compiler should have issued a warning pointing out that the code is using undefined behavior

          Not all code the compiler removes is using undefined behaviour. I am not talking about undefined behaviors being 'optimized' away.

          The article itself

          Yeah, we've sort of gone off track vis a vis the original article. The scope of issues the original article talks about is smaller, and I agree with you fully that the compilers should be issuing warnings for the particular scenarios they are talking about.

    • volatile is a type qualifier, meaning that something else (i.e. another process) might modify the memory location, meaning the compiler shouldn't remove reads even though it knows that you haven't modified it since it was last read and there might still be a copy left lying about in a register. Even if you apply it to the buffer, it doesn't mean that the compiler can't decide that memory you didn't allocate doesn't belong to you anyways and remove the check. Additionally, volatile typically applies to single
    • "...decide it's an error.."

      No, it is an "optimizing" compiler not a "correcting" compiler. The optimizer can detect that no language defined semantic will be changed by removing the code, so it does. As others have noted, "volatile" is the fix for this particular coding / compiler blunder. However ill-defined, it is *not an error*.

      As for the folks commenting that only C can run in small embedded processors that's hogwash. Huge mainframes of the early ages had smaller memory sizes and ran FORTRAN (now Fortra

      • by dkf (304284)

        Most made entire classes of C blunders impossible

        Don't worry about that! They had their own classes of blunders instead. (Every programming language has a characteristic set of problems that come up, and a set of recommended programming practices that avoid those blunders.)

    • by Darinbob (1142669)

      Except that the compiler was buggy in the first place by removing such code. Thus it is "over eager". The article is not really about logically correct dead code removal causing security bugs, but about compilers that incorrectly remove code rather than issuing warnings or errors in the case of undefined behavior.

  • Bad summary is bad (Score:5, Informative)

    by werepants (1912634) on Friday June 20, 2014 @04:07PM (#47284439)
    This is not really about the existence of bad compiler optimization - it is about a tool called Stack that can be used to detect this, which is known as "unstable" code, and has been used to find lots of vulnerabilities already.
    • Actually it's about non-standard-conforming "security" hacks causing unexpected results. If the result of an operation is undefined, the compiler can insert code to summon Cthulhu if it wants to.
      • by dkf (304284)

        Actually it's about non-standard-conforming "security" hacks causing unexpected results. If the result of an operation is undefined, the compiler can insert code to summon Cthulhu if it wants to.

        If your compiler is doing that, you should choose a different compiler. Summoning elder gods just because signed arithmetic might wrap around is not a good cost/benefit tradeoff!

        • by careysub (976506)

          ....Summoning elder gods just because signed arithmetic might wrap around is not a good cost/benefit tradeoff!

          Make that Elder Gods. Respect is essential - we do not wish to arouse their wrath. Nyarlathotep be praised!

    • by Darinbob (1142669)

      But it calls those compilers into question. Why did the compilers decide to exploit the unstable code to do more optimization rather than pointing out the unstable code as a warning or error? After all the only way that they could do this extra optimization is if they knew that there were some undefined operations that allowed it to make some further analysis, and such undefined operations might be better treated as defective code.

      Although to be fair the examples I saw may have just been oversimplified c

  • Old news (Score:4, Informative)

    by Anonymous Coward on Friday June 20, 2014 @04:10PM (#47284463)

    I know that at least GCC will get rid of overflow checks if they rely on checking the value after overflow (without any warning), because C defines that overflow on signed integers is undefined. This is even documented. If anything is declared by the language specification as being undefined, expect trouble.
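A small sketch of the pattern this comment describes, with made-up function names. The first function tests for overflow after the addition, which is the form an optimizer is allowed to fold away; the second compares against the limit first and never overflows.

```c
#include <limits.h>
#include <stdbool.h>

/* Unstable: for a signed int, x + 100 overflowing is undefined, so an
   optimizer may assume it never happens and reduce this to "return false". */
bool would_overflow_unstable(int x) {
    return x + 100 < x;
}

/* Stable: the comparison happens before any addition, so nothing here
   depends on undefined behavior. */
bool would_overflow_stable(int x) {
    return x > INT_MAX - 100;
}
```

GCC also documents a `-fwrapv` option that makes signed overflow wrap, at some cost in optimization; rewriting the check as in the second function avoids depending on that flag.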

    • by Darinbob (1142669)

      So why doesn't GCC issue a warning about undefined operations in this case? It certainly issues plenty of warnings in cases that are well defined and not violating any rules.

      What "undefined" means here for most compilers is that it will make the best attempt it can under the C rules but the results may vary on different machines. I.e., it will use the underlying machine code for adding two registers, which may wrap around or possibly saturate instead, and the machine may not even be using two's complement.

      • by Thiez (1281866)

        What "undefined" means here for most compilers is that it will make the best attempt it can under the C rules but the results may vary on different machines. I.e., it will use the underlying machine code for adding two registers, which may wrap around or possibly saturate instead, and the machine may not even be using two's complement.

        No, that would be implementation defined behavior.

  • by Anonymous Coward

    The kinds of checks that compilers eliminate are ones which are incorrectly implemented (depend on undefined behavior) or happen too late (after the undefined behavior already was triggered). The actual article is reasonable— it's about a tool to help detect errors in programs that suffer here. The compilers are not problematic.

    • So what's the standard-conforming way to determine whether a particular integer operation will not overflow? And are compilers smart enough to optimize the standard-conforming way into something that uses the hardware's built-in overflow detection, such as carry flags?
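One standard-conforming answer to the first question is to test the operands against `INT_MAX` and `INT_MIN` before performing the operation. The sketch below does this for multiplication, following the usual division-based case analysis; the name `mul_would_overflow` is hypothetical. Whether a compiler turns this into a single multiply plus a flag test is up to the implementation and is not guaranteed.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true when a * b would not fit in an int.
   Only divisions that are themselves well defined are performed. */
bool mul_would_overflow(int a, int b) {
    if (a > 0) {
        if (b > 0) {
            return a > INT_MAX / b;      /* positive * positive */
        }
        return b < INT_MIN / a;          /* positive * non-positive */
    }
    if (b > 0) {
        return a < INT_MIN / b;          /* non-positive * positive */
    }
    return a != 0 && b < INT_MAX / a;    /* non-positive * non-positive */
}
```

A caller multiplies only when the function returns false, so the signed multiplication in the program never overflows.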
    • by Darinbob (1142669)

      Why aren't the compilers problematic? From what I have read the compilers seem to be exploiting the undefined behavior as an opportunity to perform additional optimization, whereas the compiler seems like it should instead warn the user about the undefined behavior.

      Maybe it's that the examples that I saw, referenced from the article, are all ones where most basic static analysis tools will flag such code as defective. I haven't seen an example yet where the code is reasonable and does not deserve any comp

  • by Anonymous Coward

    Compilers can also "optimize" away Kahan summation algorithm. See page 6 of How Futile are Mindless Assessments of Roundoff in Floating-Point Computation [berkeley.edu]
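For readers unfamiliar with it, here is a minimal sketch of Kahan (compensated) summation next to a naive loop; the function names are chosen for this example. The correction term `c` is algebraically zero, which is why options that let the compiler reassociate floating-point arithmetic (for example `-ffast-math` in GCC and Clang) can simplify it away and silently turn the compensated sum back into the naive one.

```c
#include <stddef.h>

/* Plain left-to-right summation: small terms can be rounded away. */
double naive_sum(const double *x, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += x[i];
    }
    return sum;
}

/* Kahan summation: c carries the low-order bits lost by each addition. */
double kahan_sum(const double *x, size_t n) {
    double sum = 0.0;
    double c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;    /* apply the correction from the last step */
        double t = sum + y;     /* low-order bits of y may be lost here */
        c = (t - sum) - y;      /* recover what was lost (zero in exact math) */
        sum = t;
    }
    return sum;
}
```

With strict IEEE double arithmetic, summing 1.0 followed by ten values of 1e-16 is a case where the compensated version keeps the small terms while the naive loop can round each of them away.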

  • Short of bugs in the compiler's optimizer — and we all know there have been many — the idea that "if the entire code absolutely must stay fully intact, it shouldn't be optimized" is already dangerous.

    A compiler conforming to its documentation or standard isn't going to change semantics that have been guaranteed by that document. Those guarantees though are all you have: even without explicit optimization options, a compiler has a lot of freedom in how it implements those semantics. Relying on

  • by Smerta (1855348) on Friday June 20, 2014 @04:35PM (#47284617)

    The classic example of a compiler interfering with intention, opening security holes, is failure to wipe memory.

    On a typical embedded system - if there is such a thing (no virtual memory, no paging, no L3 cache, no "secure memory" or vault or whatnot) - you might declare some local (stack-based) storage for plaintext, keys, etc. Then you do your business in the routine, and you return.

    The problem is that even though the stack frame has been "destroyed" upon return, the contents of the stack frame are still in memory, they're just not easily accessible. But any college freshman studying computer architecture knows how to get to this memory.

    So the routine is modified to wipe the local variables (e.g. array of uint8_t holding a key or whatever...) The problem is that the compiler is smart, and sees that no one reads back from the array after the wiping, so it decides that the observable behavior won't be affected if the wiping operation is elided.

    By making these local variables volatile, the compiler will not optimize away the wiping operations.

    The point is simply that there are plenty of ways code can be completely "correct" from a functional perspective, but nonetheless terribly insecure. And often the same source code, compiled with different optimization options, has different vulnerabilities.
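A rough sketch of the wiping technique described above, assuming a hypothetical helper named `wipe`. Instead of declaring the buffer itself volatile, it writes through a volatile-qualified pointer, a common idiom that compilers in practice do not elide, although the C standard's guarantee is worded in terms of volatile objects. Where available, `explicit_bzero` (mentioned elsewhere in this thread) or C11's optional `memset_s` serve the same purpose.

```c
#include <stddef.h>

/* Hypothetical helper: zero n bytes through a volatile-qualified pointer
   so that every store is performed as a volatile access. */
void wipe(void *buf, size_t n) {
    volatile unsigned char *p = buf;
    while (n--) {
        *p++ = 0;
    }
}
```

A routine holding a key in a local array would call `wipe(key, sizeof key)` just before returning, in place of a plain `memset` that the optimizer could treat as a dead store.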

    • by Darinbob (1142669)

      If there is external evidence that the compiler needs in order to compile/optimize correctly then it needs to be given to the compiler. Thus the addition of "volatile". Other compilers have a way to add options on the command line, such as indicating that the machine uses strict alignment.

      The problem here is however not that there is some strange stuff happening with the program, and that code is being optimized away because the compiler can prove that it serves no purpose given the lack of additional inf

    • Unless all the code running on the machine is absolutely type-safe and only allows "safe" reflection then trying to hide sensitive data from other bits of code in your address space is a lost cause. Code modification, emulation, tracing, breakpoint instructions, hardware debugger support, etc. are all viable ways for untrusted code with access to your address space to steal your data.

      Wiping memory is only effective for avoiding hot or cold boot attacks against RAM, despite its frequent use for hacking terr

  • I always insist on a clean compile with the warning level turned up as high as it will go. If the compiler is cool with my code, I have a better chance it will do the right thing with it.

    Once I have an application that works I see if it meets performance goals (if any). If it does, I'm done. If it doesn't, profile, find the hot spots, optimize as needed. Compiling an entire application with -O3 is idiotic, and misses the point.

    ...laura

