How Your Compiler Can Compromise Application Security
jfruh writes "Most day-to-day programmers have only a general idea of how compilers transform human-readable code into the machine language that actually powers computers. In an attempt to streamline applications, many compilers actually remove code that they perceive to be undefined or unstable — and, as a research group at MIT has found, in doing so can make applications less secure. The good news is the researchers have developed a model and a static checker for identifying unstable code. Their checker is called STACK, and it currently works for checking C/C++ code. The idea is that it will warn programmers about unstable code in their applications, so they can fix it, rather than have the compiler simply leave it out. They also hope it will encourage compiler writers to rethink how they can optimize code in more secure ways. STACK was run against a number of systems written in C/C++ and it found 160 new bugs in the systems tested, including the Linux kernel (32 bugs found), Mozilla (3), Postgres (9) and Python (5). They also found that, of the 8,575 packages in the Debian Wheezy archive that contained C/C++ code, STACK detected at least one instance of unstable code in 3,471 of them, which, as the researchers write (PDF), 'suggests that unstable code is a widespread problem.'"
TFA does a poor job of defining what's happening (Score:5, Insightful)
If my C code contains *foo=2, the compiler can't just leave that out. If my code contains if (foo) { *foo=2; } else { return EDUFUS; } it can verify that my code is checking for NULL pointers. That's nice; but the questions remain:
What is "unstable code" and how can a compiler leave it out? If the compiler can leave it out, it's unreachable code and/or code that is devoid of semantics. No sane compiler can alter the semantics of your code, at least no compiler I would want to use. I'd rather set -Wall and get a warning.
Null pointer detection at compile time (Score:2)
I'd rather set -Wall and get a warning.
There are some undefined behaviors that can't be detected so easily at compile time, at least not without a big pile of extensions to the C language. For example, if a pointer is passed to a function, is the function allowed to dereference it without first checking it for NULL? The Rust language doesn't allow assignment of NULL to a pointer variable unless it's declared as an "option type" (Rust's term for a value that can be a pointer or None).
Re:Null pointer detection at compile time (Score:5, Insightful)
Of course it is, and it is supposed to be able to do so. If you were an embedded systems programmer you would know that, and also know why. Next you'll be complaining that languages allow infinite loops (again, a very useful thing to be able to do). C doesn't protect the programmer from himself, and that's by design. Compilers have switches for a reason. If they don't know how it is being built or what the purpose of the code is then they can't possibly determine with another program if the code is "unstable".
Re: (Score:2)
Of course it is, and it is supposed to be able to do so.
Actually no, you're not, or you're programming in Some-C-Like-Language and not C. In C, dereferencing a NULL pointer is always undefined behavior, and compilers are allowed (though presumably very unlikely to on embedded platforms) to make transformations based on that assumption, such as the following:
C compilers are allowed to optimize away the null check and s
Re:TFA does a poor job of defining what's happenin (Score:5, Informative)
An example of "unstable code":
char *a = malloc(sizeof(char));
*a = 5;
char *b = realloc(a, sizeof(char));
*b = 2;
if (a == b && *a != *b)   /* note: 'a' is read here after being passed to realloc */
{
    launchMissiles();
}
A cursory glance at this code suggests missiles will not be launched. With gcc, that's probably true at the moment. With clang, as I understand it, this is not true - missiles will be launched. The reason for this is that the spec says that the first argument of realloc becomes invalid after the call, therefore any use of that pointer has undefined behaviour. Clang takes advantage of this and treats *a as unchanged from that point on. Therefore it optimises if (a == b && *a != *b) into if (a == b && 5 != *b). This clearly then passes, and missiles get launched.
The truth here is that your compiler is not compromising application security – the code that relies on undefined behaviours is.
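For what it's worth, the same sequence without the undefined use of 'a' might look roughly like this (my sketch, error handling kept minimal):

#include <stdlib.h>

void no_missiles(void)
{
    char *a = malloc(sizeof(char));
    if (a == NULL)
        return;
    *a = 5;

    char *b = realloc(a, sizeof(char));
    if (b == NULL) {
        free(a);        /* realloc failed; the old block is still valid */
        return;
    }
    a = b;              /* from here on, only the pointer realloc returned is used */
    *b = 2;
    free(b);
}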
Re:TFA does a poor job of defining what's happenin (Score:5, Informative)
Another, more common example of code optimizations causing security problems is this pattern:
int a = [some value obtained externally];
int b = a + 2;
if (b < a) {
    // integer overflow occurred ...
}
The C spec says that signed integer overflow is undefined. If a compiler does no optimization, this works. However, it is technically legal for the compiler to conclude that two more than any number is always larger than that number, and to optimize out the entire "if" statement and everything inside it.
For proper safety, you must write this as:
int a = [some value obtained externally];
if (a > INT_MAX - 2) {   // unlike INT_MAX - a, the expression INT_MAX - 2 can never itself overflow
    // integer overflow will occur ...
}
int b = a + 2;
Re:TFA does a poor job of defining what's happenin (Score:5, Interesting)
That reminds me of this gem: Overflow in sorting algorithms [blogspot.com]
That little bug just sat around for a few decades before anyone noticed it.
Quick summary: (low + high) / 2 may overflow, which is undefined behavior. Really, every time we add ints it's possible; it's just that usually our values don't pass the MAX.
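The standard fix, for what it's worth, is to compute the midpoint without ever forming low + high. A quick sketch, assuming 0 <= low <= high:

/* high - low cannot overflow when 0 <= low <= high, so neither can the sum. */
static int midpoint(int low, int high)
{
    return low + (high - low) / 2;
}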
Re: (Score:3)
I like this one, because it shows a very common weakness in high level languages.
In most machine languages, getting the average of two unsigned numbers up to UINT_MAX is absolutely trivial -- add the two, then shift right including the carry. The average of two signed numbers rounding to zero is a little more difficult (x86 makes it harder than it should be by not setting flags in a convenient manner), but still a few instructions.
In C? Assuming low and high are unsigned
(low
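In any case, here is one portable way to write that unsigned average in C (a sketch of mine; it rounds down and works even when low + high would exceed UINT_MAX):

/* Shared bits contribute in full, differing bits contribute half. */
static unsigned int avg_u(unsigned int low, unsigned int high)
{
    return (low & high) + ((low ^ high) >> 1);
}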
Re: (Score:3)
Re: (Score:2, Informative)
The first mistake was using signed integers. unsigned integers always have well-defined overflow (modulo semantics), which means it's easier to construct safe conditionals
Not in C and C++ they don't. The compiler is allowed to perform that optimization with either signed or unsigned integers.
Re: (Score:2)
Under C99 all machines must both be two's complement and have 8-bit bytes. IIRC both fall out from inttypes.h. Word is this wasn't intentional, but it had been so long since anyone actually used other architectures that no one noticed that implication.
Re: (Score:2)
This doesn't sound right to me. The intX_t types, if present, have to be two's complement, but they aren't really required to be present, as I recall.
These bugs exist even *without* signed integers! (Score:5, Interesting)
The first mistake was using signed integers.
The problem is C's promotion rules. In C, when promoting integers to the next size up, typically to at least "int", the rule is to use signed integers if the source type fits, even if the source type is unsigned. This can cause code that seems to use unsigned integers everywhere to break, because C says signed integer overflow is undefined. Take the following code, for example, which I saw on a blog recently:
uint64_t MultiplyWords(uint16_t x, uint16_t y)
{
    uint32_t product = x * y;
    return product;
}
MultiplyWords(0xFFFF, 0xFFFF) on GCC for x86-64 was returning 0xFFFFFFFFFFFE0001, and yet this is not a compiler bug. From the promotion rules, uint16_t (unsigned short) gets promoted to int, because unsigned short fits in int completely without loss or overflow. So the multiplication became ((int) 0xFFFF) * ((int) 0xFFFF). That multiplication overflows in a signed sense, an undefined operation. The compiler can do whatever it feels like - including generate code that crashes if it wants.
GCC in this case assumes that overflow cannot happen, and therefore that x * y is positive (when it's really not at runtime). This means the uint32_t cast does nothing, so it is omitted by the optimizer. Now the code generator sees an int cast to uint64_t, which means sign extension. This time the optimizer isn't smart enough to realize the value is "positive" and that it could skip sign extension and use "mov eax, ecx" to clear the high 32 bits, so it emits a "cqo" opcode to do the sign extension.
So no, avoiding signed integers does not always save you.
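The usual fix, for what it's worth, is to force the multiplication itself to happen in an unsigned 32-bit type, so the promoted signed multiply never occurs. A sketch:

#include <stdint.h>

uint64_t MultiplyWords(uint16_t x, uint16_t y)
{
    /* Casting one operand makes the arithmetic unsigned 32-bit,
     * so 0xFFFF * 0xFFFF yields 0xFFFE0001 with no undefined behaviour. */
    uint32_t product = (uint32_t)x * y;
    return product;
}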
Re:These bugs exist even *without* signed integers (Score:5, Interesting)
The problem is C's promotion rules. In C, when promoting integers to the next size up, typically to the minimum of "int", the rule is to use signed integers if the source type fits, even if the source type is unsigned.
I know. C's handling of integer overflow is "undefined". In Pascal, integer overflow was a detected error. DEC VAX computers could be set to raise a hardware exception on integer overflow, and about thirty years ago, I rebuilt the UNIX command line tools with that checking enabled. Most of them broke.
In the first release of 4.3BSD, TCP would fail to work with non-BSD systems during alternate 4-hour periods. The sequence number arithmetic had been botched due to incorrect casts involving signed and unsigned integers. I found that bug. It wasn't fun.
C's casual attitude towards integer overflow is why today's machines don't have the hardware to interrupt on it. Ada and Java do overflow checks, but the predominance of C sloppiness influenced hardware design too much.
I once wrote a paper, "Type Integer Considered Harmful", on this topic. One of my points was that unsigned arithmetic should not "wrap around" by default. If you want modular arithmetic, you should write something like n = (n + 1) % 65536;. The compiler can optimize that into machine instructions that exploit word lengths when the hardware allows, and you'll get the same result on all platforms.
Re: (Score:2)
Hmm I seem to have messed up a few >s and <s... That's my fault, 0: For not giving a fuck -- It's futile to try deconverting a zealot; and 1: it's 2013 and we're still escaping HTML manually?
Truly, the whole computing world is shit strung together with bubble gum and twine. I mean, really... No isolation for code and data pointers or sacrificing a register for offset / segmentation and not giving us a new offset register so we could ACTUALLY do the heap code pointer protections.
How fucking
Re: (Score:3)
Re: (Score:2)
If the runtime moved memory around during a realloc, this code wouldn't work. However, you'd never notice if you use the same runtime all the time. This is why it's a good thing to compile/target different platforms and compilers, and to do a -Wall (or the equivalent) at every optimization level. You have to do it at every optimization level because some compilers only do checks like this during their optimization phase (gcc?).
This type of thing wouldn't get caught by any automated tools when I was doing C
Re: (Score:2)
The behaviour is also undefined if realloc returns NULL. Also, sizeof(char) is 1 by definition.
Re:TFA does a poor job of defining what's happenin (Score:5, Funny)
No, the compiler is allowed to do anything it damn well pleases wherever the standard calls behaviour "undefined". One of my favorite quotes ever from a standards discussion:
When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose
Nasal demons can cause code instability.
Re: (Score:3)
If I tell the compiler to give me warnings, and it detects code whose behavior is undefined in the standard but fails to issue a warning, then the compiler is broken. If it goes on to make a fancy assumption about the undefined behavior instead of letting it fall through to runtime as written, then it's doubly broken.
Re: (Score:2)
There's nothing in the standard about "warnings", though most compilers are good about it when it comes to common problems. But even with a warning, optimizer's gonna optimize.
OK, before somebody else points it out... (Score:5, Interesting)
My statement is contradictory. I recommended a course of action for undefined behavior, while maintaining that Clang is wrong for documenting a course of action for undefined behavior.
My understanding of "undefined behavior" in the C spec is that it means "anything can happen and the programmer shouldn't rely on what the compiler currently does". Of course, in the real world *something* must happen. If a 3rd party documents what that something is, the compiler is still compliant. It's the programmer's fault for relying on it.
OTOH, if the behavior was "implementation defined" then the compiler authors can define it. If they change their definition from one rev to another without documenting the change, then it's the compiler author's fault for not documenting it.
In other words:
undefined -- programmer's fault for relying on it.
implementation defined -- compiler's fault for not documenting it.
Re: (Score:2)
I was already responding, but yes, your summary sounds pretty much perfect.
Re: (Score:3)
There are actually 3 categories:
Re: (Score:3)
>a == b is not a use of the argument that has been invalidated
Yes it is. Evaluating the expression "a" causes undefined behaviour if "a" is indeterminate. "a" is considered to no longer have a value; any attempt to refer to its value causes UB. (It has the same status as a variable that has been defined but not initialized, i.e. "int a;".) The only thing that can be done with "a" thereafter is to assign a new value to it (or take its address, or do "sizeof a" .. can't think of any other exceptions).
Re:TFA does a poor job of defining what's happenin (Score:5, Informative)
What is "unstable code" and how can a compiler leave it out?
The article is actually using that as an abbreviation for what they're calling "optimization-unstable code", or code that is included at some compiler optimization levels but discarded at higher levels. Basically they call it unstable because whether it survives compilation depends unpredictably on the optimization level, not because the code itself necessarily results in random behaviour.
Re:TFA does a poor job of defining what's happenin (Score:5, Informative)
The TFA links to the actual paper. Maybe you should read that.
Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior [mit.edu]
Re: (Score:2)
I.e. by the time execution reaches the if statement one of the two is true:
tun != null && sk == {something valid} -or-
tun == null && sk == {undefined}
sk being undefined is possible but that undefined-ness can't be used as a way to infer tun != null--the only thing that causes it is tun == null! It's illogical for the compiler to do what you say and remove the if check. The
Re:TFA does a poor job of defining what's happenin (Score:5, Insightful)
>The dereference is undefined, and therefore
Stop right here. Once undefined behaviour occurs, "all bets are off" as they say; the remaining code may have any behaviour whatsoever. C works like this on purpose, and it's something I agree with. It means the compiler doesn't have to insert screeds of extra checks, both at compile-time and run-time.
There are plenty of other languages you can use if you want a different language definition :)
Re:TFA does a poor job of defining what's happenin (Score:5, Informative)
"What every C programmer should know about undefined behaviour" (part 3 [llvm.org], see links for first 2 parts).
For example, overflows of unsigned values is undefined behaviour in the C standard. Compilers can make decisions like using an instruction that traps on overflow if it would execute faster, or if that is the only operator available. Since overflowing might trap, and thus cause undefined behaviour, the compiler may assume that the programmer didn't intend for that to ever happen. Therefore this test will always evaluate to true, this code block is dead and can be eliminated.
This is why there are a number of compilation optimisations that gcc can perform, but which are disabled when building the linux kernel. With those optimisations, almost every memory address overflow test would be eliminated.
Re: (Score:2)
For example, overflows of unsigned values is undefined behaviour in the C standard.
I'm glad I didn't know that when I used to play with software 3d engines back in the 90s. 16-bit unsigned integer "wrap around" was what made my textures tile. I do seem to vaguely recall that there was a compiler flag for disabling integer traps and that I disabled it. It was Microsoft's C compiler, and it's been a loooooong time.
OK, I'm looking through the options on the 2005 free Visual Studio... I can find a flag to
Re: (Score:2)
Overflows of unsigned values are well-defined in C (they wrap). (Technically the standard says unsigned values can't overflow because they're wrapped)
Overflows of signed values are undefined.
Re: (Score:2)
OK, that explains why I've been getting away with assuming they wrap since the Clinton administration. I don't know if anybody ever explained it to me in C terms. I always assumed that behavior was baked in at the CPU level, and just percolated up to C. I never felt inclined to do any "bit twiddling" with int or even fixed-width signed integers because on an intuitive level it "felt wrong". What's that four-letter personality type thing? I'm pretty sure I had the I for "intuitive" there...
Re: (Score:2)
Myers-Briggs test, and it's 'N' for intuitive. :)
Re: (Score:2)
"Overflows of unsigned values" is NOT undefined. You can assign out-of-range values to unsigned types, and also perform arithmetic operations which exceed the bounds of the type; and the value is adjusted using modular arithmetic.
Some would be facetious and say that "unsigned types cannot overflow", meaning that they always have well-defined behaviour on operations that would generate an out-of-range value, but that's just an issue of pedantry with English.
Re: (Score:2)
I'd rather set -Wall and get a warning.
I see your -Wall, and raise you a -Werror -pedantic
Re: (Score:2)
I once had some code that confused me when the compiler optimized some stuff out.
I had a macro that expanded to a parenthesized expression with several sub-expressions separated by commas that used a temp variable, e.g.:
#define m(a) (tmp = a, f(tmp) + g(tmp))
because the argument (a) could be an expression with side effects.
Now, I knew that the order of evaluation of function arguments wasn't defined, but I never read that as meaning that a compiler could optimize away parts of a function call such as: x(m(
Re: (Score:2)
I think you must be mis-remembering the details slightly. The comma operator is a sequence-point, so "tmp" must be assigned the value of "a", and f() and g() must both be called with a value that is the value of "a" converted to the type of "tmp". The two functions can be called in either order though (or in parallel) but there is no issue there.
Of course, the compiler can do anything it likes so long as the program's output is equivalent to what I just described. So, for example, it might not allocate a
Re: (Score:2)
>If my C code contains *foo=2, the compiler can't just leave that out
Well, it could if the program produces no further output before exiting, or if "foo" is unassigned.
Re: (Score:3)
, fucked up computer languages allow "undefined code", ie. C / C++.
Every language has some undefined behavior (and there are libraries with undefined behavior in every language), except maybe Ada.
Java leaves a wide area undefined when it comes to multi-threaded code.
Python has the same, plus it inherits some undefined behaviors from C.
C/C++ leaves a wide area undefined to support oddball system architectures. For example, if you have some memory that only can store floating point numbers, and some general-purpose memory, the address ranges might overlap - that's why pointe
Re: (Score:2)
Signed integer overflow (props to some people elsewhere in the thread that taught me that this isn't true for unsigned!)
Writing or reading past the end or beginning of an array
Dereferencing a NULL pointer
Accessing an object of one type via a pointer of another type (violating the strict-aliasing rules)
All of these are exactly what I was talking about - different needs for different architectures. I've coded on a platform where writing to 0 was legal, and did something bad, unless you did it on purpose. No fun at all, but possible to code for.
Accessing memory at an address that has been free()ed or deleted
Calling several STL algorithms with iterator pairs that don't form a valid range, e.g. copy(vec1.begin(), vec2.begin(), vec1.end()) (I think I ordered those right)
These are important for library optimization. Without the optimization they allow, people would have written their own, faster libraries and that would have sucked far worse.
Modifying the same scalar object twice without an intervening sequence point (e.g. i = i++; not only doesn't have a well-defined evaluation order but also provokes undefined behavior entirely)
I never did understand what they gained from that one, but the examples I've seen of the se
Re: (Score:2)
I think the compiler would be violating sequence points [wikipedia.org] if it moved the division up.
However, I see your point with the for-loop and have experienced it first hand when I wanted to see how fast such a loop would run. I had put some stupid addition or something in there, and the sneaky compiler went ahead and optimized my loop into oblivion. I had to put a function call in the loop to make it generate loop code.
After reading over responses to my original post, and to other posts around here I've come to th
Inflammatory Subject (Score:5, Informative)
Since C/C++ is fairly liberal about allowing undefined behavior
No, it's not. The language forbids undefined behavior. If your program invokes undefined behavior, it is no longer well-formed C or C++.
Re: (Score:3)
Re: (Score:2)
That's nice. But when a language invites such things, that *is* a flaw in the language. I basically distrust pointers, but especially any pointers on which the user does arithmetic. Some people think that's a snazzy way to move through an array. I consider it recklessly dangerous stupidity, which is leaving you wide open to an undetected error with a simple typo.
Re: (Score:2)
Do compilers really remove this? (Score:4, Interesting)
I haven't heard of any compiler that removes code just because it contains undefined behavior. All compilers I know of leave it in, and whether it misbehaves at run-time or not is... well, undefined. It may work just fine, eg. dereferencing a null pointer may just give you a block of zeroed-out read-only memory and what happens next depends on what you try to do with the dereferenced object. It may immediately crash with a memory access exception. Or it may cause all mounted filesystems to wipe and reformat themselves. But the code's still in the executable. I know compilers remove code that they've determined can't be executed, or where they've determined that the end state doesn't depend on the execution of the code, and that can cause program malfunctions (or sometimes cause programs to fail to malfunction, eg. an infinite loop in the code that didn't go into an infinite loop when the program ran because the compiler'd determined the code had no side-effects so it elided the entire loop).
I'd also note that I don't know any software developers who use the term "unstable code" as a technical term. That's a term used for plain old buggy code that doesn't behave consistently. And compilers are just fine with that kind of code, otherwise I wouldn't spend so much time tracking down and eradicating those bugs.
Re: (Score:2)
'I haven't heard of any compiler that removes code just because it contains undefined behavior.'
Then your code may not be doing what you think it is.
GCC, Clang, acc, armcc, icc, msvc, open64, pathcc, suncc, ti, windriver, xlc all do this.
Click on the PDF, and scroll to page 4 for a nice table of optimisations vs compiler and optimisation level.
_All_ modern compilers do this as part of optimisation.
GCC 4.2.1 for example, with -O0 (least optimisation) will eliminate if (p + 100 < p)
C however says that an overflowed
Yes compilers really do this (Score:3, Informative)
Yes it leads to real bugs - Brad Spengler uncovered one of these issues in the Linux kernel in 2009 [lwn.net] and it led to the kernel using the -fno-delete-null-pointer-checks gcc flag to disable the spec correct "optimisation".
Re: (Score:2)
Example: Some people assume that if you add to a very large integer value, then eventually it will wrap around and produce a negative value. Which is what happens on many non-optimising compilers. So if you ask yourself "will adding i + 100 overflow?" you might check "if (i + 100 < i)".
But integer overflow is undefined behaviour. The compiler assumes that your code d
Re: (Score:2)
True, but then if integer overflow is undefined behavior then I can't assume that the test "i + 100 < i" will return true in the case of overflow because I'm invoking undefined behavior. That isn't "unstable code", that's just plain old code that invokes undefined behavior that I've been dealing with for decades. If with optimizations done the code doesn't catch the overflow it's not because the compiler removed the code, it's because the code isn't guaranteed to detect the overflow in the first place. N
Re: (Score:2)
Clang includes a number of compilation flags [llvm.org] that can be used to make sure, or at least as sure as it can, that your code never hits any undefined behaviour at run time.
But normally, yes the compiler may change the behaviour of your application if you are depending on undefined behaviour.
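For instance (flag name from memory, so check your clang version's docs), building a signed-overflow bug with the undefined-behaviour sanitizer makes it fault loudly at run time instead of silently doing whatever the optimizer felt like:

/* Build with something like:  clang -fsanitize=undefined overflow.c */
#include <limits.h>

int main(void)
{
    volatile int i = INT_MAX;
    return i + 1;       /* signed overflow: undefined behaviour, reported by the sanitizer */
}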
Re: (Score:3)
gcc's been doing this for ages. We had a new compiler "break" the ARM kernel once. Turns out that something had a test for whether a pointer was null or not after a dereference of that pointer, and gcc threw out the test because it couldn't possibly apply.
Re: (Score:2)
You can verify these things yourself with GCC (the paper cites GCC as producing this code) and examining the output assembly code. I haven't compiled the specific example in the MIT paper but I remember a similar output from GCC. This is indeed valid in a conforming compiler, and while this specific case is relatively "obviously" dangerous there's a bunch of things that generally do speed up code that can cause subtle dangers in an almost-correct codebase.
But note that a precondition for this specific exa
-Wall (Score:3, Insightful)
If I set -Wall and the compiler fails to warn me that it optimized out a piece of my code then the compiler is wrong. Period. Full stop.
I don't care what "unstable" justification its authors gleaned from the standard, don't mess with my code without telling me you did so.
Re: (Score:2)
I don't care what "unstable" justification its authors gleaned from the standard, don't mess with my code without telling me you did so.
That's not what's happening..... they are talking about unstable optimizations; as in..... optimizations that aren't predictable, and while they don't change the semantics of the code according to the programming language ---- the optimization may affect what happens, if the code contains an error or operation that is runtime-undefined, such as a buffer overflow co
Re: (Score:2)
One of the examples from the paper was this snippet from the Linux kernel:
struct sock *sk = tun->sk;
if (!tun) return POLLERR;
gcc's optimizer deleted "if (!tun) return POLLERR;" because the assignment sk = tun->sk dereferences tun, which implies tun != NULL.
Okay, I buy that. But if gcc did so without a warning with -Wall set then gcc is broken. The author obviously expects it to be possible for tun==NULL, so if gcc decides it can't be, that's a warning! Duh!
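(For reference, my understanding is that the actual fix was just to move the dereference below the check, roughly:

struct sock *sk;
if (!tun)
        return POLLERR;
sk = tun->sk;           /* only dereference tun once we know it isn't NULL */

plus the -fno-delete-null-pointer-checks flag mentioned elsewhere in this thread as a belt-and-braces measure.)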
Re: (Score:3, Insightful)
If the compiler finds two constants it can combine then I've usually made a mistake in my code...
Or it inlined a function for you. Or you indexed at a constant index (perhaps 0) into a global array. Or any number of other things that can arise naturally and implicitly.
The compiler has a setting where it doesn't "mess with your code" -- it's called -O0.
Re: (Score:2)
Yeah, that's helpful.
Understand: I want the compiler to optimize my code. I don't want it to drop sections of my code. If it thinks it can drop a section of my code entirely, or that a conditional can have only one result, that's almost certainly a bug and I want to know about it. After all -- if *I* thought the conditional could have only one result, I wouldn't have bothered checking it!
Re: (Score:2)
If it thinks it can drop a section of my code entirely, or that a conditional can have only one result, that's almost certainly a bug and I want to know about it. After all -- if *I* thought the conditional could have only one result, I wouldn't have bothered checking it!
Right, you want optimization turned off. I check compile-time constants in conditionals, for example, because they're merely compile-time, and might be changed in a different build.
When an optimizer sees "if (0 == 1)" it's going to remove the block. I take serious advantage of that, by putting tons of null checks in inline code. If the compiler can prove a pointer can't be null, it drops the check. That way I can code checking the same pointer for null 50 times in a function (because every library call c
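Something like this sketch (my own illustration of the pattern, not the poster's actual code):

#include <stdlib.h>

static inline void check_ptr(const void *p)
{
    if (p == NULL)
        abort();
}

void use(int *p)
{
    check_ptr(p);
    *p = 1;
    check_ptr(p);       /* provably non-NULL here, so the optimizer drops the test */
    *p = 2;
}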
Re: (Score:3)
OK, sounds like a feature request that some compiler vendor might take you up on. But eliminating the conditional is so common that most people wouldn't want that spew. Would you also want a warning that "x *= 8" was optimized into a shift left instruction on one platform, and three add instructions on another? That "(x > 30)" was optimized into a rotate instruction on some platform? Heck, half your lines of code won't have 1-for-1 mappings between C operators and opcodes in the object - why the focus on
Re: (Score:3)
If I've set -Wall, I want a warning about "*a=1 is useless code." If the compiler optimizes it away without that warning, I'm going to cry about it sooner or later because there's a bug in my code. If I had meant *a=(*b)+1 I would have written it that way.
Re: (Score:2)
No you don't want that warning, because any serious coding shop sets -Wall and treats warnings as errors, so you'd just #pragma it off and still never see it.
You're free to turn optimization off, however.
Re: (Score:3)
If a compiler finds itself able to remove my if statement then either it's wrong or far more likely I made a mistake.
You do realize modern optimized object code lacks any straightforward relationship to the source? It can be quite a puzzle sometimes when debugging through the binary. The way instruction pipelining works makes good object code look quite odd sometimes. The instructions corresponding to one line of source might be scattered and mixed with the object from the next 20 lines, depending on what different parts of the CPU are going to be busy doing, and when the result will be needed.
Why would you want your o
Re: (Score:2)
If the compiler decides it can delete a conditional because it's always true or always false, I most certainly do want a warning!
That section of code has a bug: either I wrote the conditional wrong or I typo-ed something like using = instead of ==. Either way I want to know about it so I can fix my code.
Re: (Score:3)
The specific case where you literally wrote exactly that snippet is warnable and is obviously incorrect, and I agree that case could be a warning, but that doesn't lead to your general conclusion at all since it's just a trivial case.
That null-check could be inlined code, or code in a macro, both of which can also appear in contexts which truly need the null check. In neither case was the if (!tun) in the original source code, so first off it's hard to even emit a sensible warning, and secondly there's no
Really small EXE mystery solved (Score:5, Funny)
No wonder my app came out with 0 bytes.
PC Lint anyone? (Score:4, Informative)
Back in the day when I was doing C++ work, I used a product called PC Lint (http://www.gimpel.com/html/pcl.htm) that did basically the same thing STACK does: static analysis of code to find errors such as dereferencing NULL pointers, buffer overflows, etc... Maybe they should teach History at MIT first...
Re:PC Lint anyone? (Score:4, Insightful)
Don't worry, the authors know what they're doing.
Just because PC Lint could find a small number of potential bugs doesn't mean it's a solved problem by any means. Program analysis is still pretty crappy in general, and they made another improvement, just like tons of people before them, PC Lint before them, and tons of people before PC Lint.
IBM had a tool to do this for a long time already (Score:4, Interesting)
It's a pretty cool critter, but I don't know if they actually sell it as a product. It might be something that they only use internally:
http://www.research.ibm.com/da/beam.html [ibm.com]
http://www.research.ibm.com/da/publications/beam_data_flow.pdf [ibm.com]
Fix the C standard to not be so silly (Score:2, Insightful)
The C standard needs to meet with some realities to fix this issue. The C committee wants their language to be usable on the most esoteric of architectures, and this is the result.
The reason that the results of signed integer overflow and underflow are not defined is that the C standard does not require that the machine be two's complement. Same for 1 << 31 and the negative of INT_MIN being undefined. When was the last time that you used a machine whose integer format was one's complement?
Here are the th
Re:Fix the C standard to not be so silly (Score:4, Insightful)
Pretty sure the embedded systems guys wouldn't be super supportive of this, and they're by far the largest market for C.
And I just don't think these are big sources of trouble most of the time. If people would just go read Spencer's 10 Commandments for C Programmers, this would be pretty much solved.
Re: (Score:2)
* Fixation of two's complement as the integer format.
Are you trying to make C less portable, or what?
Not all platforms work exactly the same, and these additional constraints on datatypes would be a problem on platforms where, well, two's complement is not the signed integer format.
Of course you're free to define your own augmented rules on top of C, as long as they're not the formal language standard --- and if you write compilers, you're free to constrain yourself into making your impleme
Re: (Score:2)
* Fixation of two's complement as the integer format.
Are you trying to make C less portable, or what?
The "broken" code is already nonportable to non-two's-complement machines, and much of this code is things critical to the computing and device world as a whole, such as the Linux kernel.
The paper gives examples (Score:5, Informative)
The article doesn't summarize this very well, but the paper (second link) provides a couple examples. First up:
They then give another example, this time from the Linux kernel:
The basic issue here is that optimizers are making aggressive inferences from the code based on the assumption of standards-compliance. Programmers, meanwhile, are writing code that sometimes violates the C standard, particularly in corner cases. Many of these seem to be attempts at machine-specific optimization, such as this "clever" trick from Postgres for checking whether an integer is the most negative number possible:
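As a stand-in for the actual snippet, the kind of check in question looks something like this (my own sketch, not the paper's code):

#include <limits.h>

/* Non-portable "clever" test for the most negative int: it relies on
 * -x wrapping back to INT_MIN.  Negating INT_MIN is signed overflow,
 * i.e. undefined behaviour, so an optimizer may assume the condition
 * can never hold and delete the branch entirely. */
int is_int_min_fragile(int x)
{
    return x < 0 && -x < 0;
}

/* The portable version just compares against the limit. */
int is_int_min_safe(int x)
{
    return x == INT_MIN;
}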
The remainder of the paper goes into the gory Comp Sci details and discusses their model for detecting unstable code, which they implemented in LLVM. Of particular interest is the table on page 9, which lists the number of unstable code fragments found in a variety of software packages, including exciting ones like Kerberos.
Re: (Score:2)
While this check appears to work on a flat address space, it fails on a segmented architecture.
It may not even work on a flat address space, if "buf"'s allocated block is right at the end of the addressable space.
Meanwhile, THEIR code is sketchy (Score:4, Funny)
Checked out their git repo and did a build. They have a couple sketchy-looking warnings in their own code: a use of a possibly uninitialized variable; storing a 35-bit value in a 32-bit variable...
lglib.c:6896:7: warning: variable 'res' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
lglib.c:6967:10: note: uninitialized use occurs here
plingeling.c:456:17: warning: signed shift result (0x300000000) requires 35 bits to represent, but 'int' only has 32 bits [-Wshift-overflow]
Re: (Score:3)
I probably got in early enough to grab the code before they go slashdotted. Looks like it's also on github, here:
https://github.com/xiw/stack [github.com]
Know your C (Score:4, Informative)
It's time (Score:3)
It really is time that 99.9% of the code written was not in languages that have undefined behaviour. It's time we all use languages which are fully defined.
Having said that, if something in code is undefined, and the compiler knows it, then it should generate an error. Very easily solved. If this STACK program is so clever, it should be in the compiler, and it should be an error to do something undefined.
Headline (Score:2)
Re:News flash (Score:5, Informative)
I would also like to understand what's the definition of "unstable code".
Re:News flash (Score:5, Funny)
Re:News flash (Score:5, Funny)
Re: (Score:2)
Didn't RTFA because this is /., but I'd guess that it's code that works now but is fragile under a change of compiler, compiler version, optimization level, or platform.
Re:News flash (Score:5, Informative)
Didn't RTFA because this is /., but I'd guess that it's code that works now but is fragile under a change of compiler, compiler version, optimization level, or platform.
Yes, you didn't RTFA, because your definition actually makes sense. TFA defines "unstable code" as code with undefined behavior. TFA also claims that many compilers simply DELETE such code. I have never seen a compiler that does that, and I seriously doubt it is really common. Does anyone know of a single compiler that does this? Or is TFA just completely full of crap (as I strongly suspect)?
Re:News flash (Score:5, Informative)
You probably haven't used any desktop compilers.
Just a sampling:
Re:News flash (Score:5, Interesting)
That is not "unstable" or "undefined" code. There is already a word for it: dead code. In addition, any programmer worth his/her salt will make sure to define things like that as "volatile", i.e. tell the compiler that they might be accessed at any time from places the compiler does not see. Which is exactly the security problem here. Don't blame compilers for programmer incompetence....
Re: (Score:3)
I'm a bit depressed to find a /.er who's never seen GCC :-P
I once wrote an overflow check wrong -- I tried to write an `if' that would check whether the preceding operation on signed integers had overflowed. Overflow on signed integers is undefined behavior, so once it happens, it is legal for the program to do anything. "Anything" includes updating the variable with the overflowed value and then skipping the condition
Re:News flash (Score:4, Informative)
I have never seen a compiler that does that, and I seriously doubt if is really common. Does anyone know of a single compiler that does this?
The only compilers I know of that definitely do this are GCC, LLVM, ICC, Open64, ARMCC, and XLC, but others probably do too. Compilers use undefined behaviour to propagate unreachable state and aggressively trim code paths. There's a fun case in ARM's compiler, where you write something like this:
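Going by the description that follows, the snippet would have been roughly the following (the array size and loop bound are my guesses):

void fun(void)
{
    int x[5], y[5];
    for (int i = 0; i < 10; i++)    /* bound is larger than the arrays */
        x[i] = y[i];                /* x[i] with i > 4 is undefined */
}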
The entire loop is optimised away to an infinite loop. Why? Because accesses to array elements after the end of the array are undefined. This means that, when you write x[i] then either i is in the range 0-4 (inclusive), or you are hitting undefined behaviour. Because the compiler can do anything it wants in cases of undefined behaviour, it is free to assume that they never occur. Therefore, it assumes that, at the end of the loop, i is always less than 5. Therefore, i++ is always less than 10, and therefore the loop will never terminate. Therefore, since the body of the loop has no side effects, it can be elided. Therefore, the declarations of x and y are never read from in anything with side effects and so can be elided. Therefore, the entire function becomes a single branch instruction that just jumps back to itself.
If your code relies on undefined behaviour, then it's broken. A compiler is entirely free to do whatever it wants in the cases where the behaviour is undefined. Checking for undefined behaviour statically is very hard, however (consider trying to check for correct use of the restrict keyword - you need to do accurate alias analysis on the entire program) and so compilers won't warn you in all cases. Often, the undefined behaviour is only apparent after inlining, at which point it's difficult to tell what the source of the problem was.
Re:News flash (Score:4, Interesting)
While what you say is true, I think it's not what they mean. Instead what they mean is compilers taking advantage of undefined behaviour you didn't notice. The compiler is allowed to assume that undefined behaviour never happens, and optimize accordingly. The important point is that this can even affect code before the undefined behaviour would occur. For example, consider the following code, where undefined() is some code that causes undefined behaviour:
Now if a>4, the code inevitably runs into undefined behaviour, and therefore it may assume that a is not larger than 4 right from the start. Therefore it is allowed to compile the complete block to simply
Note that even the assert doesn't help because the compiler "knows" it cannot trigger anyway, and therefore optimizes it out.
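To make that concrete, here is a sketch of my own along those lines (the out-of-bounds read plays the role of undefined() above, and all the names are mine):

#include <assert.h>

int table[5];

int lookup(int a)
{
    int value = table[a];    /* out of bounds when a > 4: undefined behaviour */
    assert(a <= 4);          /* may be dropped: it "cannot" trigger */
    if (a > 4)               /* branch "never" taken, so it can be deleted too */
        return -1;
    return value;
}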
I think it is not hard to imagine how this can lead to security problems.
Another nice example (which I read on the gcc mailing list quite some time ago; not an exact quote though):
Now if strcmp returns anything but 0, the code inevitably runs into undefined behaviour, therefore the compiler is allowed to assume that never happens, and therefore is allowed to optimize the code to simply
So there goes your password security.
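In other words, something along these lines (my own sketch of the shape of that example, not the original posting):

#include <string.h>

extern void grant_access(void);

void check_password(const char *entered, const char *secret)
{
    if (strcmp(entered, secret) != 0) {
        int *p = NULL;
        *p = 1;             /* "crash on purpose": storing through NULL is undefined */
    }
    grant_access();
}

Since the wrong-password branch leads only to undefined behaviour, the compiler may assume it is never taken and boil the whole function down to an unconditional grant_access().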
Re:News flash (Score:4, Interesting)
In that vein, I tried:
int bar = 0;            /* assuming bar starts at 0 */
while (1) {
    bar = bar++;        /* modifies bar twice with no sequence point: undefined */
    if (bar > 3) {
        printf("bar = %d\n", bar);
        break;
    }
}
Under gcc (trying -O0 to -O3 and -Os), this code printed "bar = 4". Compiling the same code with clang resulted in an infinite loop.
Re: (Score:3)
I would also like to understand what's the definition of "unstable code".
Unstable code is code such that, when you make an arbitrarily small change, you end up rewriting the entire thing.
Stable code, by contrast, is code such that when you make an arbitrarily small change, the code ends up being restored to its original state, or perhaps engaging in a bounded oscillation, where you and another coder keep changing it back and forth with every release.
Unstable Code: int n = x / 0; (Score:2)
According to the article "unstable code" is anything with undefined behavior according to the C++ standard. This could be as simple as an integer overflow or divide by zero which in debug or "zero optimization" mode would always cause an error, but which in an optimized release may simply be removed.
Re: (Score:3)
I'm more interested in how Linus is going to respond to a bunch of C++ programmers finding 32 bugs in his kernel.
Re: (Score:2)
The MIT article also incorrectly refers to Java. Which isn't compile-time optimized.
Re: (Score:3)
Do you think most-all exploits are down to the defective x86 segmented memory architecture.
I think those who coded for the SNES or Apple IIGS in C would disagree with blaming the x86 exclusively =)
Re: (Score:3)
"But in our enthusiasm, we could not resist a radical overhaul of the system, in which all of its major weaknesses have been exposed, analyzed, and replaced with new weaknesses".
Bruce Leverett, Register Allocation in Optimizing Compilers
Re: (Score:2)
Compilers ought to have switches that deliberately branch to the error cases they're trying to optimize away. Getting rid of a divide by zero? Force the error instead so it gets attention.
Why? Isn't that the job of the programmer, not the actual compiler?
Sure you can produce a program that has a divide by zero event and it can compile without errors, but when you run the binary you would get (C example): "Floating point exception (core dumped)". Most programmers upon seeing this should realise they have stuffed up and should correct their code accordingly. In fact any programmer should always have conditionals to test any input data to make sure that data falls within specified bounds.
Re: (Score:2)
On some machines, dividing by 0 gives 0. Volatile in C has nothing to do with multi-threaded code (it's an abuse of the standard that all modern compiler vendors embrace and support). Compiler warnings are the right answer.