Microsoft Research Touts Its 'Checked C' Extension For 'Making C Safe' (microsoft.com) 181
Microsoft Research has pre-published a new paper to be presented at the IEEE Cybersecurity Development Conference 2018 describing their progress on Checked C, "an extension to C designed to support spatial safety, implemented in Clang and LLVM."
From "Checked C: Making C Safe By Extension": Checked C's design is distinguished by its focus on backward-compatibility, incremental conversion, developer control, and enabling highly performant code... Any part of a program may contain, and benefit from, checked pointers. Such pointers are binary-compatible with legacy, unchecked pointers but have explicitly annotated and enforced bounds. Code units annotated as checked regions provide guaranteed safety: The code within may not use unchecked pointers or unsafe casts that could result in spatial safety violations.
Checked C's bounds-safe interfaces provide checked types to unchecked code, which is useful for retrofitting third party and standard libraries. Together, these features permit incrementally adding safety to a legacy program, rather than making it an all-or-nothing proposition. Our implementation of Checked C as an LLVM extension enjoys good performance, with relatively low run-time and compilation overheads. It is freely available at https://github.com/Microsoft/checkedc and continues to be actively developed.
The extension is enabled as a flag passed to Clang -- the average run-time overhead introduced by adding dynamic checks was 8.6%, though in more than half of the benchmarks the overhead was less than 1%. They also note that from 2012 to 2018, buffer overruns were the leading single cause of CVEs.
Microsoft Research says they're now evaluating Checked C, formalizing a proof of its safety guarantee -- and developing a tool to semi-automatically rewrite legacy C programs.
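To make "explicitly annotated and enforced bounds" a little more concrete, here is a minimal sketch in plain C of what a dynamic bounds check amounts to. It is not Checked C syntax and not how the extension is implemented (the paper says checked pointers stay binary-compatible with ordinary ones, which a struct like this is not); `bounded_ints` and `bounded_get` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical illustration only: a pointer bundled with its element count. */
struct bounded_ints {
    const int *data;
    size_t count;
};

/* Reads element i into *out. Returns false instead of reading out of bounds. */
static bool bounded_get(struct bounded_ints a, size_t i, int *out)
{
    assert(out != NULL);
    if (a.data == NULL || i >= a.count)
        return false;          /* the dynamic check that plain C leaves out */
    *out = a.data[i];
    return true;
}
```

A caller would wrap an existing array, for example `struct bounded_ints a = { v, 3 };`, and test the return value before using `*out`.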
Funny ... (Score:4, Informative)
clang/LLVM had been developed in tandem with, practically for a project for making C code safer in the first place: SAFECode [illinois.edu].
Re: (Score:2)
Well, this isn't quite the same comment, but if the language is compatible with C, or some subset of C, couldn't you compile the "safe version", run your tests, and then, when you were satisfied, compile with standard C? Surely the answers ought to be guaranteed to be the same if there's no error.
Re: (Score:2)
You are correct.
In the case of Verified C, it's slightly different - they've proven the optimizer will generate code that is functionally identical to the source, so you can't be sure if another compiler will generate equivalent binary.
However, whether talking about Checked C, Verified C, SAFECode or any other validation system, validated source is validated. Furthermore, once you've done any runtime testing and shown no errors occur when in operation, that should hold true for compiled code from any compi
Re: (Score:2)
I'd guess that there is a compiler switch to turn it off.
Switch back to standard C and lose runtime checks (Score:2)
Well, this isn't quite the same comment, but if the language is compatible with C, or some subset of C, couldn't you compile the "safe version", run your tests, and then, when you were satisfied, compile with standard C? Surely the answers ought to be guaranteed to be the same if there's no error.
Only for real world inputs that match your test inputs. If you compile with standard C you lose the run time checks, array bounds for example. If these checks only have a 1% penalty then for many apps that might be quite acceptable.
Re: (Score:2)
https://linux.die.net/man/3/li... [die.net]
http://valgrind.org/ [valgrind.org]
Interestingly, neither of these long-standing approaches (Hi Bruce) appears to be mentioned in the IEEE paper.
Peer review ain't what she used to be.
Re: (Score:3)
IIUC efence and valgrind don't check for references beyond array bounds, but only for references beyond allocated memory. So this is different (and less expensive) than what they're proposing.
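A small sketch of the distinction being drawn here, under the assumption that a tool only tracks whole allocations: two logical arrays are carved out of one allocated block, and a write one element past the first array stays inside the allocation while silently landing in the second array. The function name and sizes are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Writes `value` at index i of logical array `a`, which shares one allocated
   block with logical array `b`, and returns the resulting b[0].
   Returns -1 if the allocation fails. */
static int neighbour_after_write(size_t i, int value)
{
    const size_t a_len = 4, b_len = 4;
    assert(i < a_len + b_len);       /* never leave the allocation itself */

    int *block = calloc(a_len + b_len, sizeof *block);
    if (block == NULL)
        return -1;

    int *a = block;                  /* logical array a: block[0..3] */
    int *b = block + a_len;          /* logical array b: block[4..7] */

    a[i] = value;                    /* i == 4 is past a's end, inside the block */
    int result = b[0];

    free(block);
    return result;
}
```

With `i == 3` the neighbour is untouched; with `i == 4` the write corrupts `b[0]` even though every access is to allocated memory. Catching that needs per-array bounds rather than per-allocation ones.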
Re: (Score:3)
clang/LLVM had been developed in tandem with, practically for a project for making C code safer in the first place: SAFECode [illinois.edu].
AT&T had a safe C variant called Cyclone, but I haven't heard anything about it in over a decade
Re: (Score:2)
The problem with all these attempts to make C safe is that they tend to break all the stuff that makes C useful. Type punning is the classic example, or pointer manipulation.
There are better languages if you need that kind of thing. For other purposes C is the only option precisely because it isn't safe.
Re: (Score:3)
How is vi the least bit related to the compiler?
I assume people understand clang is a compiler, not a text editor. Or maybe I'm assuming too much.
What about C syntax? (Score:4, Insightful)
How many errors are due to C syntax, e.g. "=" vs "=="?
At what point do we finally decide that C just wasn't the best choice for large scale long lived systems?
(And don't tell me about "experts don't make those mistakes". See, for instance https://www.researchgate.net/p... [researchgate.net] )
Re:What about C syntax? (Score:5, Informative)
How many errors are due to C syntax, e.g. "=" vs "=="?
I haven't seen that error in many many years. The compiler gives you a warning in most cases, when you look at code with that mistake it really jumps out at you, and if it somehow does get through the compile phase, rudimentary testing will catch it. You are testing both branches of your if statements, aren't you?
Re: (Score:2)
When compiling with "-Wall" and using reasonable style, basically none.
Re: (Score:2)
Because it's convenient sometimes
When?
everything wrong in Pascal
Like what?
Convenient (Score:2)
it's convenient for functions whose result you want to check before going further.
e.g. with file operations:
if (NULL == (in = fopen(filename, "r"))) {
    fprintf(stderr, "cannot open input %s\n", filename);
    exit(2);
}
/* process input from in */
Re: (Score:2)
To the extent that C was designed with the PDP-11 machine code in mind,
if (a = b)
compiles directly into
MOV B, A
BNE 101$
which is a perfectly reasonable machine language construct (or any of its variants, since most instructions set the PDP-11 condition codes).
I think many C programmers (certainly most of the early ones) are always aware of the machine code being generated as they are writing C....
(perhaps you think P-code when you write Pascal, good for you!)
Re: (Score:2)
(perhaps you think P-code when you write Pascal, good for you!)
You're a little out of date. You have some reading [freepascal.org] to do [embarcadero.com].
Re: (Score:2)
You _can_ write "if ((a = b))" and the double parentheses convince the compiler that the assignment was indeed what you wanted.
Re: (Score:2)
while ((thing = next_thing_or_null_if_no_more_things())) {
do_something_with(thing);
}
Re: (Score:2)
Your quiche-eating opinion isn't going to change C
Of course it isn't. That's why you're better off using a language that avoids C's problems in the first place. Maybe reread the thread.
Re: (Score:2)
if( (result = expensive_function()) > somevalue ) { do_stuff(result); }
It can be convenient and concise to do an assignment in a test, but you're a dumbass who resists knowledge and will probably double down and say it isn't valid because pascal doesn't let you and who cares if you have to do the same thing with more lines of code.
There is even a C++ extension for it now, so you can declare the variable in that scope too.
if (auto *ptr = get_maybe_ptr()) do_stuff_on_definitely_ptr(ptr);
Re: (Score:2)
result = function_that_could_return_a_null_pointer();
function_that_uses_pointer(result);
Or even:
function_that_uses_pointer(function_that_could_return_a_null_pointer());
All the "convenience" did for you was to structur
Loops, for one. (Score:2)
> Why wouldn't function_that_uses_pointer() protect itself by doing the pointer check internally?
Because
for ( x=0; x++; x That way function_that_uses_pointer() wouldn't have to worry about someone remembering to do the check
If you want to hack together quick scripts without ever thinking about the possibility of either errors occurring, or some item simply not being present, perhaps VBA, Python, or Pascal is for you. C is for systems programmers who already need to be aware that they can't just make assum
Re: (Score:2)
Because
Your post didn't work very well. There was no because.
C is for systems programmers who already need to be aware that they can't just make assumptions
Then why are there so many [archive.org] vulnerabilities in C applications [speakerdeck.com]? Seems like there are assumptions aplenty.
Re:Loops, for one. (Score:5, Informative)
Slashdot ate the because. See:
https://developers.slashdot.or... [slashdot.org]
There are many vulnerabilities in software in every language.
As it happens, I maintain a database of every CVE ever issued, and part of my job each day is to look at any significant new vulnerabilities published that day. I've learned a couple things about languages and vulnerabilities. Obviously languages that nobody ever uses aren't used in vulnerable software very much - the number of vulnerabilities tracks fairly closely with how much use a language gets. Aside from that obvious fact, there is one more:
Languages designed to be easy for beginners tend to be used by beginners. Beginners make beginner mistakes.
There is very little stupid assembly code out there. There's a lot more stupid Python. This is simply because assembly is generally used by people who know WTF they are doing; Python encourages people to make software without knowing what they are doing, which means they make really bad software.
Probably the worst language I've seen in terms of security was version 4 of PHP. It was really, really dumbed down and frequently used by people who had no clue - on public web sites. The creator of PHP openly and emphatically says he had no idea how to create a good program language, and he's right. He was trying to create a simple blog system, but inflated loops, variables, and conditions, so people started using it as a general purpose programming language for the web.
You DO have to be careful with C - and C programmers generally know that, and are careful. C is designed to be fast and to be flexible, and *simple* in terms of its built-ins, not to be a safe playground for newbies.
I fear the language which may be even worse for security than PHP 4 may be Rust. It may really surprise people for me to say that, but programs written in Rust may very well have more serious vulnerabilities than any other language. Why? Because Rust hypes some very basic features to a ridiculous degree, pretending that avoiding oob access magically makes your code secure, and many Rust programmers actually believe that. By far the vast majority of vulnerabilities are logic errors like "goto fail", not buffer overruns. No language can protect you against goto fail and similar oversights.
By making Rust programmers believe that just using Rust makes the software secure, or even meaningfully more likely to be secure, they are lulled into a false sense of security which encourages stupid mistakes. Have you ever seen a Rust program which even tests the negative conditions in its unit tests? That's one of the most basic and important things you can do in terms of security. Many Rust fanbois truly believe that using Rust is magic, so they don't even test what happens when someone enters an invalid password, or an empty password, or how about SQL injection in the password? Rust doesn't normally buffer overflow, so no need to think about security, right?
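For anyone unfamiliar with the reference, here is a simplified sketch of the kind of control-flow slip usually meant by "goto fail": a duplicated `goto` that skips a later check while the error code still says success. This is not the original code; `check_step`, `verify_buggy` and `verify_fixed` are hypothetical names, and the checks are stand-ins.

```c
#include <assert.h>

/* Hypothetical check: returns 0 when the step passes, nonzero when it fails. */
static int check_step(int ok)
{
    return ok ? 0 : -1;
}

/* Buggy: the stray goto is always taken, so the signature is never checked. */
static int verify_buggy(int hash_ok, int sig_ok)
{
    int err;
    if ((err = check_step(hash_ok)) != 0)
        goto fail;
    goto fail;                              /* duplicated line; err is 0 here */
    if ((err = check_step(sig_ok)) != 0)    /* unreachable */
        goto fail;
fail:
    return err;
}

/* Fixed: both checks run before the result is returned. */
static int verify_fixed(int hash_ok, int sig_ok)
{
    int err;
    if ((err = check_step(hash_ok)) != 0)
        goto fail;
    if ((err = check_step(sig_ok)) != 0)
        goto fail;
fail:
    return err;
}

/* Self-check showing the difference; call it from main() to try it. */
static void demo_goto_fail(void)
{
    assert(verify_buggy(1, 0) == 0);   /* bad signature accepted */
    assert(verify_fixed(1, 0) != 0);   /* bad signature rejected */
}
```

Every memory access in the buggy version is in bounds, which is why a bounds checker has nothing to say about it. Only a negative test, one that feeds in a bad signature and expects rejection, exposes the problem.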
Re: (Score:2)
Slashdot ate the because
No, you just got it wrong.
There are many vulnerabilities in software in every language.
Yes, which is why it's important for the language to mitigate that as much as possible. Your mistake while doing something as trivial as a forum post demonstrates the fallibility of the programmer.
programs written in Rust may very well have more serious vulnerabilities than any other language
Prove it.
Priorities are different for Excel macros vs an OS (Score:2)
> Yes, which is why it's important for the language to mitigate that as much as possible.
What's important very much depends on what software is being written. In a typical Excel macro, sure go ahead and check the domain of the value each time it is accessed. It'll be ten times as slow as not checking, but one shouldn't expect the project manager to manually check domains in his VBA. It's good and right for VBA to "mitigate it as much as possible".
In a graphics driver, speed is top priority. It would be a
Re: (Score:2)
Not everything is a shell script.
That's right. Some things, for example, are Pascal programs, which is a better option than C. You should take the time to learn some Pascal. You'll come around to the same opinion.
Learned it 15 years ago (Score:3)
I learned Pascal 15 years ago. It's an okay language.
At the time, Pascal was competing with Visual Basic. VB won.
The world could have chosen Pascal over VB, but they chose VB. In the 1970s, Pascal competed with C. The world chose C.
Now the industry is going through a phase in which people aren't distinguishing between beginner languages that are designed to be easy vs professional, enterprise-grade tools. Legos are easy, and a good way to learn some basics. You shouldn't build your house out of Legos. The
Re: (Score:2)
I learned Pascal 15 years ago.
Dude, you didn't know how to copy one string to another in Pascal. You didn't learn it.
Re: (Score:2)
You seem to have been too busy to read before replying.
Let's try again:
--
the *implementation* of the string copy library function in C, using some conveniences including assignment returning the value. How would you write this "copy each character" in Pascal
--
Are you familiar with the difference between IMPLEMENTING a function and CALLING it? You answered with how you would CALL the function.
Are you familiar with the difference between a character and a string? Just too much in a hurry to read "character b
When you grow up, you can read it (Score:2)
Okay, I gave you the link, so when you grow up you can read how Pascal is implementing string copy (in Pascal). "Intrinsic" in this respect just means it's included in the built-in library - it still has to be written, silly. The CPU doesn't have a "copy a Pascal string" opcode, so someone has to write it. That would be guys like me.
Re: (Score:2)
That would be guys like me.
Good god, I hope not. Spend some time learning Pascal. Like I say, you'll come around.
Re: (Score:2)
There is very little stupid assembly code out there.
There is very little assembly code out there, period. But we've seen, going one layer down, that even the experts at Intel can screw up security.
Probably the worst language I've seen in terms of security was version 4 of PHP.
Because PHP is an awful language that was heavily inspired by Perl. Gee, it turns out when you make awful design mistakes, it impacts the number of errors! Just like C.
You DO have to be careful with C - and C programmers generally know that, and are careful.
Is that why 20% [cvedetails.com] of CVE bugs are for overflow and memory corruption?
I fear the language which may be even worse for security than PHP 4 may be Rust.
You've got to be shitting me. Rust will remove a vast swath of the errors that pops up in C all the time. Not only that, it's a diff
Slashdot ate my post! (Score:2)
Friggin Slashdot ate my post.
if ( thegamma = get_gamma() ) {
for ( y=0; y < pixelheight; y++ ) {
for ( x=0; x < pixelwidth; x++ ) {
do_gamma( x, y, thegamma );
}
}
}
It's kinda silly to check a million times that thegamma isn't null. Checking once is quite enough.
I kno
Re: (Score:2)
Friggin Slashdot ate my post
Whoops. Mistakes like that happening are why allowing assignment in an if test is a bad idea. Too easy to get wrong.
if ( thegamma = get_gamma() ) {
Wow, man. So the alternative is an extra line:
thegamma = get_gamma();
if (thegamma) {
Hardly seems inconvenient. You're arguing for a misfeature.
Re:Slashdot ate my post! (Score:4, Interesting)
You certainly CAN write it in two lines instead of one, sure.
You asked for an example of where it is convenient.
As I mentioned, here's the implementation of the string copy library function in C, using some conveniences including assignment returning the value. How would you write this "copy each character" in Pascal?:
while (*dest++ = *src++);
I'm going to guess that rather than one line, it'll be about five lines. Some people prefer not to write five times as much code as needed.
Personally, I kinda like this habit to not only avoid the error but make it extremely obvious that I haven't done an assignment rather than a comparison:
if (4 == x)
By habitually putting the constant on the left side, I'd get a compile error if I accidentally typed = instead of ==.
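For context, the one-line loop quoted above can be wrapped into a complete function roughly as follows. This is a sketch of the classic idiom, not the actual glibc source; `copy_string` is a hypothetical name chosen to avoid clashing with the real `strcpy`, and the caller is assumed to have made `dest` large enough.

```c
#include <assert.h>
#include <stddef.h>

/* Copies the NUL-terminated string src into dest and returns dest.
   The caller must guarantee that dest has room for src plus the terminator. */
static char *copy_string(char *dest, const char *src)
{
    assert(dest != NULL && src != NULL);
    char *start = dest;

    /* Copy one character, advance both pointers, and stop once the
       character just copied was the terminating '\0'. */
    while ((*dest++ = *src++) != '\0')
        ;

    return start;
}
```

The explicit `!= '\0'` and the parentheses around the assignment do the same work as the bare `while (*dest++ = *src++);` form, but they make it plain to both the reader and the compiler that the assignment is intended.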
Re: (Score:3)
How would you write this "copy each character" in Pascal?
Like this: myString := myOtherString;
C's string handling is nothing to take pride in.
By habitually putting the constant on the left side, I'd get a compile error if I accidentally typed = instead of ==.
Pascal doesn't have the problem in the first place.
I actually didn't write glibc, only read it (Score:2)
> Then you (clearly a C fanboi) writes code like this:
> while (*dest++ = *src++);
I actually didn't write the C library. I've written several Perl libraries; you'll find my code in Apache and Solaris, but Roland McGrath wrote glibc.
Re: (Score:3)
Please for the love of god use strncpy.
Please for the love of god NEVER use strncpy. If your buffer doesn't have enough space, it copies the bytes from source and doesn't write a trailing zero byte, so now you have a trap just waiting to spring on you. It's the worst design possible.
In addition, calling strncpy() to copy into a buffer of n bytes takes O(n) time. 5 bytes into a megabyte buffer sets a million bytes to 0.
Write two helper functions. One that creates a shortened, valid C string if it doesn't fit. One that is guaranteed to crash if i
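A minimal sketch of the first kind of helper described above, the one that always leaves a valid (possibly shortened) C string. The name `copy_truncate` and its signature are hypothetical, not taken from any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Copies src into dst, whose capacity is dst_size bytes, and always leaves
   dst NUL-terminated. Returns true if src fit entirely, false if it was
   shortened. Unlike strncpy, it does not zero-fill the rest of the buffer. */
static bool copy_truncate(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    bool fits = len < dst_size;              /* leave room for the terminator */
    size_t n = fits ? len : dst_size - 1;

    memcpy(dst, src, n);
    dst[n] = '\0';
    return fits;
}
```

Copying "abcdef" into a 4-byte buffer yields "abc" and a false return, so the caller can decide whether truncation is acceptable. The work done is proportional to the string length rather than to the size of the destination buffer.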
Re: (Score:2)
That just makes code less readable.
Re: (Score:2)
>> As I mentioned, here's the implementation of the string copy library function in C
> well one, it's better to use a library function like strcpy
Perhaps you didn't notice where I said that IS strcpy? You can't do this:
char * strcpy (const char * src, char * dest) {
return strcpy();
}
Communicate in your application code (Score:2)
The communication happens when you CALL strcpy. Inside of glibc, efficiency rules.
Re: (Score:2)
1) Where in memory do these strings reside?
Stack or heap. It depends on the string type you use.
What happens to the strings when the function reaches the end of its scope?
Strings are reference counted in Pascal. You can dispose of them ahead of time or allow them to fall out of scope. The scope depends on the declaration, not the function. Like I say, you should learn some other languages.
Re: (Score:2)
I think you're suffering from a kind of Stockholm syndrome. C has mistreated you for so long that you've come to identify with it. C has taken so much of your self-respect that your only outlet is these pathetic rants on internet forums and you can't see that there's something better out there. Worse, C has made you believe that you don't deserve something better. This is sadly typical of all abusive relationships.
But have hope
Re: (Score:2)
Pathetic.
Re: (Score:2)
I do not use toy-languages.
Re: (Score:2)
At what point do we finally decide that C just wasn't the best choice for large scale long lived systems?
We can decide that whenever you like; unless you've got a time machine handy, it doesn't really change anything now, because all of those systems are still out there and aren't going anywhere. Even if you aren't working on the C code directly, you're likely going to want to link to a C library, or run on a C-based operating system.
C, like the faint scent of urine on the subway, is there, and there's nothing anybody can do about it.
Re: (Score:2)
Experts do not make that mistake. In fact, the question makes you look rather dumb.
Re: (Score:2)
As
It's funny the number of typos committed by C advocates. All these mistakes demonstrate the value of a language like Pascal over C.
Re: (Score:2)
That's not what Happened
Sure, kid. It's always Someone Else's Fault.
Re: (Score:2)
"At what point do we finally decide that C just wasn't the best choice for large scale long lived systems?"
So what choice should have been made... in 1972? (early days of Unix).
Seems like it was an excellent choice for 1972, and it has survived nearly 50 years (i.e. your "long lived" criterion).
Was there another choice in 1972 that was a better choice in 1972 that has shown itself to be a popular (i.e. grown in usage by leaps and bounds since 1972 or earlier) and solid choice in 2018?
MISRA Comparison? (Score:3)
Can anyone compare this to what Embedded has been doing for a while in functional safety?
https://en.wikipedia.org/wiki/... [wikipedia.org]
It's why Mathworks makes stupid money off of Polyspace Static Analyzer.
https://www.mathworks.com/prod... [mathworks.com]
https://www.mathworks.com/prod... [mathworks.com]
On top of that there's also the Barr Group's Embedded C Coding Standard.
https://barrgroup.com/Embedded... [barrgroup.com]
Re: (Score:3)
Agreed. And quite a few studies of MISRA say likewise.
There's probably a subset that is genuinely useful, simply because it does seem to work when selectively applied.
Bugs are not just code, some are in design (Score:3)
Re: (Score:2)
Regarding your question of adding coding problems into the mix, the answer is a common one. Tradeoffs. For performance reasons you may need to use C. There is no universal answer to what language to use, it's a matter of best fit, and sometimes that best fit is C. I realize some might argue otherwise, but I've also noticed that many people merely argue for the language they are most famili
Misleading comparison to other languages (Score:5, Informative)
Pretty major error right in the introduction:
> Legacy programs would need to be ported wholesale to take advantage of these languages,
Not true for Rust. C libraries and applications can be ported to Rust incrementally and, in fact, some examples have already been done and shipped! See Federico's work on librsvg for example: https://people.gnome.org/~fede... [gnome.org]
Re: (Score:3, Informative)
Rust is a horrible clusterfuck of a language.
Apart from a cumbersome syntax, traits hide implementation details, so you have to look up the implementation or (worse!) take a look at the implementation of a type in order to know what it will actually do memory-wise, information you need to know in order to use the type in any realistic way. Rust also doesn't have proper classes, which together with the overly complex borrow-checker makes it essentially impossible to write glue libraries to object-oriented fo
Re: Microsoft lags behind better research (Score:5, Informative)
There's a difference. CompCert/Verified C is concerned with formally verifiable source code and provably correct compilation, which means pointers are bad.
CheckedC doesn't do any of the above, it is only a secure pointer system. Microsoft's Z3 handles formal verification.
Re: (Score:2)
CompCert [inria.fr] provides a formally verified subset of C and has existed since 2008
AT&T's Cyclone is a bit older than CompCert but appears to have fizzled out 12 years ago
https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:3)
What's needed is not independent comparison (well, that's needed, but that's not the problem). What's needed is a license that guarantees that there's no copyright or patented code in the result. I.e., a guarantee that the generated code can be used under any license of your choice without legal danger from either Microsoft or from any company with which they have or have had a business relationship unless the source code compiled by a standard C compiler would have the same problem.
Re: (Score:2)
Ok, I would agree with that. So, license check then a benchtest. IANAL, but I can do the latter adequately even if I can only do a cursory skim for the former.
Re: (Score:2)
It's M$, no matter what you check. To pump up this quarter's profits according to some dick spreadsheet, if they choose to change something and it makes it more insecure, they will change it. They have pretty much zero reliability: touting stuff, then dumping it when the profitability is not there or there is greater profitability elsewhere, leaving users in the lurch, not a few times but a whole lot of times. Whatever it is they are pushing today will be different in a year's time and most often wor
Re: (Score:2)
Do you know how many compilers Microsoft have written over the years? Can you name a single instance of any one of them producing code that has ever placed a user in legal trouble due to licenses, copyrights or patents?
Twenty years ago people claimed that Microsoft were going to use submarine patents to slap infringements on people who used their compilers, and yet not one single time has this happened.
Re: (Score:2)
If you ask specifically about compilers, I can only think of a couple of instances, and they didn't really end up in court. (In one case that was because Sun sued MS over J++ before the event.)
If you ask more generally about code produced by MS, there have been lots of instances where the code could only legally be used linked with certain licenses...and it wasn't always clear which ones.
Re:Hmmm. (Score:5, Interesting)
It is MS Research. MS proper ignores them routinely.
Re:Sigh (Score:4, Informative)
C is already safe
Is it? Let's have a look at a security analysis of applications written in C [archive.org] on FreeRTOS. It seems like they're riddled with flaws [speakerdeck.com]. Saying "just write better code" lacks real world perspective.
Re:Sigh (Score:4, Insightful)
I've been writing C programs for 3 decades, and I have made plenty of mistakes along the way. Occasionally because of using the wrong pointer, but most of them were simply because I got the algorithm wrong. None of these "safe" languages would have prevented the 2nd kind of error.
Re: (Score:2)
You don't know that. If you had more brain left to do the algorithm, or did the algorithm first in Python or on paper, you might have avoided the mistake.
The problem basically is that you can only shuffle 7 to 9 topics in your short-term memory. That is basically your brain's registers. If three of them are occupied by useless low-level stuff, only the rest can be used for the algorithm.
Re: (Score:2, Insightful)
> People are so obsessed with wrangling the last ounce of performance out of application programs
You made coffee come out my nose!
Have you actually *used* a major application lately? I'd say performance is far down the list.