Using Redundancies to Find Errors 338
gsbarnes writes "Two Stanford researchers (Dawson Engler and Yichen Xie) have written a paper (pdf) showing that seemingly harmless redundant code is frequently a sign of not so harmless errors. Examples of redundant code: assigning a variable to itself, or dead code (code that is never reached). Some of their examples are obvious errors, some of them subtle. All are taken from a version of the Linux kernel (presumably they have already reported the bugs they found). Two interesting lessons: Apparently harmless mistakes often indicate serious troubles, so run lint and pay attention to its output. Also, in addition to its obvious practical uses, Linux provides a huge open codebase useful for researchers investigating questions about software engineering."
I have no idea what this article means ! (Score:1, Insightful)
Analysis "by file" vs "by function"? (Score:0, Insightful)
How to Avoid Mistakes? Practical Advice? (Score:5, Insightful)
What are some of the best ways to learn to avoid problems? I know that experience is useful. Trial and error is good, mentoring is good, education is good. What else can you think of? What books are useful?
Also, I wonder about usability problems. In other words, this article mainly hits on the problems of "hidden" code, not the interface. I'd like to see more about how programmers stuff interfaces with more and more useless crap, and how to avoid it. (Part of the answer is usability testing and gathering useful requirements, of course.) What do you think about this? How can we attack errors of omission and commission in interfaces?
Good Design = Tight Code (Score:5, Insightful)
lint is horrible (Score:3, Insightful)
It really is. It's a redundant holdover from ye old BSD versions. Granted, there are one or two times i've used it when -Wall -pedantic -Werror -Wfor-fuck's-sake-find-my-bug-already doesn't work, but a lot of the time it comes up with a LOT of complaints that are really unnecessary. Am i really going to have to step through tens of thousands of lines of code castind the return of every void function to (void)? Come on.
Error checking compilers (Score:1, Insightful)
I understand that there are non-C compilers out there that actually detect code errors without running a separate utility. Amazing!
Re:redundant? (Score:2, Insightful)
if (a==1)
{
[some chunk of code]
}
else
{
[same (or almost exact) chunk of code]
}
where the same code block appears multiple times in a file/class/project. By having the same block of code appear multiple times the chance of a user-generated error increases. Easily fixed by moving the repeated code into a parameterized function.
Re:How to Avoid Mistakes? Practical Advice? (Score:5, Insightful)
I think the lesson here is basically that the compiler is your friend. Turn on all the error checking you possibly can in your development environment and pay attention to every last warning.
If there is something trivial causing a warning in your code--fix it so it doesn't warn, even though it wasn't a "bug". If your compiler output is always pristine, with no warnings, then when a warning shows up if it's a bug you'll notice.
Kind of common sense if you ask me--but maybe that's just a lesson I learned the hard way.
Re:I have no idea what this article means ! (Score:5, Insightful)
"Dead" or unreachable code is almost always caused by patches or fixes to an existing codebase and it's always good to detect and get rid of it because it may point to other problems in the application (in my experience), or is simply dead wood that should be removed.
Re:lint is horrible (Score:4, Insightful)
Re:redundant? (Score:4, Insightful)
It's not, keep the code squeaky clean because cleanliness is next to godliness, it's keep the code clean so it's easy to read. Keep it clean because it's a discipline that will pay off when it's time to spot/fix the real errors in the code.
Re:I have no idea what this article means ! (Score:3, Insightful)
Re:How to Avoid Mistakes? Practical Advice? (Score:5, Insightful)
I'd put it more strongly than that. Reading between the lines of the paper, don't just fix the warning. Look around the place where the warning happened. You'll most likely find a bug.
It's also a call for compilers to generate more warnings, which can only be a good thing.
Re:Intentional redundant code (Score:5, Insightful)
Was the manager asking for lines of code/day fired too?
Re:lint is horrible (Score:4, Insightful)
Lint? Lint compains if you call a function that returns an int and you ignore the int. This is particularly irritating in the case of strcpy() and similar functions where you would normally do:
except you're supposed to do:
or...
And that's just the beginning...
Re:How to Avoid Mistakes? Practical Advice? (Score:3, Insightful)
Well, identifying it as useless crap is a good first step.
And for managers:
while(economy.isDown())
if(newInterface.isUselessCrap())
{
fireEmployee();
hireNextTelecomRefugee();
}
If you want a serious answer, one reason programs get filled with so much useless crap is because 80% of programmers program so they can collect a paycheck. They don't give a flying fuck if their code is good or not. That was a big eye-opener for me when I first got out of school. I couldn't believe how many people just didn't care.
If you are interested at all in not muddying the interface, you are most of the way there. Give it some thought, consult with your peers, and try to learn from mistakes.
Don't be afraid to refactor code every so often, because, schedule or no schedule, new requirements move the 'ideal' design away from what you drew up last month. That's (to my mind) the second largest contributor. Even good coders crumble to cost and schedule, and band-aid code that just plain needs to be rethought. In some environments, that's a fact of life. In others you will have to fight for it, but you can get code rewritten.
Re:Intentional redundant code (Score:5, Insightful)
Re:Patience! (Score:2, Insightful)
Re:How to Avoid Mistakes? Practical Advice? (Score:5, Insightful)
Don't be afraid to refactor code every so often, because, schedule or no schedule, new requirements move the 'ideal' design away from what you drew up last month. That's (to my mind) the second largest contributor. Even good coders crumble to cost and schedule, and band-aid code that just plain needs to be rethought. In some environments, that's a fact of life. In others you will have to fight for it, but you can get code rewritten.
In my experience, programming for an employer is the process of secretly introducing quality. This usually consists of debugging and refactoring on the sly while your pointy-haired boss thinks you're adding 'features'.
Is it just me, or this the way it's done most places?
heed all warnings (Score:3, Insightful)
Second, use standard idioms. For some, that may mean learning the standard idioms. These should become second nature. Programmers should express their creativity in the logic, structure, and simplicity of the code, not the non standard grammar. Standard forms allow more accurate coding and easier maintenance.
Re:Analysis "by file" vs "by function"? (Score:3, Insightful)
They are using analysis techniques to locate bugs at specific points within the parse-tree.Hence, they are locating bugs within specific functions rather than just files. As all of their examples showed. Sure, its a nice point. But it is what they are doing.
Re:lint is horrible (Score:1, Insightful)
Seeing that you used strcpy() as an example, I can certainly understand why you don't like lint. For those of us who know better than to use such dangerous functions, lint and -Wall are our annoying but useful friends.
Re:Saw his talk at FSE (Score:2, Insightful)
This is not a valid reason, I know a lot of people (including me) that would be happy to improve his research code into something useful for the community at large. I remember a paper describing a gcc extension to write semantic checks (for instance, reenable interrupts after disabling them). This program found an amazing number of bugs in the linux kernel. I really wish I could have something like that at hand!
Re:lint is horrible (Score:2, Insightful)
Where there's smoke, there's fire (Score:4, Insightful)
The paper is not about obvious code redundancy bugs, it is about subtle errors which are not as simple as just duplicate code. It is about code that *appears* to be executed but actually is not.
Go take a look at the examples and see how long it takes you to notice the different errors...now imagine have a thousand pages of code to peruse..would you catch it? Many of them probably not.
The conclusion of the paper is basically, errors cluster around errors; finding a trivial unoptimal syntactical constructions tends to point to real bugs.
Where there's smoke, there's fire.
Re:Good Design = Tight Code (Score:2, Insightful)
I'm an old school procedural programmer that is making the rocky transition to OOP programming
I agree whole-heartedly with what you say about code re-use. However I wouldn't see this as being a feature solely of OOP. Get the design right, and you can have some equally tight, highly-reusable procedural code.
Re:How to Avoid Mistakes? Practical Advice? (Score:2, Insightful)
Just use this mechanism and you can find thousands of errors in an already tested system.
What impressed me most was something like this.
if( complex statement );
do=that;
Notice the semicolon! This kind of errors are very hard to spot and they can stay in the code forever.
I will propose to use a code-checker like this in our software to improve the quality.
Re:I Hope They Didn't Get Paid (Score:5, Insightful)
Here's an example they cite from the Linux kernel: That last line assigns a variable to itself. Do you think that's what the programmer intended? Of course not. It's a bug. But no one caught it. If not for their program, maybe it would never have been caught.
You think this research is useless? Do you always write bug-free code? Maybe you should run this program on your own code and see what happens.
Error in paper (Score:4, Insightful)
/* 2.1.1/drivers/net/wan/sbni.c:sbni_ioctl */
slave = dev_get_by_name(tmpstr);
if(!slave && slave->flags & IFF_UP && dev->flags & IFF_UP))
{
return -EINVAL;
}
if (slave) {
/* BUG: !slave is impossible */
else {
return -ENOENT;
}
This is characterized in the caption as an example of "overly cautious programming style" for the redundant "else" in the "if (slave)..." statement. The text goes on to mention that these are "different error codes for essentially the same error, indicating a possibly confused programmer."
The mistake the authors make here is that the two error codes are intended to flag rather different conditions, which the caller of this function may well be set up to handle in different ways. I'm not familiar with Linux device drivers, so I'm making some guesses here about the purpose of what various components of this code are doing. dev_get_by_name appears to be looking up the name of a device in a table and returning a pointer to a structure containing information about that device. Clearly, ENOENT is intended to indicate that no device of the name supplied was found in the table, and EINVAL was supposed to indicate that a device was found, but that there was some condition that invalidated the information.
I don't think this programmer was confused, but he *was* sloppy, likely rushed, and trying to add a feature in the fewest lines possible. The probably scenario was as follows: The check for (slave) being non-NULL was in place. Then a programmer comes along to add the checks against the IFF_UP mask. (IFF_UP: I Find Fuck-UPs?) The code that's executed if "slave" is non-NULL *shouldn't* be executed if this check finds a problem, so he puts the check first. But he doesn't want to dereference a NULL pointer, so he does the reflexive C thing and places a check for the nullity of "slave" at the beginning of the logical expression. (The first term in an "&&" operation is evaluated first, and if it's false the rest of the expression is ignored.) He simply failed to notice the effect on the "else" clause following "if (slave)". Or perhaps he didn't see it; the author of the paper cut out all the code there, and the "else" could have been many lines down the page.
What the programmer should have written was:
if (slave) {
if (!(slave->flags & IFF_UP && dev->flags & IFF_UP)) {
return -EINVAL;
}
}
else {
return -ENOENT;
}
Perhaps this makes it clearer that the function was in fact trying to check for two very distinct error conditions here.
redundancy is the root of all (ok, most) evil (Score:2, Insightful)
If a programmer learns to be extremely picky about redundancy, and structures most of his code with the goal to be non-redundant, he will automatically avoid the great majority of programming problems. He will have highly maintainable code, as his code will be perfectly factored (as the XP folk would say), and he will get a very decent bottom up design without even having to think about it (perfectly factored code often entails good abstraction, and abstraction at the right level).
It isn't a panacea but I sincerely believe it is one of the most important things for any programmer. Most programmers will claim that avoiding redundancy is "obvious", but very few apply this rule consistently and thoroughly.
Dead code (Score:2, Insightful)
Is this dead code going to get removed?
No.
Why not?
Because, one, it's only an opinion that it's dead code. There could be some obscure case that no one imagined that could use it. Two, if some programmer removed it and it turned out that it was needed or the programmer screwed up the removal, the programmer would be blamed and take a lot of grief for it. If it ain't broke, don't fix it.
Now, it could be that the dead code doesn't work properly for the obscure case. But how could you tell? Do you want to write a test case for code that no one can figure out how it gets invoked?
Confirms that 80/20 works for defects, too... (Score:2, Insightful)
If you look at a CVS repository and identify those files that have high revision numbers, there's a good chance they are full of errors and need to be rewritten.
One visualization is to color code according to it's age - old code blue, and new code red - then look at the results. You will often see that the red code clusters, and there are huge regions of blue that have been stable for years. You will also see relatively small clusters of differening shades of red, as people need to keep banging on the same problematic code.
Removing dead code... (Score:5, Insightful)
Like more aphorisms, you can argue this, but my point is this - every line of code in a program is a potential bug. Every line of code requires a bit more grey matter to process, making your code just that much more difficult to understand, debug, and maintain.
So I ruthlessly remove dead code. Often, I'll see big blocks like this:
#ifdef old_way_that_doesnt_work_well
blah;
blah;
blah;
#endif
And I will summarily remove them. "But they were there for archival purposes - to show what was going on" some will say. Bullshit! If you want to say what didn't work, describe it in a comment. As for preserving the code itself - that is what CVS is for!
By stripping the code down to the minimum number of lines, it compiles faster, it checks out of and in to CVS faster, and it is easier to understand and maintain.
I will often see the following in C++ code:
void foo_bar(int unused1, int unused2)
{
unused1 = unused1;
unused2 = unused2;
}
And I will recode it thus:
void foo_bar(int , int )
{
}
That silences the "unused variable" warning, and makes it DAMN clear in the prototype that the function will never use those parameters. (True, you cannot do this in C.)
Code should be a lean, mean state machine - no excess fat. (NOTE - this does NOT me remove error checking, #assert's, good debugging code, or exception handlers).
Re:How to Avoid Mistakes? Practical Advice? (Score:3, Insightful)
It looks like it's intended for the people who program GCC, perl, kaffe, etc., as they can use this information to build better checking into their respective compilers, rather than for programmers.
Re:Good Design = Tight Code (Score:4, Insightful)
You'll have more functions and the code might be a little harder to follow for the unfamiliar, but it will be much easier to debug if there is only one function that does a particular task.
Ben
Re:Parallel programming 101 (Score:3, Insightful)
I suggest you check out Dawson Engler's resume; he has almost certainly done 10x more parallel-systems development than you have. This particular code example might be a bad one, because the analysis that supports the author's conclusion [slashdot.org] is omitted from the article, but the basic point is still valid: code that contains duplicate condition checks like those in the example is more likely to contain real bugs than less duplicative code, and the "low-hanging fruit" can be identified automatically. It's not hard at all to see how deeper analysis, different rules, or annotation could do a better job of weeding out false duplicates without compromising the tool's ability to flag legitimate areas of concern.
You're arguing about low-level implementation, when the author was trying to make a point about a high-level principle. That's the hallmark of an insecure junior programmer.
Re:lint is horrible (Score:4, Insightful)
No, it isn't. I took a legacy application that I began maintaining and used lint to eliminate hundreds of lines of code and several real never-before-detected bugs. It also encouraged me to remove dozens of implicit declarations and redundant "extern" statements in favor of real header files. The application really is better for it, and to do this work without lint would have been very very tedious. Granted, my experience is with Sun's compiler's lint, so I can't say whether other implementations are as good.
Actually, all of lint's complaints are about a potential problem. You just have to decide what is worth the time to fix.
Using lint is a deliberate process that should take several days or weeks for a large application (on the first time through). After that initial investment, using lint is still an important part of the ongoing health of the program, but it should become less and less of an effort each time.
Re:Dead code (Score:3, Insightful)