Forgot your password?
typodupeerror
Programming IT Technology

Using Redundancies to Find Errors 338

Posted by timothy
from the the-error-is-right-around-here dept.
gsbarnes writes "Two Stanford researchers (Dawson Engler and Yichen Xie) have written a paper (pdf) showing that seemingly harmless redundant code is frequently a sign of not so harmless errors. Examples of redundant code: assigning a variable to itself, or dead code (code that is never reached). Some of their examples are obvious errors, some of them subtle. All are taken from a version of the Linux kernel (presumably they have already reported the bugs they found). Two interesting lessons: Apparently harmless mistakes often indicate serious troubles, so run lint and pay attention to its output. Also, in addition to its obvious practical uses, Linux provides a huge open codebase useful for researchers investigating questions about software engineering."
This discussion has been archived. No new comments can be posted.

Using Redundancies to Find Errors

Comments Filter:
  • redundant? (Score:3, Interesting)

    by 1nv4d3r (642775) on Thursday January 23, 2003 @12:32AM (#5141236)
    To me 'redundant' implies duplication of something already there. (a=1; a=1;)

    a=a; and dead code aren't so much redundant as they are superfluous. It's still a sign of possible errors, for sure.
  • Saw his talk at FSE (Score:5, Interesting)

    by owenomalley (103963) <(omalley) (at) (apache.org)> on Thursday January 23, 2003 @12:53AM (#5141362) Homepage
    I saw Dawson's talk at FSE (Foundations of Software Engineering). He uses static flow analysis to find problems in the code (like an advanced form of pclint). The most interesting part of his tool is in the ranking of the problem reports. He has developed a couple of heuristics that sort the problems by order of importance and they supposedly do a very good job. Static analysis tools find most of their problems in rarely run code, such as error handlers. Such problems are problematic and sometimes lead to non-deterministic problems, which are extremely hard to find with standard testing and debugging. (This is especially true, when the program under consideration is a kernel.) Dawson also verifies configurations of the kernel that no one would compile, because he tries to get as many possible drivers at the same time as he can. The more code, the better the consistency checks do at finding problems.

    By making assumptions about the program and checking the consistency of the program, his tool finds lots of problems. For instance, assume there is a function named foo that takes a pointer argument. His tool will notice how many of the callers of foo treat the parameter as freed versus how many treat the parameter as unfreed. The bigger the ratio, the more likely the 'bad' callers are to represent a bug. It doesn't really matter which view is correct. If the programmer is treating the parameter inconsistently, it is very likely a bug.

    He also mentioned that counter to his expectations, the most useful part of his tool was to find 'local' bugs. By local, I mean bugs that are local to a single procedure. They are both easier for the tool to find, more likely to actually be bugs, and much easier for the programmer to verify if they are in fact bugs.

    He analyzed a couple of the 2.2.x and 2.4.x versions of the kernel and found hundreds of bugs. Some of them were fixed promptly. Others were fixed slowly. Some were fixed by removing the code (almost always a device driver) from the kernel. Others he couldn't find anyone that cared about the bug enough to fix it. He was surprised at the amount of abandonware in the Linux kernel.
    It is extremely frustrating that Dawson won't release his tool to other researchers (or even better to the open source community at large). Without letting other people run his tool (or even better modify it), his research ultimately does little good other than finding bugs in linux device drivers. *heavy sigh* Oh well, eventually someone WILL reimplement this stuff and release it to the world.

    On a snide comment, if he was a company he would no doubt have been bought by Microsoft already. Intrinsa was doing some interesting stuff with static analysis and now after they were bought a couple of years ago, their tool is only available inside of Microsoft. *sigh*
  • by Glass of Water (537481) on Thursday January 23, 2003 @12:56AM (#5141381) Journal
    I think the lesson is just what they're saying. Applying these tests on a body of code is a good way to find high-level errors, even though the tests just check for low-level mistakes.

    That seems pretty practical to me.

    The issue you raise about interfaces is only tangentially related. There, you get in to the problem (a very real problem) of confusing coding. This paper does not deal with the issue of whether the code is written well from the point of view of other programmers who need to work with it.

  • by RhettLivingston (544140) on Thursday January 23, 2003 @12:58AM (#5141393)
    I suppose they've hit my pet peeve. I've seen many simple problems turned into hideous monstrosities with many bugs by people trying to handle bugs that can't ever happen and imaginary special cases because they were never taught how to abstract a function. Perhaps it can't be taught. In 20+ years of programming, its been a very rare time when I've picked up code and not been able to cut out large chunks without replacing them.
  • by voodoo1man (594237) on Thursday January 23, 2003 @01:12AM (#5141457)
    At the risk of being modded redundant (hah!) I have to point out that a correlation between sloppy coding and errors does, in fact, exist. Many of us who write software have suspected this for a long time, and it is good to know that our hypothesis is supported by concrete research from the academic community, who seem to have finally proven that "redundancies seemed to flag confused or poor programmers who were prone to other error types."

    Hopefully, we can expect much more of such valuable breakthroughs from the academic community in the future, complete with papers full of badly formatted C code!

  • by RodgerDodger (575834) on Thursday January 23, 2003 @01:21AM (#5141502)
    It is extremely frustrating that Dawson won't release his tool to other researchers (or even better to the open source community at large).

    There's probably a bunch of reasons why he hasn't done this. The most likely one is that he's using it as a research tool, and he doesn't want someone else to beat him to the punch in his research. A second is that it's probably not really in a fit state for sharing as yet (the tool is not the goal of the research, after all).

    He's got a bunch of papers up describing how the tool works, so it can be reimplemented. Also, if he's like most academics, he'll probably talk your ear off if you ask him how it works. :)
  • Smatch (Score:4, Interesting)

    by Error27 (100234) <error27@gmaPLANCKil.com minus physicist> on Thursday January 23, 2003 @04:00AM (#5141595) Homepage Journal
    The Stanford Checker is great. I was blown away when I read their papers last year. Their checker is not released yet, so I wrote a similar checker (smatch.sf.net [sf.net]) based on their publications.

    The poster mentions Lint, but I did not have any success using Lint on the kernel sources. The source is too unusual.

    Also Lint does not look for application specific bugs. For example, in the Linux kernel you are not supposed to call copy_to_user() with spinlocks held. It took me about 10 minutes to modify one my existing scripts [sourceforge.net] to check for that in the Linux kernel last Monday. It found 16 errors. (It should have found more but I was being lazy.)

    A lot of the time, you can't tell what will be a common error until you try looking for it. One funny thing was that there were many places where people had code after a return statement. On the other hand, I didn't find even one place where '=' and '==' were backwards.

    It's fascinating stuff playing around with this stuff. I have been learning a lot about the kernel through playing around with Smatch.
  • by mrflip (11712) <flip@[ ]lip.com ['mrf' in gap]> on Thursday January 23, 2003 @04:21AM (#5141641) Homepage
    This made me immediately think of the 'redundant/unused code' conundrum in biology (Sorting through DNA [slashdot.org]). Surprisingly little of the DNA string seems to be 'active' code, where active means 'codes for a gene.' Most of it is either has no use, or uses that are not clear to us now. One pedestrian use for this 'dead' code is simply to separate the genes logically and spatially; this reduces the probability of simultaneous defects in multiple genes.

    DNA code also has high redundancy, which allows error-correcting transcription and other hacks ( see Parity Code And DNA [slashdot.org] or DNA's Error Detecting Code [slashdot.org])

    In both cases factors yielding robust DNA code are found to indicate bad digital computer code.

    flip

    (background: Ars Technica's Computational DNA primer [arstechnica.com]

  • Re:lint is horrible (Score:1, Interesting)

    by Anonymous Coward on Thursday January 23, 2003 @04:29AM (#5141667)
    Use tendra. http://www.tendra.org
  • Re:Smatch (Score:4, Interesting)

    by JohnFluxx (413620) on Thursday January 23, 2003 @04:39AM (#5141686)
    Did you get the bugs that you found fixed?
    It wasn't clear if you submitted it.

    Btw, I'm mulling with the idea of writing a write-time checker that would do a lot of this sort of stuff but as you are coding. That way your favourite editor can underline errors (like a spell checker does).
    One of the things I was most interested in doing was number-ranges. Basically if you have a for-loop that loops x between 0 and say 10, then inside that for-loop you know x=[0-10]. Then you can check if you access an array outside of those bounds.
    Do you have any idea how useful this would be? Or any ideas if it has been done, or anything?

    It is an area that really interests me, but that I have no knowledge about :)

    JohnFlux
  • by Skuto (171945) on Thursday January 23, 2003 @05:58AM (#5141856) Homepage
    They made fools out of themselves with this one:

    if (!cam || !cam->ops)
    return -ENODEV; /* make this _really_ smp-safe */

    if (down_interruptible(&cam->busy_lock))
    return -EINTR;

    if (!cam || !cam->ops)
    return -ENODEV;

    Their comment: 'We believe this could be indication of a novice programmer...blabla...shows poor grasp of the code'.

    BZZZZZZZZZT

    Nice try kids, but unlike you, this piece of code was probably written by an experienced guy that has actually written code for parallel systems before. Since it's tricky, you would be excused if not for the 'novice programmer' comment above and the fact that the code itself says it's there for SMP safety.

    Here's a hint: UNTIL you acquire the lock on 'cam', any other process can change the value, including at the point BETWEEN the first check and the acquisation of the lock.

    --
    GCP
  • by AlecC (512609) <aleccawley@gmail.com> on Thursday January 23, 2003 @06:55AM (#5141973)
    Also, in addition to its obvious practical uses, Linux provides a huge open codebase useful for researchers investigating questions about software engineering.

    And confirms the "many eyes make few bugs" feature of Open Source.

    A pity these guys won't release their tester for general use. However, according to other posters, there are similar tools which are Open Source. Would it be worthwhile setting up a facility on Sourceforge to run such tools automatically on the compile farm? Obviously, Project owners would have to sign up for the service, but the habit of regularly running checkers is surely to be encouraged.

    It would probably only work for people like me who treat all warnings as serious. I believe in the "Quiet, dark cockpit" as Airbus put it - if the normal state is no messages but all messages are enabled, you hear of problems much earlier in the development cycle.

  • by awol (98751) on Thursday January 23, 2003 @07:03AM (#5141999) Journal
    I have only one piece of advice I give in answer to this kind of question. If you observe something in an application (and this applies to big ones more than little ones, but it is worthwhile for all software) that you cannot explain, it is a problem, almost certianly a bug. But a problem waiting to bite you in the ass nonetheless.

    We spent almost two years with software that was live and on about three occasions during that time we observed a problem that was moderately serious (critical but for the redundancy of the architecture). Now eventually we found the problem, design flaw/bug, but the evidence for the problem was there for all to see a year before the system went live, even in testing. An apparently benign difference in counters between different instances of the same component, that none of us could adequately explain, since they should all have been the same, But all the other counters were identical (these other counters were of a coarser grain and counting different things). If we had found the cause of the difference at the time we would have found the flaw at the time (guaranteed, since it was obvious once one knew the source of the deviation).

    That one is the best example that I can think of to demonstrate the "If you can't explain it, it's broken" aspect of software behaviour
  • by Paul Lamere (21149) on Thursday January 23, 2003 @08:49AM (#5142296) Homepage Journal
    Try this: pmd [sourceforge.net]
  • by fitten (521191) on Thursday January 23, 2003 @09:15AM (#5142385)
    Yes, and to the point, critical systems software tests have use code coverage as one of the measures of being 'fit' to use. Dead code cannot be executed and thus falls into the 'untouched code' category. If the software package has much of this untouched/untouchable code, it can't be used in a critical system. For example (going from memory here), Motif couldn't be certified for digital graphical primary instrument display because of this (I think that even after exhaustive testing, it only hit the 66% coverage mark, IIRC) on the Boeing 777. It could therefore only be used for an alternate display and the primary displays had to be implemented using something else.

    The problem is that if the code can't be tested, it cannot be trusted and if *somehow* that code got executed while operational, the results could be "bad".
  • Re:Dead code (Score:2, Interesting)

    by Squeak (10756) on Thursday January 23, 2003 @10:00AM (#5142623)
    On a project I worked on recently one file contained a comment reading something like: // The below code is unnecessary but the program // does not work if it is removed. // With the code commented out everything is fine.

    The group of people involved in that area of code were also masters of redundancy and inefficiency. Their code could often be rewritten and shortened to 20% of its original length. Not BY 20%, TO 20%.
  • by ConceptJunkie (24823) on Thursday January 23, 2003 @11:02AM (#5142999) Homepage Journal
    Sometimes it's worse. I quit a job after 15 months, because I was constantly getting in trouble for trying to fix the cause of the problems rather than the symptoms.

    I was working on a 300,000-line Windows application, and I am not exaggerating here, it was about 5/6 redundant code. 100-line functions would be copy-and-pasted a dozen times and two lines changed in each. Plus, there were numerous executables to this project and often the same code (with minor variations of course) would exist in different executables.

    It was originally written in Borland C++ and the back-end was ported to Visual C++, but all the utility and support functions still existed in _both_ the Borland code and Microsoft code. Worse, they were not identical. Even worse, there was substantial use of STL, which doesn't work the same in Borland C++ (which is an ancient version... circa 1996) and Visual C++.

    That and the fact that using strcpy would have been a step up in maintaining buffer integrity, usually they just copied buffers and if one #define was different from a dozen others in completely different places, memory would be toast and we all know how that manifests.

    Worse, there was UI code written by someone who completely confused parent windows and base classes, such that the child window classes had all the data elements for the parent window, because they were derived from the parent window class!

    I spent an entire week once reviewing every single memcpy, etc, in the entire codebase (which was spaghetti-code in the extreme) just to eliminate all the buffer overruns I was discovering. THe program used a database with about 100 tables (a nightmare of redundancy in itself) and there was a several thousand-line include file of all the field sizes, maintained by hand (with plenty of typos). Eventually, I wrote a util to generate that include file automatically, which of course wasn't appreciated.

    I was trying to overcome these difficulties while being barraged with support calls, sometimes critical because this was software to manage access control systems for buildings, meaning I spent 80% of my time fighting fires. You know, situations like: "Our system is down... you need to fix this bug because we've had to hire guards for each door until we can get it back up again."
    There was only one other person working with me, and he quit in disgust and was not replaced for about 3 months.

    Finally, after stuggling mightily (in time, effort and at the expense of emotional well-being) to overcome the sheer incompetence put into this project (parts of which were 10 years old), I finally gave notice after it looked like my unscrupulous boss (who wrote a lot of this code) was doing everything he could to make it look like my fault to the client (even though they knew better and really liked working with me, precisely because I was not a BS-artist)... and after 15 years over never needing more than two weeks to find good work, I have been unemployed since May 2002.

    There's a moral here, but it escapes me.

  • by Anonymous Coward on Thursday January 23, 2003 @11:45AM (#5143236)
    The sad part is that these sorts of errors are much less likely to be found and fixed than many others, because there is a surprisingly large amount of code that emits compiler warnings under gcc, signifying that the author didn't bother to compile it in the first place. And, to make matters worse, I've seen code modules emit these compiler warnings for several kernel builds in a row, which tells me that the author didn't bother to double-check his code after it had been incorporated into an official kernel.

    That lack of discipline is probably the seminal cause of the more serious errors that the paper cites.

    The sad part is that, under the Linux software development model, it is unlikely that there will be a formal code review by a group of knowledgeable experts who will go down and visually read and inspect every line of code.

    I worked for several years at a well-known vendor of a UNIX system, and that sort of code inspection was rarely, if ever, done. It's gotten too expensive. It took a lot of wheedling and sheer luck to get one programmer to edit a typo in an error message that was being printed on the system console. The only reason I was finally able to get it fixed was because the programmer had been in the module earlier in the day, had broken something, and I needed him to fix his new problem. While he was editing the code, I also pointed out the misspelling in the 'printf' statements, and he took an extra 30 seconds to tidy it up. I refer to him as a 'programmer' and not a 'software engineer' because he, like some Linux kernel code writers' lack the rigorous discipline needed to be a true software engineer.
  • by pclminion (145572) on Thursday January 23, 2003 @12:08PM (#5143413)
    You're one of those guys who thinks that anyone who doesn't grasp precisely all the different technical fields you do, must be an fool.

    These researchers obviously have a good hold on compiler technology, since they implemented their checkers with xgcc. They also seem to understand logic quite well, since their code uses and extends on gcc's control-flow analysis algorithms. And they do, actually, understand what's going on here.

    As for your particular example, the check really is redundant, but it was almost definitely intentional. It's true that another processor could change the cam variable between the first check and the lock -- but taking the first check out would have no impact on the functionality or correctness of the code. It's just a performance enhancement so that the routine can exit early in the error case, without the overhead of locking the lock. Removing the bit of redundant code would just add a little overhead to the error case.

    In short, their checker found a true redundancy. They may have not realized its purpose since they don't have specific experience with this kind of parallel programming, but it's a redundancy. If you had actually read the paper instead of merely glancing over it, you would have seen that their checker respects the volatile nature of variables declared as such -- the checker is fully aware that a second thread can change the value between one operation and the other -- and it still figures out that the check is redundant.

    Here's a hint: don't go around claiming people are fools unless you've got some evidence. These guys had hundreds and hundreds of bugs to go through, and expecting them to perfectly analyze every last one of them is unfair.

    Oh, and -10 points for using "BZZZZZZT".

You know that feeling when you're leaning back on a stool and it starts to tip over? Well, that's how I feel all the time. -- Steven Wright

Working...