Forgot your password?
typodupeerror
Programming IT Technology

Do Static Source Code Analysis Tools Really Work? 345

Posted by CmdrTaco
from the if-you're-stupid-they-do dept.
jlunavtgrad writes "I recently attended an embedded engineering conference and was surprised at how many vendors were selling tools to analyze source code and scan for bugs, without ever running the code. These static software analysis tools claim they can catch NULL pointer dereferences, buffer overflow vulnerabilities, race conditions and memory leaks. Ive heard of Lint and its limitations, but it seems that this newer generation of tools could change the face of software development. Or, could this be just another trend? Has anyone in the Slashdot community used similar tools on their code? What kind of changes did the tools bring about in your testing cycle? And most importantly, did the results justify the expense?"
This discussion has been archived. No new comments can be posted.

Do Static Source Code Analysis Tools Really Work?

Comments Filter:
  • In Short, Yes (Score:5, Informative)

    by Nerdfest (867930) on Monday May 19, 2008 @12:24PM (#23463724)
    They're not perfect, and won't catch everything, but they do work. Combined with unit testing, you can get a very low bug rate. Many of these (for Java, at least) are open source, so the expense in negligible.
  • Yes (Score:4, Informative)

    by progressnerd (1189895) on Monday May 19, 2008 @12:30PM (#23463810)
    Yes, some static analysis tools really work. FindBugs [sourceforge.net] works well for Java. Fortify [fortify.com] has had good success finding security vulnerabilities. These tools take static checking just a step beyond what's offered by a compiler, but in practice that's very useful.
  • by gmack (197796) <gmack@innerfir e . net> on Monday May 19, 2008 @12:32PM (#23463838) Homepage Journal
    I found even tools like lint or splint can catch interesting bugs but not nearly as many as the time I enabled GCC's type checking for varargs on a software project a company I work for was developing.
  • Testing cycle (Score:5, Informative)

    by mdf356 (774923) <mdf356@gmai[ ]om ['l.c' in gap]> on Monday May 19, 2008 @12:33PM (#23463856) Homepage
    I forgot to answer your other question.

    Since we've had the tool for a while and have fixed most of the bugs it has found, we are required to run static analysis on new code for the latest release now (i.e. we should not be dropping any new code that has any error in it found via static analysis).

    Just like code reviews, unit testing, etc., it has proved useful and was added to the software development process.
  • Yes. (Score:4, Informative)

    by Anonymous Coward on Monday May 19, 2008 @12:35PM (#23463868)
    The Astrée [astree.ens.fr] static analyser (based on abstract interpretation) proved the absence of run-time errors in the primary control software of the Airbus A380.
  • Yes (Score:5, Informative)

    by kevin_conaway (585204) on Monday May 19, 2008 @12:36PM (#23463876) Homepage

    Add me to the Yes column

    We use them (PMD and FindBugs) for eliminating code that is perfectly valid, yet has bitten us in the past. Two Java examples are unsynchronized access to a static DateFormat object and using the commons IOUtils.copy() instead of IOUtils.copyLarge().

    Most tools are easy to add to your build cycle and repay that effort after the first violation

  • Re:Yes. (Score:5, Informative)

    by Anonymous Coward on Monday May 19, 2008 @12:37PM (#23463884)
    Sigh. That bug wasn't from fixing the use of uninitialized memory, it was from being overzealous and "fixing" a second (valid, not flagged as bad by Valgrind) use of the the same function somewhere near the first use.
  • Re:Yes. (Score:5, Informative)

    by Anonymous Coward on Monday May 19, 2008 @12:38PM (#23463888)
    I think the actual details weren't very widely reported anyway. Apparently two statements were removed; one read from uninitialised memory, but the other was completely valid. Since the second one was responsible for most of the randomness, removing it reduced the keyspace to the point where it can be brute forced.

  • Re:In Short, Yes (Score:4, Informative)

    by crusty_yet_benign (1065060) on Monday May 19, 2008 @12:43PM (#23463964)
    In my experience developing win/mac x-platform apps, Purify (Win), Instruments (OSX) and BoundsChecker (Win) have all been useful. They find obvious stuff, that might have led to other issues. Recommended.
  • by doug (926) on Monday May 19, 2008 @12:44PM (#23463978)
    Is this what you were talking about?

    http://it.slashdot.org/article.pl?sid=08/01/11/1818241 [slashdot.org]

    - doug
  • Re:OSS usage (Score:3, Informative)

    by NewbieProgrammerMan (558327) on Monday May 19, 2008 @12:47PM (#23464014)
    IIRC, they donated the results of their analysis, not the actual tool itself. If somebody *did* actually donate a free license to use their software to those projects, I missed it and would love a link to the details. :)
  • Coverity & Klocwork (Score:5, Informative)

    by Anonymous Coward on Monday May 19, 2008 @12:50PM (#23464056)
    We have had presentations from both Coverity and Klocwork at my workplace. I'm not entirely fond of them, but they're wayyyyy better than 'lint'. :) I much prefer using "Purify" whenever possible, since run-time analysis tends to produce fewer false-positives.

    My comments would be:

    (1) Klockwork & Coverity tend to produce a lot of "false positives". And by a lot, I mean, *A LOT*. For every 10000 "critical" bugs reported by the tool, only a handful may be really worth investigating. So you may spend a fair bit of time simply weeding through what is useful and what isn't.

    (2) They're expensive. Coverity costs $50k for every 500k lines of code per year... We have a LOT more code than this. For the price, we could hire a couple of guys to run all of our tools through Purify *and* fix the bugs they found. Klocwork is cheaper; $4k per seat, minimum number of seats.

    (3) They're slow. It takes several days running non-stop on our codebase to produce the static analysis databases. For big projects, you'll need to set aside a beefy machine to be a dedicated server. With big projects, there will be lots of bug information, so the clients tend to get bogged down, too.

    In short: It all depends on how "mission critical" your code is; is it important, to you, to find that *one* line of code that could compromise your system? Or is your software project a bit more tolerant? (e.g., If you're writing nuclear reactor software, it's probably worthwhile to you to run this code. If you're writing a video game, where you can frequently release patches to the customer, it's probably not worth your while.)
  • Re:Yes. (Score:4, Informative)

    by ezzzD55J (697465) <slashdot5@scum.org> on Monday May 19, 2008 @12:57PM (#23464136) Homepage
    Yes, only getpid(). Idiotic.
  • by SpryGuy (206254) on Monday May 19, 2008 @12:57PM (#23464138)
    I've used the above two tools ... the IntelliJ IDEA IDE for Java development, and the Visual Studio plug-in Resharper for C# development ... and can't imagine living without them.

    Of course, they provide a heck of a lot more than just static code analysis, but the ability to see all syntax errors in real time, and all logic errors (like potential null-references, dead code, unnecessary 'else' statements, etc, etc) saves way too much time, and has, in my experience, resulted in much better, more solid code. When you add on all the intelligent refactoring, vastly improved code navigation, and customizable code-generation features of these utilities, it's a no-brainer.

    I wouldn't program without them.
  • by FishWithAHammer (957772) on Monday May 19, 2008 @12:59PM (#23464156)
    I believe Gendarme [mono-project.com] might be of some use. Just don't invoke the Portability assemblies and I can't see why it'd fail.
  • Re:Yes. (Score:4, Informative)

    by Josef Meixner (1020161) on Monday May 19, 2008 @01:02PM (#23464180) Homepage

    If you're using uninitialized memory to generate randomness, it wasn't very random in the first place.

    It is only one source for the entropy pool and the SSL "fix" was a Debian maintainer running valgrind on OpenSSL, finding a piece of code where uninitialized memory was accessed, "fixed" it and a "similar piece" and accidently removed all entropy from the pool. The result of that is, that all ssh-keys and ssl-certs created on Debian in the last 20 months are to be considered broken. (Debian Wiki SSLkeys on the scope and what to do [debian.org])

  • Yes, absolutely (Score:5, Informative)

    by Llywelyn (531070) on Monday May 19, 2008 @01:02PM (#23464190) Homepage

    FindBugs is becoming increasingly widespread on Java projects, for example. I found that between it and JLint I could identify a substantial chunk of problems caused by inexperienced programmers, poor design, hastily written code, etc. JLint was particularly nice for potential deadlocks, while FindBugs was good for just about everything else.

    For example:

    • Failure to make null checks.
    • Ignoring exceptions
    • Defining equals() but not hashCode() (and the other variations)
    • Improper use of locks.
    • Poor or inconsistent use of synchronization.
    • Failure to make defensive copies.
    • "Dead stores."
    • Many others [sourceforge.net]

    At least in the Java world, I wish more people would use them. It would make my job so much easier.

    My experience in the Python world is that pylint is less interesting than FindBugs: many of the more interesting bugs are hard problems in a dynamically typed language and so it has more "religious style issues" built in that are easier to test for. It still provides a great deal of useful output once configured correctly, and can help enforce a consistent coding standard.

  • In short, YMMV (Score:5, Informative)

    by Moraelin (679338) on Monday May 19, 2008 @01:05PM (#23464230) Journal
    My experience has been that while in the hands of people who know what they're doing, they're a nice tool to have, well, beware managers using their output as metrics. And beware even more a consultant with such a tool that he doesn't even understand.

    The thing is, these tools produce

    A) a lot of "false positives", code which is really OK and everyone understand why it's ok, but the tool will still complain, and

    B) usually includes some metrics of dubious quality at best, to be taken only as a signal for a human to look at it and understand why it's ok or not ok.

    E.g., ne such tool, which I had the misfortune of sitting through a salesman hype session of, seemed to be really little more than a glorified grep. It really just looked at the source text, not at what's happening. So for example if you got a database connection and a statement in a "try" block, it wanted to see the close statements in the "finally" block.

    Well, applied to an actual project, there was a method which just closed the connection and the statements supplied as an array. Just because, you know, it's freaking stupid to copy-and-paste cute little "if (connection != null) { try { connection.close(); } catch (SQLException e) { // ignore }}" blocks a thousand times over in each "finally" block, when you can write it once and just call the method in your finally block. This tool had a trouble understanding that it _is_ all right. Unless it saw the "connection.close()" right there, in the finally block, it didn't count.

    Other examples include more mundane stuff like the tools recommending that you synchronize or un-synchronize a getter, even when everyone understands why it's OK for it to be as it is.

    E.g., a _stateless_ class as a singleton is just an (arguably premature and unneded) speed optimization, because some people think they're saving so much by a singleton instead of the couple of cycles it takes to do a new on a class with no members and no state. It doesn't really freaking matter if there's exactly one of it, or someone gets a copy of it. But invariably the tools will make an "OMG, unsynchronized singleton" fuss, because they don't look deep enough to see if there's actually some state that must be unique.

    Etc.

    Now taken as something that each developper understands, runs on his own when he needs it, and uses his judgment of each point, it's a damn good thing anyway.

    Enter the clueless PHB with a metric and chart fetish, stage left. This guy doesn't understand what those things are, but might make it his personal duty to chart some progress by showing how much fewer warnings he's got from the team this week than last week. So useless man-hours are spent on useless morphing perfectly good code, into something that games the tool. For each 1 real bug found, there'll be 100 harmless warnings that he makes it his personal mission to get out of the code.

    Enter the snake-oil vendor's salesman, stage right. This guy only cares about selling some extra copies to justify his salary. He'll hype to the boss exactly the possibility to generate such charts (out of mostly false positives) and manage by such charts. If the boss wasn't already in a mind to do that management anti-pattern, the salesman will try to teach him to. 'Cause that's usually the only advantage that his expensive tool has over those open source tools that you mention.

    I'm not kidding. I actually tried to corner one into;

    Me: "ok, but you said not everything it flags there is a bug, right?"

    Him: "Yes, you need to actually look at them and see if they're bugs or not."

    Me: "Then what sense does it make to generate charts based on wholesale counting entities which may, or may not be bugs?"

    Him: "Well, you can use the charts to see, say, a trend that you have less of them over time, so the project is getting better."

    Me: "But they may or may not be actual bugs. How do you know if this week's mix has more or less actual bugs than last weeks, regardless of wh
  • by Arakageeta (671142) on Monday May 19, 2008 @01:06PM (#23464240)
    My large C/C++ project (2,000,000+ SLOC) started using Coverity Prevent about a year ago. Its results have truly been invaluable. We simply have too much code for standard human code reviews or for detailed run-time coverage analysis (ex. Insure* or valgrind). Prevent has caught many programming errors (some extremely obscure and/or subtle) and saved use a ton of money and time.

    * I really like Insure, but it is difficult to set up on a system composed of many shared libraries. However, there are some bugs that really need run-time analysis to catch.

  • Re:In Short, Yes (Score:3, Informative)

    by Simon80 (874052) on Monday May 19, 2008 @01:11PM (#23464292)
    There's also valgrind [valgrind.org], for Linux users, and mudflap [gnu.org], for gcc users. I haven't tried mudflap yet, but valgrind is a very good runtime memory checker, and mudflap claims to do similar things.
  • Many of never all (Score:5, Informative)

    by mugnyte (203225) on Monday May 19, 2008 @01:13PM (#23464302) Journal

      Short version:

          There are real bugs, with huge consequences, that can be detected with static analysis.
          The tools are easy to find and worth the price, depending on the customer base you have.
          In the end, that cannot detect "all" bugs that could arise in the code.

      Worth it?
          Only you can decide, but after a few sessions learning why tools flag suspect code, if you take those suggest to heart, you will be a better coder.
  • by ncw (59013) on Monday May 19, 2008 @01:19PM (#23464382) Homepage
    The linux kernel developers use a tool originally written by Linux Torvalds for static analysis - sparse.

    http://www.kernel.org/pub/software/devel/sparse/ [kernel.org]

    Sparse has some features targeted at kernel development - for instance spotting mixing up kernel and user space pointers and a system of code annotations.

    I haven't used it but I do see on the kernel mailing list that it regularly finds bugs.
  • Re:In Short, Yes (Score:5, Informative)

    by Entrope (68843) on Monday May 19, 2008 @01:24PM (#23464430) Homepage

    My group at work recently bought one of these. They catch a lot of things that compilers don't -- for example, code like this:

    int array[4], count, ii;

    scanf("%d", &count);
    for (ii = 0; ii &lt; count; ++ii)
    {
    scanf("%d", &array[ii]);
    }

    .. where invalid input causes arbitrarily bad behavior. They also tend to be better at inter-procedural analysis than compilers, so they can warn you that you're passing a short literal string to a function that will memcpy() from the region after that string. They do have a lot of false positives, but what escapes from compilers to be caught by static analysis tools tend to be dynamic behavior problems that are easy to overlook in testing. (If the problem were so obvious, the coder would have avoided it in the first place, right?)

  • Re:In Short, Yes (Score:5, Informative)

    by HalWasRight (857007) on Monday May 19, 2008 @01:54PM (#23464758) Journal
    valgrind, BoundsChecker, and I believe the others mentioned, are all run-time error checkers. These require a test case that execises the bug. The static analysis tools the poster was asking about, like those from Coverity [coverity.com] and Green Hills [ghs.com], don't need test cases. They work by analyzing the actual semantics of the source code. I've found bugs with tools like these in code that was hard enough to read that I had to write test cases to verify that the tool was right. And it was! The bug would have caused an array overflow write under the right conditions.
  • Re:In Short, Yes (Score:2, Informative)

    by Shotgun (30919) on Monday May 19, 2008 @02:07PM (#23464944)
    That sort of code should be caught by a code review. Rule #1: Body cavity search all data at the boundaries.

    But I guess dumb developer checkers have a place in the toolbox.

  • A double edged sword (Score:3, Informative)

    by xenocide2 (231786) on Monday May 19, 2008 @02:34PM (#23465246) Homepage
    Static analysis tools are common in the open source world. The lint name is well known enough that many projects make theirs a pun on it, ala lintian. A few years ago a local root exploit in X was discovered by running these sorts of checks. But generally, static analysis tools require human review -- with large code bases they generate large numbers of false positives, especially the dumber ones. This leads to trouble for perfectionists, a common trait among software developers interested in bug fix analysis. For example, the recent massive Debian vulnerability was caused by an overzealous developer trying to fix static analysis flags. One of these flags was valid, one was not, and removing both removed nearly all entropy from the RNG.

    In the more general sense, static analysis cannot find all bugs. There's a trivial proof: a program stuck in an infinite loop is a bug, but finding all such loops would solve the halting problem. Handling interrupts and the like also causes reasoning problems, as it's very hard, if not computationally intractable, to prove multi-threaded software is safe. So static analysis won't rid the embedded world of watchdog timers and other software failure recovery crap.
  • by fbjon (692006) on Monday May 19, 2008 @03:25PM (#23465872) Homepage Journal

    Who "proved" Astree to be error free in the first place?!
    The creators of Astrée, presumably. Proving in the scientific sense that a piece of software is correct can definitely done, it's just really expensive most of the time. In any case they claim that Astrée is sound, i.e. catches all errors, but that the precision can be adjusted to reduce or increase the number of false positives, depending on how much time you have. The A380 fly-by-wire analysis was apparently the first case where no false positives were reported (and no true positives either, of course). According to the page, the analyzer checks C code that doesn't contain dynamic allocation or recursion, so it's probably not applicable to most non-critical software anyway.
  • Re:In Short, Yes (Score:3, Informative)

    by dollargonzo (519030) on Monday May 19, 2008 @03:50PM (#23466284) Homepage
    I disagree. Think of a loop where a break condition depends on the validity of, say, Goldbach's conjecture. No static analyzer can tell when (or if) such a program will halt. This is an extreme example, of course, but any sufficiently complicated expression that affects what code path is taken and/or termination of the program will pretty much have the same problem when it comes to static analysis. "Intractable recursion," as you call it, is computationally equivalent to unbounded loops. So, technically, anything that is in the body of an unbounded loop, except for the most trivial cases, a static analyzer cannot catch.
  • Re:In short, YMMV (Score:5, Informative)

    by Moraelin (679338) on Monday May 19, 2008 @04:47PM (#23467072) Journal
    Compiler warnings, yes, at least for a decent warning level.

    Going out of the way to satisfy a tool, whose only reason to exist is to flag 10 times more stuff than -Wall, I found actually counter-productive.

    And I don't mean just as in, WOMBAT (Waste Of Money Brains And Time.) I mean as in: it teaches people to game the tool, actually hiding their real bugs. And it creates a false sense of security too.

    I've actually had to deal with a program which tested superbly on most metrics of such a tool. But only because the programmer had learned to game it. The program was really an incoherent and buggy mess. But it gamed every single tool they had in use.

    A. to start with the most obvious, some bright guy there had come up with an own CVS script which didn't let you check in, unless you had commented every single method, and every single parameter and exception thrown. Bout damn time, eh? Wrong.

    1. This forced people to effectively overwrite the comments inherited from better documented stuff. E.g., if you had a MyGizmoInterface interface, which was superbly documented, and the MyGizmoImpl class implementing it, it forced you to copy and paste the JavaDoc comments instead of just letting JavaDoc pick them from the interface. So instead of seeing the real docs, now everyone had docs all over the place along the lines of "See MyGizmoInterface.gizmoMethod()" overwriting the actually useful ones there. Or some copied and pasted comments from 1 year ago, where one of the two gradually became obsolete. People would update their comments in one of the two, but let the other say something that wasn't even true any more. Instead of having them in one place, and letting JavaDoc copy them automatically.

    2. The particular coder of this particular program, had just used his counter-script or maybe plugin, to automatically generate crap like:

    /**
      * Method description.
      *
      * @param x method parameter
      * @param y method parameter
      * @param idx method parameter
      * @param str method parameter
      */
    I mean, _literally_. Hundreds of methods had "Method description" as their javadoc comment, and thousands of parameters total were described as "method parameter."

    B. It also included such... brain-dead metrics as measuring the cohesion of each class, by the ratio between number of class members to class methods.

    He had learned to game that too. His code tested as superbly cohesive, although the same class and indeed the same method, could either send an email, or render a PDF, or update an XML in the database, depending on which parameters they got. But the members to methods ratio was grrrreat.

    That's really my problem with it:

    A. Somewhere along the way, they had become so confident in their tools, that noone actually even checked what javadoc comments those classes have. Their script already checks that there are comments, hey, that's enough.

    B. Somewhere along the way, everyone had gotten used to just gaming a stupid tool. If the tool said you have too many or too few class members, you'd just add or remove some to keep it happy. If it complained about complexity, because it considered a large switch statement to have too many effective ifs, you just split it into a several functions: one testing cases 1 to 10, one testing 11 to 20, and so on. Which actually made the code _less_ readable, and generally lower quality. There would have been ways to solve the problems better, but, eh, all that mattered was to keep the tool happy, so noone bothered.

    That's why I'd rather not turn it into a religion. Use the tool, yes, but take it as just something which you need to check and use your own judgment. Don't lose track of which is the end, and which is merely a means to that end.
  • Re:In Short, Yes (Score:2, Informative)

    by edxwelch (600979) on Monday May 19, 2008 @05:19PM (#23467496)
    If your goal is a low bug rate, neither unit testing nor automatic code check tools will have a huge impact. Automatic code check tools will flag down some minor bugs alright, and require little effort, so definately worthwhile.
    Code quality isn't something that's just tacked on at the end of the process. If the design process is done well, there is less chance to bugs creeping in - it's just natural that logical well-designed code is going to be less buggy.
  • Re:In short, YMMV (Score:3, Informative)

    by Allador (537449) on Monday May 19, 2008 @07:21PM (#23468724)
    Wow, nice story. Quite amazing, as you'd think the devs could just do things the right way for not much more work than gaming the tools.

    This kind of thing though, is ultimately a failure of management, whoever leads/runs the dev team. They should be able to see this kind of thing happening and either apply some proper motivation, change the procedures, or let some bad devs go.

    Mind you, bad developers as well. But if I were the owner, the dev mgr would get the brunt first on something like this.
  • Re:In Short, Yes (Score:3, Informative)

    by plover (150551) * on Monday May 19, 2008 @09:19PM (#23469668) Homepage Journal
    That's what the static analyzers do. They don't bother discerning whether or not the true or false path can be taken, they simply follow both paths and report something like "Null pointer dereference in line 222 when 'if' clause at line 123 is false." It doesn't bother to figure out if it's possible for the condition to actually become true or false -- if you coded a decision path, it makes the assumption that either could be followed.

    The analyzers wouldn't be very useful if they had to fork at every code branch. A program with not-too-many nested decision branches would quickly become unexaminable.

  • by anpe (217106) on Tuesday May 20, 2008 @05:26PM (#23482776)
    It's not "error free", it's _run-time error free_. Which according to the GP's link means that no undefined behaviour according to the C standard or user added asserts may happen.
    So for example, the program won't ever divide by zero or overflow an integer while summing.

2000 pounds of chinese soup = 1 Won Ton

Working...