Coverity Report Finds OSS Bug Density Down Since 2006

Posted by timothy on Wed Sep 23, 2009 01:27 PM
from the but-3333-is-a-cool-number dept.
eldavojohn writes "In 2008, static analysis company Coverity analyzed security issues in open source applications. Their recent study of 11.5 billion lines of open source code reveals that between 2006 and 2009 static analysis defect density is down in open source. The numbers say that open source defects have dropped from one in 3,333 lines of code to one in 4,000 lines of code. If you enter some basic information, you can get the complimentary report that has more analysis and puts three projects at the top tier in quality of the 280 open source projects: Samba, tor, OpenPAM, and Ruby. While Coverity has developed automated error checking for Linux, their static analysis seems to be indifferent toward open source."

Related Stories

[+] Automated Linux Error Checking 25 comments
Caydel writes "In a recent message to the Linux Kernel Mailing List (LKML), Ben Chelf, CTO of Coverity, Inc. announced an internal framework to continually scan open source projects for source defects and provide the results of their analysis back to the developers of those projects. The Linux kernel is one of 32 open source projects monitored by Coverity. Coverity is looking for a few group-nominated maintainers to access the reports, in order to patch the bugs found before they are announced to the general public. For those not familiar with Coverity, they are a small company out of Stanford who monitor source code correctness through automatic static source code analysis."
[+] Linux: Bug Hunting Open-Source vs. Proprietary Software 244 comments
PreacherTom writes "An analysis comparing the top 50 open-source software projects to proprietary software from over 100 different companies was conducted by Coverity, working in conjunction with the Department of Homeland Security and Stanford University. The study found that no open source project had fewer software defects than proprietary code. In fact, the analysis demonstrated that proprietary code is, on average, more than five times less buggy. On the other hand, the open-source software was found to be of greater average overall quality. Not surprisingly, dissenting opinions already exist, claiming Coverity's scope was inappropriate to their conclusions."
[+] IT: Coverity Reports Open Source Security Making Great Strides 48 comments
Coverity is claiming they have found and helped to fix more than 7,500 security flaws in open source software since the inception of the governmentally backed project designed to harden open source software. The company has also identified eleven projects that have been especially responsive in correcting security problems. "Eleven projects have been awarded the newly announced status of Rung 2, including those known as Amanda, NTP, OpenPAM, OpenVPN, Overdose, Perl, PHP, Postfix, Python, Samba, and TCL."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Three? (Score:5, Funny)

    by Dhar (19056) on Wednesday September 23, @01:30PM (#29519217) Homepage

    "...puts three projects at the top tier in quality of the 280 open source projects: Samba, tor, OpenPAM, and Ruby."

    Counting, apparently, was low in quality.

    • Re: (Score:1, Insightful)

      by Anonymous Coward

      TFA says four.

      So, not only are the /. summaries merely paragraphs copied from the article nowadays, they're paragraphs copied incorrectly.

      • and then you get so-called slashdotphiles, who think they can hear artifacts in the lossy story compression.

        let's see how you fare in a double blind test

        • I have gold-plated Ethernet cables, so my Internets sound nice and crisp. You can really hear the richness in the lower kbps range.

      • Re:Three? (Score:5, Insightful)

        by eldavojohn (898314) * <my/.username@@@gmail.com> on Wednesday September 23, @01:59PM (#29519627) Homepage Journal

        TFA says four.

        So, not only are the /. summaries merely paragraphs copied from the article nowadays, they're paragraphs copied incorrectly.

        So if my summary was "merely paragraphs copied from the article" then where did I get the 1 in 3,333 and 1 in 4,000 numbers from?

        Also, if all I did was copy/paste the article, I'd be plagiarizing and -- not only that -- I would have copy/pasted the correct count of the projects in Rung 3 status. Instead I skimmed the report and was thinking "Rung 3" when I wrote that sentence, so the three was put in instead of the four. Doesn't make me any less wrong but I hate anonymous non-constructive criticism that's modded up. I apologize for my human error, obviously the human editor also missed it. Since you're anonymous, I can't assume you're human and beg you to relate to my plight of errors. I'm sure my error made the summary completely unreadable. I'm also certain that you've published hundreds of articles on Slashdot without so much as a single error in any of them.

        You do know that of the submissions I've had recently, almost all have had some flaw or error in them, simply because I realize there's no reward for fact checking and no penalty for getting an error published. So assuming the summary sells to eyeballs and there's no error large enough to get it rejected, the next thing is timing. I've written submissions that have been beaten out by a few minutes, and I get marked "dupe" by firehose. So that pushes me from taking 10-15 minutes to create a summary to 2-3 minutes. Oh well, the worst penalty is if I respond to the article (like this) and I'm modded down by righteous moderators. Doesn't really bother me.

        If the editors aren't catching the errors and I've got no incentive to reduce the errors, do you think they're going to go away?

        • Re: (Score:3, Insightful)

          We're bitching about the slashdot editors, not you. It's their job to catch submitter mistakes. That is what an editor does. The really annoying thing is they're as likely to "edit" the summary to introduce mistakes as to remove them.
    • "Coverity Report Finds OSS Bug Density Down Since 2006"

      Bad news for entomologists, huh?

  • Fewer but bigger (Score:1, Interesting)

    by Anonymous Coward

    Why would Samba and Linux have got so unstable over the years, then?

    • Re: (Score:1, Offtopic)

      Maybe because grammar is tough for windows users?

    • Wait, Samba and Linux are unstable now? That's news to me. I can't remember the last time either crashed for me ever.

    • Possibilities are endless, just a few here:
      1. Fixing bugs found by Coverity might give false sense of "goodness", especially as:
      2. Coverity does not catch all problems, e.g. timing or parallelism related. Dual cores are now abundant.
      3. A lot of hardware is flaky.

  • puts three projects at the top tier in quality of the 280 open source projects: Samba, tor, OpenPAM, and Ruby

    Hmmm...

    In all seriousness, this seems to point to an increasing level of sophistication and maturity in OSS products and procedures, which can only be a good thing.

    • Not really, it's a meaningless statistic. Coverity has been publishing these reports for a few years. Every time they do, the relevant projects fix all of the bugs they find. The next year, some proportion of that code is the same code that already had these bugs fixed, so if the total number of bugs per line of code didn't go down it would be quite disappointing. On top of that, there are other static analysis tools, like clang, that are used by a lot of open source projects. Even if Coverity reports
  • by StuartHankins (1020819) on Wednesday September 23, @01:34PM (#29519281)
    "... and puts three projects at the top tier in quality of the 280 open source projects: Samba, tor, OpenPAM, and Ruby."

    Our chief weapon is surprise...surprise and fear...fear and surprise....
    Our two weapons are fear and surprise... and ruthless efficiency....
    Our three weapons are fear, surprise, and ruthless efficiency...
    and an almost fanatical devotion to the Pope....
    Our four... no...
    Amongst our weapons... Amongst our weaponry...
    are such elements as fear, surprise...
    I'll come in again.
  • by MosesJones (55544) on Wednesday September 23, @01:34PM (#29519283) Homepage

    The question, of course, is "Is 4000 good, average or bad?" It can't be answered because closed source companies just aren't going to publish this sort of information.

    So what we can say is that the quality of OSS is trending upwards, but we can't say whether this makes it better, equivalent or worse than closed source competitors.

    What are the odds on any of them taking up the challenge?

    • Re: (Score:2, Informative)

      by Anonymous Coward

      Actually the topic is the subject of research and the blog below quotes some book that says Microsoft is at 1/2000 lines of code.
      http://amartester.blogspot.com/2007/04/bugs-per-lines-of-code.html

      Of course, these studies try to assess the number of defects that have not been found yet... So the numbers are to be taken with a grain of salt, but apparently testing the software before delivery gets 90% of the bugs.

      The Coverity report is likely based on what the tool says, so you need a grain of salt for that too.

      Th

      • Re: (Score:3, Interesting)

        There can be some serious "methodology" problems in many of the definitions of "bugs", that can seriously confuse the bug counters.

        An example that I like to use is a project I worked on in the late 1990s. An important part of the package that I delivered included a directory of several hundred C source files, mostly small, with at least one bug in each. The project's leaders got some chuckles out of mentioning this at meetings, commenting that they had no intention of letting me fix any of the bugs, since

    • What I heard from a Coverity employee doing a presentation is that the best closed source/commercial projects score as good as the best Open Source projects; bad commercial projects do as bad as bad Open Source projects.

      In other words, the variation in both categories is so big (more than a factor 10!) that one can not say either side is better with statistical relevance.

    • They probably wouldn't make a good representative sample, but you could take the source code of projects that were formerly closed and subsequently opened to see how many errors they averaged. The ID engines come immediately to mind.

  • ... or less effective bug checking?

  • That's some good coding.. Makes me feel like a n00b. I'm not sure what my bug to code ratio is, but I'm sure it's a lot higher than that.
    • It is not as much that 1 line out of 4000 is average for each programmer. It is just that they fix the bugs before release.

      • They also decrease the bugs-per-line count in their coding standards. That's why you see lots of blank lines in the code, lines that contain just a single brace, etc. The more lines you can spread your code over, the fewer bugs you have per line.

        If you don't like this observation, you shouldn't be measuring bugs-per-line. But nearly every company does just that.
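A minimal sketch of the dilution effect described above, assuming one hypothetical defect in the same tiny C-like function laid out two ways. The function name `defects_per_kloc` and both sample layouts are illustrative assumptions, not anything taken from the Coverity report.

```python
def defects_per_kloc(defects: int, source: str, count_blank: bool = True) -> float:
    """Return defects per 1,000 lines, optionally ignoring blank lines."""
    lines = source.splitlines()
    if not count_blank:
        # Keep only lines that contain something other than whitespace.
        lines = [ln for ln in lines if ln.strip()]
    if not lines:
        raise ValueError("source has no countable lines")
    return 1000.0 * defects / len(lines)


# The same hypothetical function with one defect (possible divide by zero),
# written compactly and then padded with blank lines and a lone brace.
compact = "int f(int x) {\n    return 10 / x;\n}\n"
padded = "int f(int x)\n{\n\n    return 10 / x;\n\n}\n"

if __name__ == "__main__":
    print("compact:", defects_per_kloc(1, compact))
    print("padded:", defects_per_kloc(1, padded))
    print("padded, blanks ignored:", defects_per_kloc(1, padded, count_blank=False))
```

The padded layout halves the reported density without touching the defect. Counting only non-blank lines narrows the gap but does not close it, since the lone brace still counts as a line.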

        There was also the funny thing a few years ago, when MS was claiming that some percent of linux code was stolen from Windows. Someone did a gre

        • I don't know.

          If someone uses retval for the return value they must be stealing my code.

          • Then there's the infamous case of the AT&T /bin/true program, which was a shell script that contained nothing but a blank line and a copyright notice. So if you include blank lines in your code, you're violating AT&T's copyright.

            I had fun once (around 1990) by "publishing" the entire text of one of these on a newsgroup, and publicly challenging AT&T's lawyers to take me to court over this blatant copyright violation. For some unexplained reason, I never heard from them.

            (If you google for "/bin

    • Remember the bug finding is automated. There are only some classes of bugs that can be automatically found.

      • And not just automated bug finding, but bug finding by static analysis. This is notoriously bad at finding bugs in programs that use shared libraries or indirection layers (e.g. code that calls other code via function pointers).
  • It seems logical that older security issues are more well-known and documented than newer ones. Is it possible that the results do not point to an improvement in coding quality so much as an inability to detect newer flaws as accurately as older flaws?
  • Survivorship bias (Score:5, Interesting)

    by vlm (69642) on Wednesday September 23, @02:04PM (#29519689)

    Survivorship bias

    http://en.wikipedia.org/wiki/Survivorship_bias [wikipedia.org]

    The projects that were alive back then, and now, are obviously more mature, thus would have fewer bugs. Unless you believe in spontaneous generation of bugs at a constant rate in unchanged code (in my experience, actually not too unbelievable for old C++ compiled by the newest G++ due to specification drift)

  • From the press release [coverity.com]:

    Since 2006, more than 11,200 defects in open source programs have been eliminated as a result of using the Coverity Scan service.

    While this is good for open source and demonstrates the value of static analysis, it is not surprising that if you fix the issues found, the number of issues remaining will go down.

    • Re: (Score:2, Informative)

      If you fix the issues, Coverity moves the project to a new rung and performs stricter analysis to find more types of errors.

  • I love Coverity. I love other static analysis tools too -- I'm one of the lead developers for Perl::Critic, which performs static analysis on Perl code. They are enormously valuable tools.

    However, I've seen many cases where people read the issue report from the tool and fix the symptom rather than the problem. The improvement from 1 in 3333 to 1 in 4000 is fantastic, but that means 1 *Coverity issue* in 4000, not 1 *bug* in 4000 lines.

    My current closed source project has a Coverity count of 2 issues in 1

    • Posted Anonymous Coward for obvious reasons.

      Why? It's not obvious. The sequence of bits stored with your UID known as "karma" might take a hit because this is offtopic? Oh no. End of the world. Do you think your "karma" here matters for anything in the real world?

      Or maybe you're really Ballmer and don't want to get caught here.

      So explain, because it's not obvious at all.

    • Re:Umm yeah (Score:4, Insightful)

      by Volante3192 (953645) on Wednesday September 23, @02:13PM (#29519791)

      If they check 1 line of code every second it would take 133,101.85 years to check 11.5 billion lines of code. At 1000 lines of code every second you are looking at 133.10 years to check that much code. At 4000 lines of code every second (e.g. 4GHz) you are looking at 33.2 years to check that much code.

      And if they were only using one system to do this, I'd imagine that would be a problem. I wonder, though, if you spread the processing across, oh, say, 512 processors, if you could get that time down under a month...
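As a sanity check on the arithmetic quoted above, here is a small sketch that converts a scan rate into elapsed time. The throughput values are the hypothetical ones from the quoted post, not measured Coverity figures, and the function name `scan_time_days` is an assumption for illustration. Note that the quoted numbers (133,101.85, 133.10 and roughly 33.2) match the result in days rather than years.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25

TOTAL_LINES = 11.5e9  # lines of code cited in the story summary


def scan_time_days(total_lines: float, lines_per_second: float, workers: int = 1) -> float:
    """Days needed to scan total_lines at a fixed per-worker rate.

    Assumes the work splits perfectly across workers, which is optimistic.
    """
    if lines_per_second <= 0 or workers <= 0:
        raise ValueError("rate and worker count must be positive")
    return total_lines / (lines_per_second * workers) / SECONDS_PER_DAY


if __name__ == "__main__":
    for rate in (1, 1000, 4000):
        days = scan_time_days(TOTAL_LINES, rate)
        print(f"{rate} lines/s: {days:.2f} days ({days / DAYS_PER_YEAR:.2f} years)")
    # The 512-processor scenario raised in the reply, at 4000 lines/s each.
    hours = scan_time_days(TOTAL_LINES, 4000, workers=512) * 24
    print(f"512 workers at 4000 lines/s: {hours:.2f} hours")
```

Under these assumptions a single machine at 4,000 lines per second finishes in about a month, and 512 of them finish in under two hours.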

      • 11.5 billion lines of code, with one bug in every 4,000. Let's say that's the top-line number: their software kicks out almost 3 million bugs. So are there 3 million bugs in all of open source?
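The "almost 3 million" figure can be reproduced from the two numbers in the summary. This is only a sketch of the arithmetic, assuming the one-in-4,000 density applies uniformly to all 11.5 billion lines; the helper names `expected_defects` and `density_per_kloc` are made up for illustration.

```python
def expected_defects(total_lines: int, lines_per_defect: int) -> float:
    """Defects implied by a uniform density of one per lines_per_defect lines."""
    return total_lines / lines_per_defect


def density_per_kloc(lines_per_defect: int) -> float:
    """Convert 'one defect per N lines' into defects per 1,000 lines."""
    return 1000 / lines_per_defect


if __name__ == "__main__":
    print(expected_defects(11_500_000_000, 4000))  # implied flagged defects
    print(density_per_kloc(3333))                  # 2006 density per KLOC
    print(density_per_kloc(4000))                  # 2009 density per KLOC
    # Relative drop in density between the two reports.
    print(1 - density_per_kloc(4000) / density_per_kloc(3333))
```

The count comes to 2,875,000 flagged issues, and the move from one in 3,333 to one in 4,000 is a drop of roughly 17 percent in density.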

        At my last posting we used FxCop to analyze our VB.Net software and it found 3-4,000 errors. What the tool looks for is wide ranging, but all but two of these findings were Variable Naming conventions and best practices developed by the .Net development team.

        The 2 we fixed said Variables did not have Strong typin

        • So looking at the article, but not the study itself, people are submitting their projects to Coverity for static analysis, and 11,200 defects have been eliminated with the help of the program.

          Article also says 60 million unique lines of code were scanned on a recurring basis from 280 projects.

      • How do you design an algorithm to detect bugs? If there is such an algorithm, why isn't it included in compilers so that they can produce bug free code?

        It seems to me that in order to determine whether something is a bug, you have to know what the programmer's intent was. That requires intelligence.

        • http://scan.coverity.com/faq.html#static [coverity.com]

          Some examples of the defects include:

          * leaked resources
          * references to pointers that could be NULL
          * references to pointers that are guaranteed to be NULL
          * use of uninitialized data
          * array overruns
          * unsafe use of signed values
          * use of resources that have been freed

          It wo

    • Re: (Score:3, Informative)

      Isn't 4000 lines/code a second 4 kHz, not GHz, if we're using Hz to measure the frequency of line-processing?

    • Re:Umm yeah (Score:5, Informative)

      by Disgruntled Goats (1635745) on Wednesday September 23, @02:43PM (#29520391)

      At 4000 lines of code every second (e.g. 4GHz) you are looking at 33.2 years to check that much code.

      GHz = 1 billion cycles per second. You're only about 6 orders of magnitude off.

      • How efficient is static analysis? Perhaps it takes a million processor cycles to check one line of code, and the author takes it for granted that us plebs understand that.
    • Re:Umm yeah (Score:5, Insightful)

      by eldavojohn (898314) * <my/.username@@@gmail.com> on Wednesday September 23, @03:01PM (#29520743) Homepage Journal

      A: We know they didn't check the code by hand.

      Of course not, do you know what static code analysis [wikipedia.org] is? I repeatedly said that in the summary.

      B: The methodology didn't classify defects (cosmetic, security, minor, major, etc.)

      From the report, which is linked to in the article and you obviously didn't care to read before criticizing:

      NULL Pointer Dereference
      Resource Leak
      Unintentional Ignored Expressions
      Use Before Test (NULL)
      Use After Free
      Buffer Overflow (statically allocated)
      Unsafe Use of Returned NULL
      Uninitialized Values Read
      Unsafe Use of Returned Negative
      Type and Allocation Size Mismatch
      Buffer Overflow (dynamically allocated)
      Use Before Test (negative)

      They then go on to discuss Function Length and Complexity Metrics.
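To make one of the categories above concrete, here is a toy checker for ignored expressions, written against Python source with the standard `ast` module. It is a sketch under stated assumptions and not how Coverity works: Coverity analyzes other languages with far deeper analysis, and the function name `find_ignored_expressions` and the `SAMPLE` snippet are invented for illustration. The point is only that some defect classes can be found from the program's structure alone, without knowing the programmer's intent.

```python
import ast
from typing import List


def find_ignored_expressions(source: str) -> List[int]:
    """Return line numbers of expression statements whose value is discarded.

    Calls, awaits and yields are skipped because they may have side effects.
    Bare string constants (docstrings) and the `...` placeholder are skipped too.
    """
    tree = ast.parse(source)
    flagged = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Expr):
            continue  # only expression statements can silently drop a value
        value = node.value
        if isinstance(value, (ast.Call, ast.Await, ast.Yield, ast.YieldFrom)):
            continue
        if isinstance(value, ast.Constant) and (
            isinstance(value.value, str) or value.value is Ellipsis
        ):
            continue
        flagged.append(node.lineno)
    return sorted(flagged)


SAMPLE = '''\
def update(total, delta):
    """Add delta to total."""
    total + delta      # result is computed and thrown away
    total == 0         # comparison where an assignment was probably meant
    print(total)
    return total
'''

if __name__ == "__main__":
    print(find_ignored_expressions(SAMPLE))
```

The checker flags the addition and the comparison in `SAMPLE` and leaves the docstring, the `print` call and the `return` alone. It knows nothing about what `update` is supposed to do, which is also why a tool like this reports issues rather than confirmed bugs.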

      C: The numbers aren't normalized nor broken by application size.

      I don't understand how this is statistically relevant. The summary I gave lists by static code defect per line of code and looks at function length. Of course a project with 4 million lines of code would have more defects than one of 4 thousand lines of code. The lines of code is the normalization!

      D: The use of a bug reporting database needs to be measured in regards to a baseline filing\fix % not a total volume (as we need to correlate new lines of code being added)

      Does it make any difference to the end user whether 90% of the project is new lines of code or 9% of the project is new lines of code?

      It reads like something from the Onion.

      You didn't read the report so you can't really speak.

      Dear Lord journalism is dead...

      Says the poster who didn't read or understand the report.