Coverity Report Finds OSS Bug Density Down Since 2006
eldavojohn writes "In 2008, static analysis company Coverity analyzed security issues in open source applications. Their recent study of 11.5 billion lines of open source code reveals that between 2006 and 2009 static analysis defect density is down in open source. The numbers say that open source defects have dropped from one in 3,333 lines of code to one in 4,000 lines of code. If you enter some basic information, you can get the complimentary report that has more analysis and puts three projects at the top tier in quality of the 280 open source projects: Samba, tor, OpenPAM, and Ruby. While Coverity has developed automated error checking for Linux, their static analysis seems to be indifferent toward open source."
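For readers who think in defects per thousand lines (KLOC), the two figures in the summary convert as in the sketch below. The helper name `defects_per_kloc` and the year labels on the variables are illustrative choices, not anything taken from the Coverity report.

```python
# Convert "one defect per N lines" into defects per thousand lines (KLOC).
# The function name is hypothetical; the inputs are the two figures quoted
# in the summary (one in 3,333 and one in 4,000).
def defects_per_kloc(lines_per_defect: float) -> float:
    return 1000.0 / lines_per_defect


density_2006 = defects_per_kloc(3333)  # roughly 0.30 defects per KLOC
density_2009 = defects_per_kloc(4000)  # 0.25 defects per KLOC
relative_drop = 1 - density_2009 / density_2006

print(f"2006: {density_2006:.3f} defects/KLOC")
print(f"2009: {density_2009:.3f} defects/KLOC")
print(f"relative drop: {relative_drop:.1%}")
```

On these numbers the drop is a little under 17 percent, and it counts only what the static analyzer flags.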
Three? (Score:5, Funny)
"...puts three projects at the top tier in quality of the 280 open source projects: Samba, tor, OpenPAM, and Ruby."
Counting, apparently, was low in quality.
Re: (Score:1, Insightful)
TFA says four.
So, not only are the /. summaries merely paragraphs copied from the article nowadays, they're paragraphs copied incorrectly.
Re:Three? (Score:5, Funny)
and then you get so-called slashdotphiles, who think they can hear artifacts in the lossy story compression.
let's see how you fare in a double blind test
Re: (Score:3, Funny)
I have gold-plated Ethernet cables, so my Internets sound nice and crisp. You can really hear the richness in the lower kbps range.
Re:Three? (Score:5, Insightful)
TFA says four.
So, not only are the /. summaries merely paragraphs copied from the article nowadays, they're paragraphs copied incorrectly.
So if my summary was "merely paragraphs copied from the article" then where did I get the 1 in 3,333 and 1 in 4,000 numbers from?
Also, if all I did was copy/paste the article, I'd be plagiarizing and -- not only that -- I would have copy/pasted the correct count of the projects in Rung 3 status. Instead I skimmed the report and was thinking "Rung 3" when I wrote that sentence, so the three was put in instead of the four. Doesn't make me any less wrong, but I hate anonymous non-constructive criticism that's modded up. I apologize for my human error; obviously the human editor also missed it. Since you're anonymous, I can't assume you're human and beg you to relate to my plight of errors. I'm sure my error made the summary completely unreadable. I'm also certain that you've published hundreds of articles on Slashdot without so much as a single error in any of them.
You do know that of the submissions I've had recently, almost all have had some flaw or error in them? Simply because I realize there's no reward for fact checking. And there's no penalty for getting an error published. So assuming the summary sells to eyeballs and there's no error large enough to get it rejected, the next thing is timing. I've written submissions that have been beaten out by a few minutes and I get marked "dupe" by firehose. So that pushes me from taking 10-15 minutes to create a summary to 2-3 minutes. Oh well, the worst penalty is if I respond to the article (like this) I'm modded down by righteous moderators. Doesn't really bother me.
If the editors aren't catching the errors and I've got no incentive to reduce the errors, do you think they're going to go away?
Re: (Score:1, Insightful)
You have an excuse. Mistakes happen.
Mistakes like this are why we have editors. The post you replied to was somewhat out of line, though as a general rule I'd say they would have been more accurate than they were in this case. Most submissions ARE copied directly from TFA.
The real issue is that this was a blatantly obvious, easy-to-catch mistake. We're not talking about to/too or their/there issues that a technically-oriented person may not pick up on at first glance; we're talking about something that t
Re: (Score:1)
Honestly here everyone. It's a human error and it's bound to happen. Hell, why even waste time bitching? Just read on, use your brain to figure out that the 3 was an oversight, and spend all the time you do bitching on a more productive task.
And I know we are all anal retentive nitpicks, but by FAR /. has clearer, more intelligible writing than any other news or news summary site.
I find typos, not just grammar errors, in almost every major news story I read. Give the volunteers a break and go chew on someone that's
Re: (Score:1)
Honestly here everyone. It's a human error and it's bound to happen. Hell, why even waste time bitching? Just read on, use your brain to figure out that the 3 was an oversight, and spend all the time you do bitching on a more productive task.
It is fair sport to pick nits with spelling, etc. It becomes tedious and unsporting when the nitpicking erupts into a flame war.
Re: (Score:1)
It's true but static analysis can fix this problem.
Re: (Score:2)
"Coverity Report Finds OSS Bug Density Down Since 2006"
Bad news for entomologists, huh?
Re: (Score:1, Redundant)
1. tor
2. OpenPAM
3. Ruby
Fewer but bigger (Score:1, Interesting)
Why would Samba and Linux have got so unstable over the years, then?
Re: (Score:1, Offtopic)
Maybe because grammar is tough for windows users?
Re: (Score:2)
Wait, Samba and Linux are unstable now? That's news to me. I can't remember the last time either crashed for me ever.
Re: (Score:2)
Possibilities are endless, just a few here:
1. Fixing bugs found by Coverity might give false sense of "goodness", especially as:
2. Coverity does not catch all problems, e.g. timing or parallelism related. Dual cores are now abundant.
3. A lot of hardware is flaky.
Three? (Score:1)
Hmmm...
In all seriousness, this seems to point to an increasing level of sophistication and maturity in OSS products and procedures, which can only be a good thing.
Re: (Score:1)
What was NASA's 'bug guideline'? I remember seeing or reading it somewhere; I thought it was one bug in 10,000 lines of code.
I could be absolutely wrong! But I'd just like to know...
Re: (Score:2)
And what is their actual bug count?
Oblig reference (Score:5, Funny)
Our chief weapon is surprise...surprise and fear...fear and surprise....
Our two weapons are fear and surprise... and ruthless efficiency....
Our three weapons are fear, surprise, and ruthless efficiency...
and an almost fanatical devotion to the Pope....
Our four... no...
Amongst our weapons... Amongst our weaponry...
are such elements as fear, surprise...
I'll come in again.
Wonder when MS, IBM and others will publish? (Score:5, Interesting)
The question, of course, "Is 4000 good, average or bad?", can't be answered because closed source companies just aren't going to publish this sort of information.
So what we can say is that the quality of OSS is trending upwards, but we can't say whether this makes it better, equivalent or worse than closed source competitors.
What are the odds on any of them taking up the challenge?
Re: (Score:2, Informative)
Actually the topic is the subject of research and the blog below quotes some book that says Microsoft is at 1/2000 lines of code.
http://amartester.blogspot.com/2007/04/bugs-per-lines-of-code.html
Of course, these studies try to assess the number of defects that have not been found yet... So the numbers are to be taken with a grain of salt, but apparently testing the software before delivery gets 90% of the bugs.
The Coverity report is likely based on what the tool says, so you need a grain of salt for that too.
Th
Re: (Score:1)
Actually the topic is the subject of research and the blog below quotes some book that says Microsoft is at 1/2000 lines of code.
If some blog quotes some book that makes some claim about Microsoft being worse than Linux, that's good enough evidence for me!
Re: (Score:3, Interesting)
There can be some serious "methodology" problems in many of the definitions of "bugs", that can seriously confuse the bug counters.
An example that I like to use is a project I worked on in the late 1990s. An important part of the package that I delivered included a directory of several hundred C source files, mostly small, with at least one bug in each. The project's leaders got some chuckles out of mentioning this at meetings, commenting that they had no intention of letting me fix any of the bugs, since
Re: (Score:2)
In other words, the variation in both categories is so big (more than a factor 10!) that one can not say either side is better with statistical relevance.
Re: (Score:1, Funny)
Actually, we did test our code here at Microsoft, we have 4200 defects by line of code, which is much better than the 4000 of open-source projects.
wait a second...
Re: (Score:1)
The question, of course, "Is 4000 good, average or bad?", can't be answered because closed source companies just aren't going to publish this sort of information.
This is part of the reason that OSS is better than closed-source competitors - the bugs are widely-known, and therefore can be more readily fixed.
This is also part of the reason that the quality of OSS is trending upwards.
Re: (Score:2)
They probably wouldn't make a good representative sample, but you could take the source code of projects that were formerly closed and subsequently opened to see how many errors they averaged. The ID engines come immediately to mind.
Bug Density Down? (Score:1)
... or less effective bug checking?
1 per 4000?! (Score:2)
Re: (Score:2)
It is not as much that 1 line out of 4000 is average for each programmer. It is just that they fix the bugs before release.
Re: (Score:2)
They also decrease the bugs-per-line count in their coding standards. That's why you see lots of blank lines in the code, lines that contain just a single brace, etc. The more lines you can spread your code over, the fewer bugs you have per line.
If you don't like this observation, you shouldn't be measuring bugs-per-line. But nearly every company does just that.
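A toy illustration of the point: the same function, with the same single (hypothetical) defect, gives a very different bugs-per-line figure depending on whether blank lines and brace-only lines are counted. The C snippet, the two counting functions, and the one-defect assumption are all made up for this sketch; real line counters and Coverity's own counting rules may differ.

```python
# The same C function, laid out in a "spread out" style with blank lines
# and braces on their own lines. The snippet is illustrative only.
SAMPLE = """\
int clamp(int x)
{

    if (x < 0)
    {
        return 0;
    }

    return x;
}
"""


def physical_lines(src: str) -> int:
    """Count every line, including blank and brace-only lines."""
    return len(src.splitlines())


def substantive_lines(src: str) -> int:
    """Count only lines that are neither blank nor a lone brace."""
    return sum(
        1 for line in src.splitlines() if line.strip() not in ("", "{", "}")
    )


ASSUMED_DEFECTS = 1  # hypothetical: pretend the function contains one bug

print("physical lines:   ", physical_lines(SAMPLE))
print("substantive lines:", substantive_lines(SAMPLE))
print("bugs per physical line:   ", ASSUMED_DEFECTS / physical_lines(SAMPLE))
print("bugs per substantive line:", ASSUMED_DEFECTS / substantive_lines(SAMPLE))
```

Here the layout alone moves the figure from one bug in 4 lines to one bug in 10, without touching the bug.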
There was also the funny thing a few years ago, when MS was claiming that some percent of linux code was stolen from Windows. Someone did a gre
Re: (Score:2)
I don't know.
If someone uses retval for the return value they must be stealing my code.
Re: (Score:2)
Then there's the infamous case of the AT&T /bin/true program, which was a shell script that contained nothing but a blank line and a copyright notice. So if you include blank lines in your code, you're violating AT&T's copyright.
I had fun once (around 1990) by "publishing" the entire text of one of these on a newsgroup, and publicly challenging AT&T's lawyers to take me to court over this blatant copyright violation. For some unexplained reason, I never heard from them.
(If you google for "/bin
Re: (Score:2)
Remember the bug finding is automated. There are only some classes of bugs that can be automatically found.
Re: (Score:2)
Posted Anonymous Coward for obvious reasons.
Why? It's not obvious. The sequence of bits stored with your UID known as "karma" might take a hit because this is offtopic? Oh no. End of the world. Do you think your "karma" here matters for anything in the real world?
Or maybe you're really Ballmer and don't want to get caught here.
So explain, because it's not obvious at all.
Re: (Score:2)
Totally OT in an OT thread, but your user name makes me lol every time I see a post from you.
Re:Umm yeah (Score:4, Insightful)
If they check 1 line of code every second it would take 133,101.85 years to check 11.5 billion lines of code. At 1000 lines of code every second you are looking at 133.10 years to check that much code. At 4000 lines of code every second (e.g. 4GHz) you are looking at 33.2 years to check that much code.
And if they were only using one system to do this, I'd imagine that would be a problem. I wonder, though, if you spread the processing across, oh, say, 512 processors, if you could get that time down under a month...
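The arithmetic is easy to check with a few lines of Python. The sketch below assumes a 365-day year and, for the multi-processor case, a perfectly even split of the work across machines, which is an idealization. Run as written, it suggests the quoted figures (133,101.85, 133.10, 33.2) correspond to days rather than years: 11.5 billion seconds is about 365 years.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: 365-day year
TOTAL_LINES = 11_500_000_000              # figure from the summary


def scan_time_seconds(lines_per_second: float, workers: int = 1) -> float:
    """Wall-clock seconds to scan TOTAL_LINES.

    Assumes the work divides perfectly across `workers`, which real
    analysis jobs will not achieve exactly.
    """
    return TOTAL_LINES / (lines_per_second * workers)


for rate in (1, 1000, 4000):
    seconds = scan_time_seconds(rate)
    print(
        f"{rate:>5} lines/s: "
        f"{seconds / SECONDS_PER_DAY:,.2f} days "
        f"({seconds / SECONDS_PER_YEAR:,.2f} years)"
    )

# The 512-processor scenario from the reply, at 1 line per second each.
parallel_days = scan_time_seconds(1, workers=512) / SECONDS_PER_DAY
print(f"512 workers at 1 line/s each: {parallel_days:,.2f} days")
```

Even at one line per second per processor, 512 processors would need most of a year; at 1000 lines per second each, the whole job fits in well under a day.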
Re: (Score:2)
11.5 billion lines of code, with one bug in every 4,000: let's say that's the top-line number, so their software kicks out almost 3 million bugs. So are there 3 million bugs in all of open source?
At my last posting we used FxCop to analyze our VB.Net software and it found 3-4,000 errors. What the tool looks for is wide-ranging, but all but two of these findings were variable naming conventions and best practices developed by the .Net development team.
The 2 we fixed said Variables did not have Strong typin
Re: (Score:2)
So looking at the article, but not the study itself, people are submitting their projects to Coverity for static analysis, and 11,200 defects have been eliminated with the help of the program.
Article also says 60 million unique lines of code were scanned on a recurring basis from 280 projects.
Re: (Score:2)
How do you design an algorithm to detect bugs? If there is such an algorithm, why isn't it included in compilers so that they can produce bug-free code?
It seems to me that in order to determine whether something is a bug, you have to know what the programmer's intent was. That requires intelligence.
Re: (Score:2)
http://scan.coverity.com/faq.html#static [coverity.com]
Some examples of the defects include:
* leaked resources
* references to pointers that could be NULL
* references to pointers that are guaranteed to be NULL
* use of uninitialized data
* array overruns
* unsafe use of signed values
* use of resources that have been freed
It wo
Re: (Score:3, Informative)
Isn't 4000 lines of code a second 4 kHz, not GHz, if we're using Hz to measure the frequency of line-processing?
Re:Umm yeah (Score:5, Informative)
At 4000 lines of code every second (e.g. 4GHz) you are looking at 33.2 years to check that much code.
GHz = 1 billion cycles per second. You're only about 6 orders of magnitude off.
Re:Umm yeah (Score:5, Insightful)
A: We know they didn't check the code by hand.
Of course not, do you know what static code analysis [wikipedia.org] is? I repeatedly said that in the summary.
B: The methodology didn't classify defects (cosmetic, security, minor, major, etc.)
From the report, which is linked to in the article and you obviously didn't care to read before criticizing:
NULL Pointer Dereference
Resource Leak
Unintentional Ignored Expressions
Use Before Test (NULL)
Use After Free
Buffer Overflow (statically allocated)
Unsafe Use of Returned NULL
Uninitialized Values Read
Unsafe Use of Returned Negative
Type and Allocation Size Mismatch
Buffer Overflow (dynamically allocated)
Use Before Test (negative)
They then go on to discuss Function Length and Complexity Metrics.
C: The numbers aren't normalized nor broken by application size.
I don't understand how this is statistically relevant. The summary I gave lists static code defects per line of code and looks at function length. Of course a project with 4 million lines of code would have more defects than one of 4 thousand lines of code. The lines of code is the normalization!
D: The use of a bug reporting database needs to be measured in regards to a baseline filing\fix % not a total volume (as we need to correlate new lines of code being added)
Does it make any difference to the end user whether 90% of the project is new lines of code or 9% of the project is new lines of code?
It reads like something from the Onion.
You didn't read the report so you can't really speak.
Dear Lord journalism is dead...
Says the poster who didn't read or understand the report.
Detection Accuracy? (Score:2)
Survivorship bias (Score:5, Interesting)
Survivorship bias
http://en.wikipedia.org/wiki/Survivorship_bias [wikipedia.org]
The projects that were alive back then, and now, are obviously more mature, thus would have fewer bugs. Unless you believe in spontaneous generation of bugs at a constant rate in unchanged code (in my experience, actually not too unbelievable for old C++ compiled by the newest G++ due to specification drift)
Re: (Score:1, Informative)
An old project doesn't necessarily mean old code. Currently, on average each day the linux kernel adds 13K lines, deletes 5K lines, and changes 2.8K lines. Over a year, that works out to roughly 4.5M lines, 2M lines, and 1M lines.
For a project with roughly 12M lines of code, that's a pretty significant amount of churn.
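To put numbers on that churn, here is a small sketch using only the figures quoted above (13K added, 5K deleted, 2.8K changed per day, roughly 12M lines total). Summing the three categories treats them as disjoint and ignores lines touched more than once, so the resulting fraction is a rough upper-end indication and not a measured value.

```python
# Daily churn figures and codebase size as quoted in the comment above.
DAILY_CHURN = {"added": 13_000, "deleted": 5_000, "changed": 2_800}
CODEBASE_LINES = 12_000_000

# Scale each daily figure to a 365-day year.
yearly_churn = {kind: per_day * 365 for kind, per_day in DAILY_CHURN.items()}

# Assumption: categories do not overlap and no line is counted twice.
lines_touched = sum(yearly_churn.values())
churn_fraction = lines_touched / CODEBASE_LINES

for kind, lines in yearly_churn.items():
    print(f"{kind:>8}: {lines:,} lines/year")
print(f"touched per year: {lines_touched:,} ({churn_fraction:.0%} of the codebase)")
```

Under those assumptions, well over half of a 12M-line codebase is touched each year, which is why an old project can still contain plenty of young code.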
Usability (Score:1)
Fixing issues improves code... (Score:2)
From the press release [coverity.com]:
While this is good for open source and demonstrates the value of static analysis, it is not surprising that if you fix the issues found, the number of issues remaining will go down.
Re: (Score:2, Informative)
If you fix the issues, Coverity moves the project to a new rung and performs stricter analysis to find more types of errors.
1/4000LOC but WHERE? (Score:1)
The bug-per-line count doesn't really give you a reasonable measure of product stability. A bug in the hotspot code is far more likely to be triggered by the end user than one in the rest of the software is.
So why not correlate the bug distribution with profiling data? I don't think this should be too difficult. You don't even need to do the profiling yourselves; when you obtain the source code just ask for the data from developers.
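One rough way to do that correlation is to weight each reported defect by how often its enclosing function runs. Everything in the sketch below is hypothetical: the function names, the call counts, and the scoring rule (defects in a function times that function's share of all calls) are made up for illustration and do not describe anything Coverity does.

```python
from collections import Counter


def rank_by_exposure(defect_functions, call_counts):
    """Rank functions by (number of defects) x (share of all profiled calls).

    `defect_functions` lists the function each reported defect lives in;
    `call_counts` maps function name to a call count from a profiler run.
    Both inputs are hypothetical stand-ins for real tool output.
    """
    total_calls = sum(call_counts.values())
    if total_calls == 0:
        raise ValueError("profile contains no calls")
    defects_per_function = Counter(defect_functions)
    scores = {
        fn: n_defects * call_counts.get(fn, 0) / total_calls
        for fn, n_defects in defects_per_function.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up profile: one hot function, two rarely executed ones.
call_counts = {"parse_packet": 900_000, "load_config": 10, "print_usage": 1}
defects = ["load_config", "parse_packet", "print_usage", "load_config"]

ranking = rank_by_exposure(defects, call_counts)
for function, score in ranking:
    print(f"{function:>14}: exposure score {score:.6f}")
```

With this weighting, the single defect in the hot path outranks the two defects in rarely executed setup code, which is the distinction a flat bugs-per-line figure hides.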
Teaching to the test (Score:2)
I love Coverity. I love other static analysis tools too -- I'm one of the lead developers for Perl::Critic, which performs static analysis on Perl code. They are enormously valuable tools.
However, I've seen many cases where people read the issue report from the tool and fix the symptom rather than the problem. The improvement from 1 in 3333 to 1 in 4000 is fantastic, but that means 1 *Coverity issue* in 4000, not 1 *bug* in 4000 lines.
My current closed source project has a Coverity count of 2 issues in 1