Undocumented Open Source Code On the Rise
ruphus13 writes "According to security company Palamida, the use of open source code is growing rapidly within businesses. However, the lack of documentation and understanding of how the code works can increase the vulnerability and security risks the companies face. OStatic quotes Theresa Bui-Friday saying, 'In 2007, Palamida's Services team audited between 300M to 500M lines of code for F500 to venture-backed companies, across multiple industries. Of the code we reviewed, Palamida found that applications written within the last five years contain 50% or more open source code, by a line of code count. Of that 50% of open source code, 70% was undocumented. This is up from 30% in 2006.' How can businesses protect themselves and still draw on open source code effectively?"
Comment removed (Score:3, Insightful)
Re:Not just for security (Score:5, Informative)
That said, the "70%, up from 30%" numbers are absurd. There is no way that the failure rate to document use of open source code more than doubled in 2007.
Re: (Score:2, Offtopic)
Re:Not just for security (Score:5, Interesting)
I would be interested to know what languages you have used.
I have found Perl to be very well documented, even though it appears to be declining or to have leveled off in the number of developers and active projects.
Meanwhile, I have looked into using Rails and found it a great example of shitty code practices. I've stated this very case to the development community and they pretty much dismissed my statements as those of an inexperienced developer unwilling to "go the distance".
I hope this might be slightly helpful in getting people like the Rails community to either understand that they really do need documentation or get companies to throw aside Rails as POS software that is so lacking in documentation that it's a greater burden to have it than to use the alternatives.
There is an excellent case where if you have a highly experienced and knowledgeable developer then you maybe don't care. But if you have to replace this developer with one less knowledgeable or want to expand your development team, you suffer a huge start up cost of trying to bring someone up to speed at your expense.
Specifically, the Rails plug-ins are documented with oversimplified tutorials that aren't even available for free, and you have to make an extra effort to find the documentation for the software that you download since the two aren't in the same location. Restful Authentication is one example in particular.
Add to that the documentation in Ruby DBI. There isn't any. The documentation says to see Perl DBI for documentation. Consider that this is a reference to a different language with different syntax, that some of the Perl methods aren't possible in Ruby, and that Ruby DBI likewise has methods that aren't available in Perl. WTF? This is documentation?
Rails attracts shitty code (Score:2)
As someone using Ruby and Rails on a regular basis, I have to say that this is my experience too.
I got involved in a flamewar with developers of a well-known Rails application over my contention that documentation was required. Their position is that you don't need any documentation if you have unit tests; you can simply read the unit tests to work out what the supported API is.
Riiiiight.
Re: (Score:2)
Re: (Score:2)
Do you think you could document your claim that perl use has "declined or leveled off"?
Perl hype has leveled off, now that O'Reilly is focussing on selling books about other things, but perl usage remains pretty high, as far as I can tell.
I do agree with you about the high quality of perl documentation. Perl has always attracted people who like to write about their code, and one advantage of having a reputation as a "write-once" language is that people bend over backwards to document (both internally and externa
Re: (Score:2)
I absolutely cannot document any decline. I do think the hype has leveled off. But I was in a meeting at YAPC::NA two years ago where they were discussing the decline in new programmers coming into the community.
Re: (Score:2)
I remember that YAPC discussion. As I recall it, the point was that the average age of Perl developers was increasing, indicating a decline in younger programmers entering the community. Perl is rarely a programmer's first language, so this isn't entirely surprising. PHP is taking Perl's place as the newcomer's first web programming language (which is OK in my opinion -- PHP is easier to learn)
Re: (Score:2)
The decline of Perl is a myth. A graph of CPAN uploads vs. time shows a dramatic increase in the last couple of years, and 2008 is already ahead of the entirety of 2007.
http://blog.timbunce.org/2008/03/08/perl-myths/ [timbunce.org]
Re:Not just for security (Score:4, Interesting)
My example would be Activant Eclipse (formerly Intuit) - the ERP software the company I work for uses. It's expensive, performs poorly (even on the expensive IBM mini that is required), is buggy, is largely undocumented, is hard/expensive to customize and is completely required to keep our business running. We pay for an expensive support contract and support is still often a joke - complicated issues are usually figured out by our staff because their support staff isn't able to do much. A horrible painful experience.
Source code is its own documentation (Score:5, Insightful)
The only reason why we don't see an article "Undocumented Commercial Software On the Rise" is because the public cannot see how badly documented the commercial software is.
Re: (Score:2, Informative)
I disagree. I tried changing some stuff in the rTorrent source code and noticed that sometimes the only comments/documentation to be found was the GPL notice at the beginning of each file. I never did manage to make the changes I wanted (but I got kind of half-way there at least).
/Mikael
Re:Source code is its own documentation (Score:5, Insightful)
This isn't about closed vs open source, this is about decent programming.
Comments in code are necessary and a minimal requirement for any project.
At least add one line to any function explaining what the function does, what its input is and what it returns.
This isn't so hard and it won't kill you, but it'll make life easier for you and anyone else who will have to deal with the code later.
It also makes finding errors easier, as your code may not be doing what your specifications say it should do.
I don't understand this hatred for comments and the "code-is-its-own-documentation"-philosophy. I really don't.
<code>
#include <iostream>
#include <algorithm>
#include <iterator>
#define ch_ty(ty) std::istream_iterator<ty>::char_type
#define tr_ty(ty) std::istream_iterator<ty>::traits_type
#define cin_iter(ty) std::istream_iterator<ty, ch_ty(ty), tr_ty(ty)>( std::cin )
#define void_iter(ty) std::istream_iterator<ty, ch_ty(ty), tr_ty(ty)>()
int main( int argc, char *argv[] ) {
while ( (cin_iter(size_t)) != void_iter(size_t)
? ( std::cin.unget(),
argc += *cin_iter(size_t)
) : (
printf( "\nsum: %d\n", --argc ), system("exit")
) );
}
</code>
Perhaps easy to understand, but one comment-line would save you minutes wasted understanding and reading it.
or
<code>
#include <stdio.h>
int v,i,j,k,l,s,a[99];main(){for(scanf("%d",&s);*a-s;v=a[j*=v]-a[i],k=i<
s,j+=(v=j<s&&(!k&&!!printf(2+"\n\n%c"-(!l<<!j)," #Q"[l^v?(l^j)&1:2])&&++
l||a[i]<s&&v&&v-i+j&&v+i-j))&&!(l%=s),v||(i==j?a[i+=k]=0:++a[i])>=s*k&&
++a[--i]);printf("\n\n");}
</code>
Well, obviously obfuscated, but one comment and it's immediately clear what it does.
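For contrast, here is a minimal sketch (in Python, for brevity) of what the first snippet above appears intended to do, namely sum the unsigned integers it reads, written with the short description the parent asks for. The name `sum_unsigned` and the choice to stop at the first non-numeric token are assumptions made for illustration, not a translation of the C++.

```python
def sum_unsigned(text):
    """Return the sum of the leading non-negative integers in `text`.

    Input:  whitespace-separated tokens, e.g. "1 2 3".
    Output: the total as an int; summing stops at the first token that
            is not a plain run of decimal digits (an assumption made here).
    """
    total = 0
    for token in text.split():
        if not token.isdecimal():
            break  # stop at the first non-numeric token
        total += int(token)
    return total


print("sum:", sum_unsigned("1 2 3"))
```

The docstring costs four lines and answers the three questions from the parent comment: what it does, what goes in, and what comes out.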
Re: (Score:2)
It also makes finding errors easier, as your code may not be doing what your specifications say it should do.
In this day and age, if the code has some (suitably formatted) comments regarding what it is supposed to do, you can even get some pretty useful tools to automatically check the code against the specification. It's not bullet-proof, but it can catch a lot of subtle bugs that might get overlooked, particularly in subtly incorrect {use of|calls to} code that you didn't necessarily write yourself. See ESC/Java2 [kind.ucd.ie] or Spec# [microsoft.com] to see what I mean.
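ESC/Java2 and Spec# check annotations statically; the sketch below is a much weaker runtime relative of that idea, shown only to illustrate what a machine-checkable statement of intent can look like. The `contract` decorator and `isqrt_floor` are hypothetical names invented for this example, not part of either tool or of any library.

```python
import functools


def contract(pre, post):
    """Check `pre(*args)` before the call and `post(result, *args)` after it.

    Hypothetical helper for illustration; it checks at run time only.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            if not pre(*args):
                raise ValueError(f"precondition of {func.__name__} violated: {args!r}")
            result = func(*args)
            if not post(result, *args):
                raise AssertionError(f"postcondition of {func.__name__} violated: {result!r}")
            return result
        return wrapper
    return decorate


@contract(pre=lambda n: n >= 0,
          post=lambda r, n: r * r <= n < (r + 1) * (r + 1))
def isqrt_floor(n):
    """Return the largest integer r with r*r <= n."""
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


print(isqrt_floor(10))
```

A caller that passes a negative number gets a precondition error at the call site rather than a wrong answer somewhere downstream, which is the kind of subtle misuse the parent is talking about.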
Re:Source code is its own documentation (Score:5, Funny)
This isn't about closed vs open source, this is about decent programming.
Comments in code are necessary and a minimal requirement for any project.
template<typename InputIterator>
typename iterator_traits<InputIterator>::value_type
sum(InputIterator begin, const InputIterator& end)
enough?
Re: (Score:1)
As you said "At least add one line to any function explaining what the function does, what its input is and what it returns." If 30% of your source is directly commented, you've probably done more than enough.
These numbers are probably wrong though.. they say it's
Bad example. (Score:2)
Your examples try to make a point, but miss the mark. The point of the article is not about understanding what the code in question does, it's about maintaining the code - there is a big difference between understanding *what* code does vs. *how* the code does it. A one line comment before a block of unreadable code doesn't help to debug the code. A one line comment in front of a block of semi-readable code at least helps a little.
The real problem is undebuggable and unreviewable code presenting security ri
Re: (Score:2, Insightful)
Recently, we bought a $middleware-thingy$ needed for a specific client installation.
Cost: 2000$.
Documentation and support: zero (apparently the original coder had left [without documenting anything], but the company keeps on selling licenses for something they have no idea how it works).
Re:Source code is its own documentation (Score:5, Insightful)
I'm not for either open source or proprietary code, my employer pays me money to produce code, what he does with it is his business, but what I do have, is experience using both proprietary code and open source code - both models have pros and cons.
With proprietary code there is someone I can call, and they are obliged by contract to fix problems within a certain time frame. One particular instance is a database we are paying license fees for. I will not name them, but to date I have found more than 10 vectors that cause crashes. Those problems have been addressed by the vendor in a timely manner (I have yet to find bugs that would be show stoppers, but some did require annoying workarounds). With OSS we don't have this possibility. Yes, we can log a bug in whatever bug tracker they use and hope someone will address our issue, but we have no guarantee. Also, in my experience, logging a bug with OSS developers can be quite a daunting process, as people can have some serious egocentric issues. While this of course also applies to proprietary software, there is someone higher in the food chain who can be called.
With OSS we of course got the good fortune of being able to go through the source code and try to fix the code ourselves... right?
Have you ever even considered just how bloody huge the code base is for something like a database? Tracking down a bug, well yes, gdb can tell you where the program stopped working, but unless you have some really really good code reading skills and are up to date on everything that happens algorithm-wise, you have close to zero chance of fixing anything without causing major problems.
Also, as a developer I've got enough to do creating my own applications; I simply do not have the time to dig through thousands of lines of code every time something new breaks. Yes, open source is nice, and small projects are easy to help along by fixing small bugs, but at some point the project grows so big that anyone using it needs to have someone they can call at 4 am to help them.
Oh and just because some software is proprietary it doesn't mean you don't have access to the source code, even at Microsoft you can buy access to the source.
We got builds with debug flags from the database vendor because we cannot share our database with them; therefore stack traces etc. have to be generated locally and shipped to them. (Yes, this is a bit annoying, but having sensitive records out in the wild is a tad more problematic.)
I don't pick OSS over proprietary or vice versa; I pick whatever tool fits my needs.
Re:Source code is its own documentation (Score:5, Informative)
You seem to be suggesting that the only way open-source can be safe or useful is if everyone evaluates every line of code they use. That's silly, of course. Open source can be safe and useful as long as enough people evaluate enough of the code. And given the number of random patches (some good, some bad) that the Debian project alone receives on a daily basis, I can assure you that a lot of people out there are reading a lot of code.
Of course, I don't personally need to evaluate every line of code in a project as long as I know (and I do) that there are others out there like me who at least do spot inspections. A little pro-active inspection up-front to give yourself at least a basic idea of how the code works can save a lot of grief further on down the line. I count it time well spent.
Re: (Score:2)
Re: (Score:2)
At no point do I talk about safety. That's the OSS mantra: if you doubt it's secure, you can always read the source. If YOU think that's what I'm addressing, you are just reinforcing my point.
Re: (Score:3, Insightful)
I keep hearing people pro open source code say "I can check it!" Well can you? Have you done so - in a project spanning more than a few thousand lines of code?
I've checked a few lines here and there that interest me, other people check the lines that interest them. An awful lot of stuff gets checked.
With proprietary code there is someone I can call and they are by contract obliged to fix problems within a certain time frame.
But that doesn't mean they will and it doesn't mean you get to sue them, and if you do win in court, it doesn't mean that your business is still viable. This is the big proprietary fallacy "There is someone to blame", you can blame all you want, meanwhile you're on the dole because you /can't/ get stuck in, fix it, and carry on your business. With Open Source you ca
Re: (Score:1)
At least an executable is well documented (Score:2)
Re: (Score:2)
Re: (Score:1)
Avoid projects with one developer (Score:3, Insightful)
The original article is an ad for a service that looks at code for you. But it's a real problem.
A basic problem with open source is that once you get beyond the top 50 or so projects, the quality is usually crap. Look at the source from a few random projects on SourceForge. There aren't that many real "community" projects, where multiple programmers are working on the same code. The long tail isn't very good.
Re: (Score:2)
Re: (Score:3, Informative)
You have a point, but s/the top 50/the top 1000 or so/. You have to count various C libraries, and things like the Perl modules at CPAN. Many of them are in wide use, and should be trustworthy.
70% Undocumented, huh? (Score:5, Insightful)
Re: (Score:1)
Re: (Score:2)
Of that 50% of open source code, 70% was undocumented
When they tell you up front that they were doing line counts, "70% was undocumented" tells me that a little under one in every three lines has a comment. If you ask me, that sounds like an awful lot of documentation.
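The article does not say how a "percent documented by line count" figure would be computed. Under the reading above (a line counts as documented if it carries a comment), a naive sketch could look like this; the function name, the `#` comment marker and the sample input are all assumptions made for illustration.

```python
def comment_stats(source, marker="#"):
    """Count (lines_with_comment, non_blank_lines) in `source`.

    Naive on purpose: any non-blank line containing `marker` counts as
    commented, even if the marker sits inside a string literal.
    """
    commented = 0
    total = 0
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines are ignored entirely
        total += 1
        if marker in line:
            commented += 1
    return commented, total


sample = """\
# Sum the numbers from 0 to 9.
total = 0
for i in range(10):
    total += i
"""
commented, total = comment_stats(sample)
print(f"{commented} of {total} non-blank lines carry a comment")
```

A metric like this says nothing about whether the comments are any good, which is one reason a bare percentage is hard to interpret.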
Re: (Score:3, Informative)
The article is FUD (Score:2)
with total lines of code, the number is possibly plausible.
How often do you comment stuff like
{
}
i++;
Re: (Score:2)
which can really be helpful if the amount of code in brackets is somewhat large. It might not always be obvious what "i" is to a support programmer not familiar with that portion of the codebase.
I've worked on code that was written 30 years in the past (no, I'm not kidding -- FIELDATA FORTRAN V is still in use here as well as at one of my former places of
Re: (Score:1, Informative)
Re:70% Undocumented, huh? (Score:4, Informative)
Statistics ... (Score:3, Insightful)
They talked about looking at 300m LOC. I'd hope 70% was "undocumented". 70% of most code is just common-everyday stuff that doesn't NEED to be documented in the sense that comments are completely wasteful. It's the "glue code" that needs to be documented, and the non-intuitive stuff, and stuff that is done for a reason that, on first glance, looks like the writer had a brain fart, but, in this special case, makes sense, or "corner case" situations.
Do *NOT* "insert comments like "for (i=0; i
Re:Statistics ... (Score:4, Interesting)
> NEED to be documented in the sense that comments are completely wasteful.
So true! Rather than this code:
# Finds the most recent orders for the passed in person
def get_rec(p)
  # blah blah
end
I'd much rather see an intention-revealing method name (hat tip Marcel Molina [vernix.org]):
def find_recent_orders_for(person)
  # blah blah
end
I'm still not sure what documentation is really useful. Maybe a few diagrams plus some use case descriptions that go through the code? I guess it depends on the project: is it a widely used library? Is it an internal department app to track the coffee fund? etc.
My experience with open source code has been that the large projects have decent docs... I was just reading through some of the PostgreSQL docs on backups [postgresql.org] this weekend and they're quite good.
Re: (Score:2)
I recently created a lexer and parser for our database so we could generate maps of how the database objects interact. We've got 300+ procedures calling 70+ tables. Comments tell you what each of them does, but knowing how stuff interacts is a life saver.
Consider:
someddl.sql
--Table for user login
create table login(
login char(64) not null primary key,
password char(32) not null -- applications serve us md5 sum
);
someproc.sql
--Procedure to check login
CREATE PROCEDURE doLogin (username varchar, password
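The map of "which procedure touches which table" described above can be sketched very roughly as follows. The parent used a real lexer and parser; this sketch swaps in a regular expression, which is far cruder and will miss plenty of valid SQL. The procedure bodies, the `audit` table and the `purgeAudit` procedure are invented for illustration.

```python
import re

# Hypothetical inputs: table names known from the DDL, and procedure
# bodies keyed by procedure name. Both bodies are made up for this sketch.
TABLES = {"login", "audit"}
PROCEDURES = {
    "doLogin": "SELECT 1 FROM login WHERE login = username; "
               "INSERT INTO audit VALUES (username);",
    "purgeAudit": "DELETE FROM audit;",
}

# Words that follow FROM, INTO, UPDATE or JOIN are treated as table names.
TABLE_REF = re.compile(r"\b(?:FROM|INTO|UPDATE|JOIN)\s+(\w+)", re.IGNORECASE)


def tables_used(body, tables):
    """Return the set of known tables referenced in one procedure body."""
    return {name.lower() for name in TABLE_REF.findall(body)} & tables


def build_map(procedures, tables):
    """Map each procedure name to the sorted list of tables it touches."""
    return {proc: sorted(tables_used(body, tables))
            for proc, body in procedures.items()}


for proc, used in build_map(PROCEDURES, TABLES).items():
    print(f"{proc} -> {', '.join(used)}")
```

Inverting the resulting dictionary gives the other useful view: for each table, which procedures would be affected by a schema change.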
Re: (Score:2)
> I can generate complete call graphs over our system
That's very cool! Do you do data flow analysis as well?
Re: (Score:2)
Re: (Score:2)
Cool, sure, no harm done there!
Re: (Score:2)
Funny, I'd much rather have a simple function name to type, with comments easily accessible, than have to retype them all the time. Keep comments write once/read many. As opposed to:
Orders* FindRecentOrdersForCustomerOrReturn -Avoiding Lameness Filter- INVALID_HANDLE_VALUEForAnInvalidCustomerOrNULLForNoOrders ( const Person& Customer )....
Trust (Score:2)
Re: (Score:2)
Re: (Score:2)
Too many times there's a divergence between the docs (and even the embedded comments) and the code, because of a "the code is important - the documentation can wait" mindset. People will claim to understand that good documentation, kept in sync, will save money, but their actions betray them - "this is an exception - we can do
Re: (Score:2)
Higher-level stuff is great. Unfortunately, when you see something like someone complaining that "70% of the code wasn't documented", without any further explanation, you've got to wonder just how they did their analysis. Was it like the "MIT Dumpster Divers" that SCO claimed to have hired, but who must have been hiding in Blep's suitcase along with the "millions of lines of code"?
Same old, same old. (Score:5, Insightful)
This has NOTHING to do with "multi-national sites" or any of that.
This has EVERYTHING to do with clearly stating the rules and ENFORCING those rules.
The rules do not enforce themselves. Someone, somewhere has to approve the code that goes in.
The problem is that management does NOT understand code and will happily farm out the work to anyone who says that they can produce X lines for $Y. Without oversight. The less oversight, the less expensive the project is. Which means bigger bonuses for those same executives.
I notice an omission (Score:4, Insightful)
They talk about how much of the open-source code is undocumented. I notice that they don't bother to mention how much of the in-house code is also undocumented. My experience as a software engineer is that their in-house code's probably at least as poorly documented as the open-source stuff. And if the business finds this state of affairs acceptable for their in-house code, why's it any more of a problem for the open-source parts?
I've also found that when the business does get a consultant in who demands documentation, they usually demand something that's completely useless for the actual developers. E.g., they demand UML models for all the software. Well, that's nice and all, but most of what's in the UML you can see by glancing at the class definitions. The things a developer needs, like what the methods are supposed to do and what gotcha caused a particular way of doing it to be picked and what assumptions the code's making about its inputs and outputs, have no place in a UML model.
Re: (Score:3, Informative)
Re: (Score:2)
Then I suspect the consultants here are making a big deal out of very little. Where I work we use a lot of open-source in our programs, and there's nothing in any of the official documentation about what we're using. All there is is a notation that these programs use outside libraries, and of course the makefiles and such list every such library used. And over in a completely separate team's workspace is the complete set of external libraries we use, along with their dependencies and build instructions. I'm
Comments are overrated (Score:2, Insightful)
Documentation should be used sparingly and as tightly woven into the development process as possible. The programmer should document their code when
Kind of makes the underhanded code contest (Score:3, Interesting)
Re: (Score:2)
So - 1 underhanded line out of 200 LOC takes skill to do. Imagine what you could do with 1 million LOC.
Documenting is kind of hard. (Score:3, Insightful)
After really sitting down with some programs, I realized I just had no idea where to start. There was certainly more to be said than who made the program and what license it was under like many programs have in their 'help' and 'about' menu, but it really does get to be an enormous task and it's a certain amount of responsibility because the few people that will read the documentation first will take everything it says to heart.
I might try again, but I'm going to be sure I really have the time to do it and the patience to read through source code. mangu is right: even though I don't know how to program, it's not hard to figure some things out, and sometimes there are vital comments 'between the lines'.
I have noticed that more programs (included in Ubuntu) now have the information I need when I care to look for it. I generally check documentation for command line arguments and stuff in case --help won't tell me everything, or anything at all. At least someone's getting the job done.
Re: (Score:2)
Send it Back (Score:2)
Let me complain a bit too (Score:1)
When going further into the code I would expect
So Much for RTFM (Score:1)
buddy: "Did you RTFM?"
me: "I can't. there isn't one."
Re: (Score:2)
Meh, I'll save y'all reading all of this (Score:4, Interesting)
Again, open source is not any more risky than any other kind of code. What is risky is not documenting your use of 3rd party code...here are some quick examples: OpenSSL, phpBB, xt:commerce
So what they're basically saying is that if you use OSS tools in your company, someone should probably be keeping track of them and patching them as needed.
Should this not hold for *all* software you've deployed? Few programs are immune to eventual obsolescence (including ongoing bugs and security problems), so if you think you're safe just because you're running a bought-and-paid-for solution that you've subsequently ignored, you're probably in trouble.
That being said, I wonder about this:
I get the impression that we're not getting the full story here. If their code audit showed that 50% of software X was copypasta from sourceforge, that would be something (you probably have crappy developers plus possible legal hell if there were copyright infractions).
On the other hand, if they figured "hey, your hello world program uses library Y, which is 2 million lines that we don't think is documented properly," then the "application" does not *contain* 50% or more open source code, but rather *references* a certain amount of open source code, which is probably a meaningless statistic.
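The record-keeping described above (noting which third-party components each application uses so that someone can patch them) can be sketched as a small inventory check. Everything here is hypothetical: the component names `libfoo` and `libbar`, the version numbers and the "fixed in" data are invented for illustration and do not describe any real project or advisory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    version: tuple  # e.g. (1, 2, 0)
    used_by: str    # which in-house application embeds it


# Hypothetical inventory and advisory data, invented for illustration.
INVENTORY = [
    Component("libfoo", (1, 2, 0), "billing"),
    Component("libbar", (3, 0, 1), "intranet"),
]
FIXED_IN = {"libfoo": (1, 2, 5)}  # first version that contains the fix


def needs_patch(inventory, fixed_in):
    """Return components whose recorded version is older than the fixed one."""
    return [c for c in inventory
            if c.name in fixed_in and c.version < fixed_in[c.name]]


for c in needs_patch(INVENTORY, FIXED_IN):
    print(f"{c.name} in {c.used_by} should be updated")
```

The check is trivial once the inventory exists; the hard part, and the thing the audit seems to be measuring, is whether anyone wrote the inventory down in the first place.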
Re: (Score:3, Informative)
It's not that the 2 million lines of code is undocumented. It might be documented very well. It's that the project doesn't record the fact that code is used from Project OpenThingee. Thus, when OpenThingee finds a pro
Re: (Score:2)
For example: If the OpenThingee source tarball is 75,000 lines of code, and I install it somewhere (in an undocumented way), does that mean I need to write 75,000 lines to get the ratio up to 50%?
Re: (Score:2)
Re: (Score:2)
Ah, that might clear it up, thanks.
Even so, I think the people in the interview were being a bit sensationalist. Then again, it makes for a fun topic!
News Flash: Code is frequently undocumented (Score:2)
Seriously, this isn't news. I don't care what context you're talking about, programmers often skip over documenting their work. That's largely due to the pressures of how much time they have to work on something (either imposed by The Boss or other time commitments).
Re: (Score:2)
(who actually wrote said code)
Why You Inc should care
You begin writing programs and then selling them. One of your Core Products is, say, Manager Mind Reader/Parser (this program can be used to figure out what a manager means from what he is saying, and has a rather nifty General Manager MindMap). Your developer took the code (and data) from a SF project. You then base 85% of your business on this one pro
Re: (Score:2)
Finally competitive (Score:2)
But seriously, the trend needs to stop as it creates an excuse to ban open source from the workplace.
Gotta love Slashdot (Score:5, Insightful)
Sarcasm here I come... (Score:1)
It will come fully documented!
The more interesting statistic is... (Score:2)
Re: (Score:1)
Common sense should dictate that any 3rd party code, where used, is notated as such in order to facilitate security related patching if/when needed and to possibly indicate further review as to how it relates with the rest of the code. But then again common sense isn't so common and introducing 3rd party code into another codebase has the potential to open up an entirely different can of worms as to
Nowhere does this article mention unit testing (Score:2)
Well of course... (Score:1)
We don't need no steenkin documentation!
The problem with reading the source (Score:2)