Open Source Code Maintainability Analyzed 264
gManZboy writes "Four computer scientists have done a formal analysis of five Open Source software projects to determine how being "Open Source" contributes to or inhibits source code maintainability. While they admit further research is needed, they conclude that open source is no magic bullet on this particular issue, and argue that Open Source software development should strive for even greater code maintainability." From the article: "The disadvantages of OSS development include absence of complete documentation or technical support. Moreover, there is strong evidence that projects with clear and widely accepted specifications, such as operating systems and system applications, are well suited for the OSS development model. However, it is still questionable whether systems like ERP could be developed successfully as OSS projects. "
Re:Extremely Ridiculous Publishing (Score:2, Informative)
You don't want to know what goes into sausage (Score:5, Informative)
I'm sure it's the same with ERP. It's just a huge polished turd, but because you don't have the source code you don't know it's a turd. You only see the polish.
Guess we'll find out. (Score:2, Informative)
GNU | Enterprise [gnuenterprise.org]
No ERP, eh? (Score:1, Informative)
It is funny how sourceforge [sf.net] lists [sourceforge.net] several [sourceforge.net] ERP [sourceforge.net]-systems [sourceforge.net] then [sourceforge.net]. And the list goes on, by the way.
Re:Was this really a surprise? (Score:5, Informative)
I agree, though, that automated test suites are underused. Also, not enough programmers (OSS or otherwise) seem to understand the importance of refactoring.
A message to coders: People, if your function is more than 10 lines long, you should start to consider splitting it. If it's more than 100 lines long, you're probably doing something wrong. If you have the same code written with slight modifications two or more different ways, you're probably doing something wrong. Use templates rather than repeating code if your language supports them. If you ever feel "this should probably be commented more", don't comment it - split it up into functions and let the functions be their own comments (if you have to, comment the functions as well). Use const as much as physically possible (in supporting languages). Use array objects that clean themselves up instead of arrays allocated on-the-fly whenever physically possible. If you find certain variables being used often together, group them into an object. If you find a set of functions operating on an object and only that object, make them member functions. Etc.
Just doing basic refactoring can make code far more organized and readable.
Was in "Look at the Numbers!". Positive results. (Score:4, Informative)
OSS is no silver bullet. Their last point is "OSS code quality seems to suffer from the very same problems that have been observed in CSS projects." Er, big surprise, they're all software.
Re:Sin(Sqrt(comments_in_percent)) ??? (Score:4, Informative)
Now, I'm all for emperical data, but that is just bistromatics and totally insane.
Metrics are already "black magic." This one is no worse or better than any other dimensionless metric I've seen.
Obviously the input is in radians. The argument to a trig function is always assumed to be radians unless otherwise specified. Now, the sqrt(mumble percent) can only range from 0 to 1, so what we're looking at here is the graph of the sin function from 0 to 2.4 radians.
Do it now. Graph it. Graph the function sin(sqrt(2.4*x)) from x=0..1
Notice that this function (you might call it a transfer function) ramps up and peaks at 0.43 radians. That corresponds to a comment percentage of 3%. Then it begins to go down again. What does this mean? It means that there is a point beyond which more comments are not useful. If more than 3% of your code is comments, there's something wrong. That's all that part of the equation means!
You only classify it as "bistromatics" because you're too lazy to do the thinking and figure out what it's for.
Re:Corporate OSS is an Ad-hoc Corporate Alliance (Score:3, Informative)
Not quite. Most enterprise software comes with source available, and pretty much all of it gets customized once you get into bigger customers. Its actually a real PITA when it comes time to do upgrades. And yes, I'm an architect at an ERP vendor.
Re:maintainability index = bullshit (Score:3, Informative)
The etymology of the word is irrelevant. In practice, people use the term "percentage" to mean parts-in-100 OR a fraction. Look at the second definition listed in the dictionary. It's a "part of a whole." I've used it both ways many times. So has every engineer I've ever worked with. It's usually obvious from the context, as it was in this case.
Well, what you illustrate again is that the MI is a seat-of-the-pants kind of measure that was thrown together because it looks nice, not because anybody thought about statistical validity.
Do you even know what the term "index" MEANS? It's a magical number used to quantify the unquantifiable! It's dimensionless! What, do you think the Consumer Price Index is any less black magic? Nobody ever implied that the number has any statistical validity, it is just a number which happens to do a fairly good job at helping people compare things.
If it's useful to you, use it. If not, don't.
Re:Results would be fairer (Score:3, Informative)
Re:Was this really a surprise? (Score:3, Informative)
> the inner loop behavior
No, function pointers are evil. Pointers themselves can make things rather nasty, although they are important in some situations. Use inheritance and virtual inline functions. Inline functions run as fast as macros, and inheritance is a lot cleaner than using function pointers.
> It is a hand-unrolled implementation of a bit-oriented run-length encoder
I'd like to see a piece of code that by hand-unrolling runs 7 times faster and can't be functionalized. I sure haven't seen one yet, and can't picture how that would even be possible.
> away from typical ideas of "maintainability" for the sake of speed
Seldom is that the case nowadays in modern languages. You sound like you're writing in pure C, though
Re:How many projects have died for maintainability (Score:4, Informative)
We can't use Javascript, it's loosely typed!
We have to use an Object Broker, SQL is not maintainable!
All the projects that I have been on where code maintainability has been the primary goal have one thing in common. They all failed.
If that is their idea of "maintainable", they didn't fail because they shot for maintainable, they failed because they drank the kool-aide and trapped themselves into software paradigms that only work when oodles of resources are thrown at them. Smaller teams require more agile methods to get results, and that is also the mechanism whereby smaller teams can produce software where larger teams failed. (It goes both ways, I'm not claiming that as an absolute. But that small teams can and have beaten much bigger ones is an unassailable fact.)
Certainly you've got some good facts at hand to learn from, but I think you're taking the wrong lesson away. Projects that simply ignore maintainability fail, too. Can you imagine Mozilla with no concern for maintainability, or the Linux kernel?
The focus should always be on product quality, not code quality.
If you don't have quality code, you don't have a quality product. You may have an adequate product. You may be in a situation where an adequate product is all you need. I have an adequate set of knives in my kitchen, because I can not afford quality knives. But I do not pretend that they are therefore quality knives.
You're calling for a classic short-term focus, and you can and will suffer the classic penalties for short-term focus. I know, I've seen it first hand and dragged software products out of their local optima by the sweat of my brow. It's not easy, but either it happens or the product dies a code-quality death.
You need to use the proper metric for quality. Inappropriately using and paying for a strong type system is anti-quality in my book; that goes for your other two examples as well, when done correctly. (SQL and JSP code both need to be rationally minimized via the application of Once and Only Once, but they are not the cause of the unmaintainability; the abandonment of Once and Only Once is. Once and Only Once is one of the most important aspects of any proper quality metric.) Your quality metric should have functionality built into it.
Re:You don't want to know what goes into sausage (Score:3, Informative)
Its often a case of fix one bug create 2 more. We've got customers who refuse to upgrade because they are worried about losing data, running into strange bugs that didn't exist in previous versions etc.
I think a lot of stems from the fact that developers of this stuff tend to focus on putting new features in the program rather than stabilizing or documenting it.
Urban legend alert! (Score:3, Informative)
So I've been told, sometimes by some of the biggest names in programming. Unfortunately, a firm belief among the industry doesn't make them right.
Rather than debunking this one here, I'll simply refer you to Steve McConnell's excellent Code Complete. McConnell cites a large amount of hard data to show that longer routines can be at least as good on both development time and error count grounds as shorter routines, and indeed exceptionally short routines (the 10-liners you're advocating) are amongst the worst on both metrics.
As an aside for general interest, since I'm sure a lot of people reading this comment also found that book very good, it seems a second edition has recently been published, updating the examples by a decade or so and putting much more emphasis on recent coding approaches, particularly OO. Whether that is an improvement remains to be seen, but I'll certainly be buying a copy. I guess if he's reversed his position based on more recent studies, I'll have to eat my words, too, but I doubt it. ;-)