
Code Quality In Open and Closed Source Kernels

Posted by kdawson
from the tale-of-four-kernels dept.
Diomidis Spinellis writes "Earlier today I presented at the 30th International Conference on Software Engineering a research paper comparing the code quality of Linux, Windows (its research kernel distribution), OpenSolaris, and FreeBSD. For the comparison I parsed multiple configurations of these systems (more than ten million lines) and stored the results in four databases, where I could run SQL queries on them. This amounted to 8GB of data, 160 million records. (I've made the databases and the SQL queries available online.) The areas I examined were file organization, code structure, code style, preprocessing, and data organization. To my surprise there was no clear winner or loser, but there were interesting differences in specific areas. As the summary concludes: '..the structure and internal quality attributes of a working, non-trivial software artifact will represent first and foremost the engineering requirements of its construction, with the influence of process being marginal, if any.'"
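To make the "SQL queries over parsed code" idea concrete, here is a minimal sketch of the kind of query such a setup allows. The table name, its columns, and the sample rows are invented for illustration; the real schema of the author's databases is not shown in this summary.

```python
import sqlite3

# Hypothetical schema: one row per function definition found by the parser.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE functions (name TEXT, file TEXT, is_global INTEGER)")
conn.executemany(
    "INSERT INTO functions VALUES (?, ?, ?)",
    [
        ("sched_init", "sched.c", 1),
        ("pick_next", "sched.c", 0),  # file-local (static) function
        ("vm_map", "vm.c", 1),
        ("vm_fault", "vm.c", 1),
    ],
)

def percent_global_functions(conn):
    """Share of functions with global linkage, as a percentage."""
    row = conn.execute(
        "SELECT 100.0 * SUM(is_global) / COUNT(*) FROM functions"
    ).fetchone()
    return row[0]

print(percent_global_functions(conn))
```

Once the parse results sit in tables like this, each metric becomes a short aggregate query rather than a custom analysis program.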
  • Is it just me? (Score:5, Interesting)

    by Abreu (173023) on Friday May 16, 2008 @10:55AM (#23434796)
    Or the summary is completely incomprehensible?

    Of course, I could try to RTFA, but hey, this is Slashdot, after all...
  • Not that surprising (Score:5, Interesting)

    by abigor (540274) on Friday May 16, 2008 @11:05AM (#23434958)
    Final line in the paper: "Therefore, the most we can read from the overall balance of marks is that open source development approaches do not produce software of markedly higher quality than proprietary software development."

    Interesting, but not shocking for those who have worked with disciplined commercial teams. I wonder what the results would be in less critical areas than the kernel, say certain types of applications.
  • Re:Is it just me? (Score:4, Interesting)

    by Bazman (4849) on Friday May 16, 2008 @11:12AM (#23435076) Journal
    Well, it's not just you, but probably millions like you. Plenty of the summary is comprehensible, but I fear that it's really just a slashvertisement for his book (third link in summary).

  • The 99% Solution (Score:5, Interesting)

    by SuperKendall (25149) on Friday May 16, 2008 @11:21AM (#23435232)
    So while looking at the data collected, I had to wonder if some of the conclusions reached were not something of a matter of weighting - I saw some things pretty troubling about the WRK. Among the top of my list was a 99.8% global function count!!!

    This would explain some things like lower LOC count - after all, if you just have a bunch of global functions there's no need for a lot of API wrapping, you just call away.

    I do hate to lean on LOC as any kind of metric, but even besides that, the far lower count of Windows made me wonder how much is actually there. Is the Windows kernel so much tighter or is it just doing less? That one metric would seem to make further conclusions hard to reach since it's such a different style.

    Also, on a side note I would say another conclusion you could reach is that open source would tend to be more readable, with the WRK having a 33.30% adherence to code style and the others being 77-83%. That meshes with my experience working on corporate code, where over time coding styles change on more of a whim whereas in an open source project, it's more important to keep a common look to the code for maintainability. (That's important for corporate code too - it's just that there's usually no-one assigned to care about that).

  • by Junior J. Junior III (192702) on Friday May 16, 2008 @11:22AM (#23435250) Homepage

    > in plain English: "the app specs had a much bigger influence when compared to internal efficiencies".

    It sounds more like they're saying "If someone built it, and someone else is using it, and it's important, then the code quality is going to be pretty good. If it matters, it's going to get attention and be improved."

    Of course, I can think of a bunch of counter-examples in Windows where something was important *to me* and mattered *to me* and no one at Microsoft saw fit to do anything about it for decades.

  • Re:Closed Source? (Score:5, Interesting)

    by zeromorph (1009305) on Friday May 16, 2008 @11:33AM (#23435470)

    The WRK is under the Microsoft Windows Research Kernel Source Code License [microsoft.com]. I'm not sure that this license conforms with anyone's definition of open source, but it's reasonably free for research.

    But PP addresses a crucial point: if something really is closed source, there is no reviewable way to compare and present this code. So if the WRK were total crap, they could always say: yes, that's only the WRK, not the real kernel.

    Only statements about open source code are directly verifiable/falsifiable. That is one of the reasons why the FOSS approach is superior from a scientific as well as a technical point of view.

  • So.... (Score:3, Interesting)

    by jellomizer (103300) on Friday May 16, 2008 @11:38AM (#23435574)
    The way you choose to license your software doesn't correlate with software quality... Seems logical to me, as how you license your software has very little to do with the code inside the OS.

    Closed Source Developer: I will try to do the best job I possibly can so I can keep my job and make money, because that is what I value.

    Open Source Developer: I will try to do the best job I possibly can so I can help the community and feel better about myself/get myself noticed in the community/have something cool to put on my resume... because that is what I value.

    Whether people choose to license their software as Open Source or Closed Source says nothing about their programming ability. There are a bunch of really crappy GNU projects out there as well as a bunch of crappy closed source projects... Yeah, there is the argument of millions of eyes fixing problems, but really, when you get millions of people looking at the same thing you will get good and bad ideas, so the more good ideas you get the more bad ideas you get, and the more people involved the harder it gets to separate the good ones from the bad ones. Closed source is often affected by a narrow level of control where bad ideas can be mandated.... All in all everything really balances out and the effects of the license are negligible.
  • by SixDimensionalArray (604334) on Friday May 16, 2008 @11:43AM (#23435680)
    I haven't seen anybody else comment on the statement that the quality of the code had more to do with the engineering than with the process through which the code was developed, which I find quite interesting.

    From my personal experiences, it typically seems code is written to solve a specific need. Said another way, in the pursuit of solving a given problem, whatever engineering is required to solve the problem must be accomplished - if existing solutions to problems can be recognized, they can be used (for example, Gang of Four/GOF patterns), otherwise, the problem must have a new solution engineered.

    Seeing as how there are teams successfully developing projects (with both good and bad code quality) using traditional OO/UML modeling, the software development life-cycle, capability maturity model, scrum, agile, XP/pair programming, and a myriad of other methods, it would seem that what the author is saying is that it didn't necessarily matter which method was used; it was how the solution was actually built (the.. robustness of the engineering) that mattered.

    Further clarification on the difference between engineering and "process" would strengthen this paper.

    I went to a Microsoft user group event some time ago - and the presenter described what they believed the process of development of code quality looked like. They suggested the progression of code quality was something like:
    crap -> slightly less crappy -> decent quality -> elegant code.

    Sometimes, your first solution at a given problem is elegant.. sometimes, it's just crap.

    Anyways, just my two cents. Maybe two cents too many.. ;)

    SixD
  • by abigor (540274) on Friday May 16, 2008 @11:48AM (#23435752)
    Well, not necessarily. Perhaps for certain types of commodity applications, like office suites, but even then, it's tough to say. That's why I was interested in the comparison. Your assertion is certainly not true for games, for example.

    Generally speaking, commercial desktop apps are still way ahead of their open counterparts, with the exception of code development tools and anything that directly implements a standard (browsers, mail clients, etc.)

    One reason for this is that code quality as measured in this study may not directly relate to application quality as measured by the typical user. Photoshop is "good" not least because of its well-understood interface and the fact that everyone uses it, regardless of how admirable the code is.

  • by jabjoe (1042100) on Friday May 16, 2008 @11:48AM (#23435762)
    "Linux excels in various code structure metrics, but lags in code style. This could be attributed to the work of brilliant motivated programmers who aren't however efficiently managed to pay attention to the details of style. In contrast, the high marks of WRK in code style and low marks in code structure could be attributed to the opposite effect: programmers who are efficiently micro-managed to care about the details of style, but are not given sufficient creative freedom to structure their code in an appropriate manner. "

    However, I was left wondering how it was possible to compare fairly. He already stated:

    "Excluded from the kernel code are the device drivers, and the plug-and-play, power management, and virtual DOS subsystems. The missing parts explain the large size difference between the WRK and the other three kernels."

    and reading I see even more of the drivers aren't there:

    "The NT Hardware Abstraction Layer, file systems, network stacks, and device drivers are implemented separately from NTOS and loaded into kernel mode as dynamic libraries. Sources for these dynamic components are not included in the WRK. "

    http://www.microsoft.com/resources/sharedsource/licensing/researchkernel.mspx [microsoft.com]

    So it's not like for like. Maybe you would draw different conclusions if it was, maybe the Linux style issue is because of all the drivers the WRK lacks. So even though I think his conclusion sounds probable, I don't feel I can state it as so with any confidence.
  • Stupid metrics (Score:4, Interesting)

    by Animats (122034) on Friday May 16, 2008 @11:54AM (#23435880) Homepage

    The metrics used in this paper are lame. They're things like "number of #define statements outside header files" and such.

    Modern code quality evaluation involves running code through something like Purify, which actually has some understanding of C and its bugs. There are many such tools. [wikipedia.org] This paper is way behind current analysis technology.
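
    For readers wondering what a metric like "number of #define statements outside header files" involves, here is a rough sketch. It is not the paper's tooling: the input format (a dict of file name to source text) is an assumption made for this example, and the regex will also count a #define that appears inside a comment or string.

```python
import re

# Matches a #define directive at the start of a line, allowing
# whitespace before and after the '#', as C permits.
DEFINE_RE = re.compile(r"^[ \t]*#[ \t]*define\b", re.MULTILINE)

def count_defines_outside_headers(files):
    """Count #define directives in files whose names do not end in .h.

    `files` maps a file name to its source text (hypothetical input
    format chosen for this sketch).
    """
    total = 0
    for name, text in files.items():
        if name.endswith(".h"):
            continue  # macros defined in headers are not counted
        total += len(DEFINE_RE.findall(text))
    return total

sample = {
    "lock.h": "#define LOCK_INIT 0\n",
    "sched.c": '#include "lock.h"\n#define NSLOTS 8\n  # define DEBUG 1\nint x;\n',
}
print(count_defines_outside_headers(sample))
```

    A count like this says nothing about whether the code has bugs, which is the commenter's point; it only measures one aspect of how the preprocessor is used.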

  • by mlwmohawk (801821) on Friday May 16, 2008 @11:59AM (#23435968)
    Sorry, I've been in the business for over 25 years and had to hear one pinhead after another spout about code quality or productivity. It's all subjective at best.

    The worst looking piece of spaghetti code could have fewer bugs, be more efficient, and be easier to maintain than the most modular object oriented code.

    What is the "real" measure of quality or productivity? Is it LOC? No. Is it overall structure? no. Is it the number of "globals?" maybe not.

    The only real measure of code is the pure and simple Darwinian test of survival. If it lasts and works, it's good code. If it is constantly being rewritten or is tossed, it is bad code.

    I currently HATE (with a passion) the current interpretation of the bridge design pattern so popular these days. Yea, it means well, but it fails in implementation by making implementation harder and increasing the LOC benchmark. The core idea is correct, but it has been taken to absurd levels.

    I have code that is over 15 years old, almost untouched, and still being used in programs today. Is it pretty? Not always. Is it "object oriented"? Conceptually, yes, but not necessarily. Think of the "fopen", "fread" file operations. Conceptually, the FILE pointer is an object, but it is a pure C convention.

    In summation:
    Code that works -- good.
    Code that does not -- bad.
  • Re:Is it just me? (Score:5, Interesting)

    by Diomidis Spinellis (661697) on Friday May 16, 2008 @12:06PM (#23436068) Homepage

    > It's not a very good summary, but the paper is well-written, which is interesting considering that the author is the one who submitted the summary to Slashdot. I suspect that he assumes we have more familiarity with the subject than we actually do.
    In my submission I did not include the last sentence with the "summary", which, I agree, is completely incomprehensible in the form it appears.
  • Re:Is it just me? (Score:4, Interesting)

    by tcopeland (32225) <tom@@@thomasleecopeland...com> on Friday May 16, 2008 @12:07PM (#23436072) Homepage
    > the paper is well-written

    Yup, and the author of the paper is Diomidis Spinellis, who wrote the excellent book Code Reading [spinellis.gr]. This is a great study of code analysis and familiarization techniques. He also wrote a fine article on C preprocessors... in Dr. Dobb's Journal, I think.
  • by Diomidis Spinellis (661697) on Friday May 16, 2008 @12:28PM (#23436462) Homepage
    Coding to achieve some code quality metrics is dangerous, but so is saying that code that works is good. Let me give you two examples of code I wrote a long time ago that still survive on the web.

    This example [ioccc.org] is code that works and also has some nice quality attributes: 96% of the program lines (631 out of the 658) are comment text rendering the program readable and understandable. With the exception of the two include file names (needed for a warning-free compile) the program passes the standard Unix spell checker without any errors.

    This example [ioccc.org] is also code that works, and is quite compact for what it achieves.

    I don't consider either of the two examples quality code. And sprucing up bad code with object orientation, design patterns, and a layered architecture will not magically increase its quality. On the other hand, you can often (but not always) recognize bad quality code by looking at figures one can obtain automatically. If the code is full of global variables, gotos, huge functions, copy-pasted elements, meaningless identifier names, and automatically generated template comments, you can be pretty sure that its quality is abysmal.

  • by mlwmohawk (801821) on Friday May 16, 2008 @12:33PM (#23436514)
    > Is it good code simply as function of its survival and (sort of) working?

    "sort" of working is not "working."

    > exists a 6000 line SQL statement that no one understands

    This is "bad" code because it needs to be fixed and no one can do it.

    > Surely you wouldn't define good architecture as "a building that remains standing,"

    I'm pretty sure that is one of the prime criteria for a good building.

    Your post ignores the "works" aspect of the rule. "Works" is subtly different than "functions." "Works" implies more than merely functioning.

  • by raddan (519638) on Friday May 16, 2008 @12:58PM (#23437086)
    But the author also points out that his measure of "quality" is limited. From the paper:

    > Other methodological limitations of this study are the small number of (admittedly large and important) systems studied, the language specificity of the employed metrics, and the coverage of only maintainability and portability from the space of all software quality attributes. This last limitation means that the study fails to take into account the large and important set of quality attributes that are typically determined at runtime: functionality, reliability, usability, and efficiency. However, these missing attributes are affected by configuration, tuning, and workload selection. Studying them would introduce additional subjective criteria. The controversy surrounding studies comparing competing operating systems in areas like security or performance demonstrates the difficulty of such approaches.

    From the end-user perspective, functionality, reliability, usability, and efficiency are pretty much the entire thing. Most users couldn't care less that a piece of software is hard to maintain as long as it does what they want reliably, consistently, and with a minimal amount of cognitive load. So this paper is aimed more at applying traditional software engineering metrics to four pieces of real-world software. The outcome *should* show little difference, since all four of these pieces of software are used in mission-critical applications. It would be surprising if one or more of them were not at all in the same ballpark, but it is nonetheless interesting that very different software development styles basically create products with roughly the same measures, e.g., modularity.
  • by Khopesh (112447) on Friday May 16, 2008 @01:21PM (#23437580) Homepage Journal

    Isn't NetBSD the system filled with academics who insist upon clean, manageable, and portable code above all other standards? Too bad the NetBSD kernel didn't get judged here, I suspect it would have taken the cake.

    I still recall this exhaustive report [bulk.fefe.de] comparing several kernels' performance back in 2003 in which NetBSD pretty much beat the pants off of everybody else (note the two updates with separate graphs). The initial poor performance was due to an old revision, and upon seeing that there were some places in which the newer revision wasn't so hot, the developers fixed them and in only two weeks, NetBSD beat out FreeBSD on every scalability test. Their pragmatism and insistence on quality code finally paid off.

    Ever since seeing those charts, I've been waiting for Debian/NetBSD [debian.org] to come out...

  • Re:Is it just me? (Score:5, Interesting)

    by utopianfiat (774016) on Friday May 16, 2008 @01:34PM (#23437810) Journal
    Well, it's the second link. At any rate, the highlights of the data are that for the most part the kernels are tied in the important material, except:

    % style conforming lines: FBSD:77.27 LIN:77.96 SOLARIS:84.32 WIN:33.30
    % style conforming typedef identifiers: FBSD:57.1 LIN:59.2 SOLARIS:86.9 WIN:100.0
    % style conforming aggregate tags: FBSD:0.0 LIN:0.0 SOLARIS:20.7 WIN:98.2

    (I'm far too lazy to clean up the rest; the values below appear to follow the same FBSD, LIN, SOLARIS, WIN order.)

    % of variable declarations with global scope 0.36 0.19 1.02 1.86
    % of variable operands with global scope 3.3 0.5 1.3 2.3
    % of identifiers with wrongly global scope 0.28 0.17 1.51 3.53
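
    As a rough illustration of how a "% style conforming lines" figure can be produced, here is a small sketch. The two rules it checks (a maximum line width and no trailing whitespace) are assumptions picked for the example; the style guides actually applied to each kernel are not given in this thread.

```python
def percent_conforming(text, max_cols=80):
    """Percentage of lines meeting two assumed style rules:
    at most `max_cols` characters and no trailing whitespace.
    A tab is counted as a single character here."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string left by a final newline
    if not lines:
        return 100.0
    ok = sum(1 for ln in lines if len(ln) <= max_cols and ln == ln.rstrip())
    return 100.0 * ok / len(lines)

# The third line ends in a stray space, so 3 of 4 lines conform.
source = "int main(void)\n{\n\treturn 0; \n}\n"
print(f"{percent_conforming(source):.2f}")
```

    Because each project is scored against its own style guide, a low percentage shows inconsistency with the stated rules rather than some absolute notion of ugliness.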
  • Re:Closed Source? (Score:3, Interesting)

    by X0563511 (793323) on Friday May 16, 2008 @01:40PM (#23437912) Homepage Journal
    That works the other way too... the real windows kernel could be full of shit, and they would look better for the review of the WRK.
  • Re:So.... (Score:3, Interesting)

    by Diomidis Spinellis (661697) on Friday May 16, 2008 @01:52PM (#23438102) Homepage
    The way you license code can't directly affect its quality, but the way you develop it can. Here are some possible ways in which a company can affect (positively or negatively) the quality of the software:
    • Have managers and an oversight group control quality (+)
    • Through its bureaucracy remove incentives to find creative solutions to quality problems (-)
    • Pay for developers to attend training courses (+)
    • Provide a nice environment free of distractions that allows developers to focus on developing quality software (+)
    • Buy expensive tools that can detect quality problems (+)
    • Developers take their paycheck for granted and lose interest in what they are doing (-)
    • Developers write obfuscated code for job security (-)
    And here are some possible ways in which an open source development effort can affect (positively or negatively) the quality of the software:
    • Volunteers are more motivated than paid employees (+)
    • Nobody takes responsibility for the overall quality of the code; responsibility is diffused (-)
    • Working conditions can be suboptimal (-)
    • Developers work part-time (-)
    • Developers eat their own dog food and therefore care about their code (+)
    • There are many eyeballs to spot code problems (+)
    • There are no marketing pressures to deliver substandard work (+)
    • Developers are geographically dispersed and can't communicate easily (-)
    Both lists can be expanded, and many of the arguments can be refuted. Still you get the idea: the inputs to the two development processes differ substantially and this could affect quality.
  • Simplistic approach (Score:1, Interesting)

    by Anonymous Coward on Friday May 16, 2008 @01:52PM (#23438112)
    Most of the criteria used in this paper are syntactic and/or not clearly related to the actual quality of the code.

    E.g., measuring the average length of the files or the average number of files in a directory reflects an underlying assumption: software quality is better when granularity is higher. But that is not true.

    The reality is that you need an *appropriate* level of granularity in your software, and this also depends on your overall design. It's a bit nonsensical to try measuring such properties of the code as if they were somehow absolute and context-free!

    Such criteria and the related underlying assumptions seem very debatable; thus it's no wonder that no significant difference can be extracted from these measurements.
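
    The two file-organization figures mentioned in this comment (average file length and average number of files per directory) are easy to compute, which is part of why they get used. A minimal sketch follows; the function name, the choice to count only .c and .h files, and the demo tree are all assumptions for illustration.

```python
import os
import tempfile

def file_organization_metrics(root):
    """Return (mean lines per .c/.h file, mean such files per directory)."""
    line_counts = []
    files_per_dir = []
    for dirpath, _dirnames, filenames in os.walk(root):
        sources = [f for f in filenames if f.endswith((".c", ".h"))]
        if not sources:
            continue  # directories without C sources are ignored
        files_per_dir.append(len(sources))
        for name in sources:
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                line_counts.append(sum(1 for _ in fh))
    if not line_counts:
        return 0.0, 0.0
    return (sum(line_counts) / len(line_counts),
            sum(files_per_dir) / len(files_per_dir))

def demo():
    """Build a tiny made-up source tree and measure it."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "kern"))
        layout = {"main.c": 4, "kern/a.c": 2, "kern/a.h": 6}
        for rel, nlines in layout.items():
            with open(os.path.join(root, rel), "w", encoding="utf-8") as fh:
                fh.write("int x;\n" * nlines)
        return file_organization_metrics(root)

print(demo())
```

    Nothing in these numbers encodes what granularity is appropriate for a given design, which is exactly the objection raised above.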
  • Re:Is it just me? (Score:4, Interesting)

    by Diomidis Spinellis (661697) on Friday May 16, 2008 @02:31PM (#23438668) Homepage
    The preprocessor algorithm I described in the Dr. Dobb's article [ddj.com] is the one I used for parsing the code of this study. A strange preprocessor construct in the Linux kernel caused the macro-expansion algorithm I used previously to fail.
  • by Diomidis Spinellis (661697) on Friday May 16, 2008 @02:36PM (#23438752) Homepage
    I don't think that my results can support us in making arguments regarding 'slightly' higher quality, or 'exactly the same quality'. My figures are based on possibly interdependent, unweighted, and unvalidated metrics. Therefore they only allow us to make conclusions involving large differences.
  • by ThePhilips (752041) on Friday May 16, 2008 @03:14PM (#23439276) Homepage Journal

    > The high marks of Solaris and WRK in preprocessing could also be attributed to programming discipline. The problems from the use of the preprocessor are well-known, but its allure is seductive. It is often tempting to use the preprocessor in order to create elaborate domain-specific programming constructs. It is also often easy to fix a portability problem by means of conditional compilation directives. However, both approaches can be problematic in the long run, and we can hypothesize that in an organization like Sun or Microsoft programmers are discouraged from relying on the preprocessor.

    That subjective conclusion is precisely the effect of reading too much into the metrics.

    Sun and Microsoft programmers need to support two platforms each (Sun: SPARC and AMD64; M$: IA32 and AMD64). All portability decisions are of boolean complexity.

    But FreeBSD and Linux run on dozens of platforms. I do not know how it is in BSD land, but in Linux the first and foremost requirement for platform support is that it has no negative side effects on other platforms. Consequently, for example, under Linux most (all? - all!!) locking is still implemented as macros: on a uniprocessor system with the preemptive kernel feature disabled, all in-kernel synchronization miraculously (thanks to the preprocessor) disappears from the whole code base. This makes sure that on such a platform the kernel runs as efficiently as possible, without any locking overhead, because the locking is not needed anymore.

    And that's a single example. There are many macros for special CPU features: depending on the platform, one would be a nop, an asm statement, or a function call. No way around using macros.

    I think one of the points the author needed to factor in is the portability of the OS. Without that, most metrics are skewed too much.

    P.S. Actually, Linux's affinity for macros often (at least according to kernel developers) stems from poor optimization of inlined functions in GCC. Many macros could be converted to functions, but that would damage overall performance, in many places significantly.

  • Re:question (Score:4, Interesting)

    by Diomidis Spinellis (661697) on Friday May 16, 2008 @05:23PM (#23440852) Homepage

    > what was the most foul comment you encountered :D ? and where did it reside

    Decency laws in various parts of the world do not allow me to answer this question. However, I can say that in total the four kernels contain in C files 18389 comments marked XXX. The most famous Unix comment is of course the well-known "You are not expected to understand this". See dmr's [bell-labs.com] page for more details. This [google.com] is also an interesting comment, especially considering the current troubles of the person who wrote it.
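
    A count like "comments marked XXX" can be approximated with a few lines of code. The sketch below is an assumption-laden simplification, not the method used for the study: it finds C comments with a regex, so comment markers inside string literals would confuse it.

```python
import re

# Block comments (non-greedy, may span lines) or line comments.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

def count_xxx_comments(source):
    """Number of comments in `source` that contain the marker XXX."""
    return sum(1 for c in COMMENT_RE.findall(source) if "XXX" in c)

code = """
/* XXX: this lock order is wrong */
int f(void) { return 0; } // fine
/* multi-line
 * XXX revisit
 */
int XXX_not_a_comment;
"""
print(count_xxx_comments(code))
```

    The identifier on the last line is not counted because only text inside comments is inspected.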
  • by mlwmohawk (801821) on Friday May 16, 2008 @08:44PM (#23442404)
    > You can say that if the code is functional, reliable, usable, efficient, maintainable, and portable, then it is of high quality.

    Not to put too fine a point on it, but this is too much concern over stuff that does not always matter.

    I agree "functional" and "reliable" are absolutely important.

    "efficient?" Only if efficiency is required or of any concern. How efficient is efficient? It is a balance of efficiency against economy.

    "Maintainable?" Sure, most of the time, but not always. Sometimes we toss stuff on purpose. Some good code can be written that is near to impossible to maintain. Some code sacrifices maintainability for performance.

    "Portable?" There is little sense in making some code portable. PIC software, for instance, can not be good "PIC" code and portable because the PIC is such an insane type of device.

    I think your cute little chart is all well and good for a common segment of software development, but it is hardly an absolute and puts too much emphasis on an arbitrary set of criteria that don't reflect the purpose or economy of software code.
