Stories
Slash Boxes
Comments

News for nerds, stuff that matters

Slashdot Log In

Log In

[ Create a new account ]

Code Quality In Open and Closed Source Kernels

Posted by kdawson on Friday May 16, @11:53AM
from the tale-of-four-kernels dept.
Diomidis Spinellis writes "Earlier today I presented at the 30th International Conference on Software Engineering a research paper comparing the code quality of Linux, Windows (its research kernel distribution), OpenSolaris, and FreeBSD. For the comparison I parsed multiple configurations of these systems (more than ten million lines) and stored the results in four databases, where I could run SQL queries on them. This amounted to 8GB of data, 160 million records. (I've made the databases and the SQL queries available online.) The areas I examined were file organization, code structure, code style, preprocessing, and data organization. To my surprise there was no clear winner or loser, but there were interesting differences in specific areas. As the summary concludes: '..the structure and internal quality attributes of a working, non-trivial software artifact will represent first and foremost the engineering requirements of its construction, with the influence of process being marginal, if any.'"

Related Stories

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
 Full
 Abbreviated
 Hidden
More | Login | Reply
Loading... please wait.
  • Is it just me? (Score:5, Interesting)

    by Abreu (173023) on Friday May 16, @11:55AM (#23434796)
    Or the summary is completely incomprehensible?

    Of course, I could try to RTFA, but hey, this is Slashdot, after all...
    • Re:Is it just me? (Score:5, Insightful)

      by Anonymous Coward on Friday May 16, @11:58AM (#23434836)
      That is if you can figure out which of the 12 links are the actual FA and which are supporting material.
      • Well, it's the second link. At any rate, the highlights of the data are that for the most part the kernels are tied in the important material, except:

        % style conforming lines: FBSD:77.27 LIN:77.96 SOLARIS:84.32 WIN:33.30
        % style conforming typedef identifiers: FBSD:57.1 LIN:59.2 SOLARIS:86.9 WIN:100.0
        % style conforming aggregate tags: FBSD:0.0 LIN:0.0 SOLARIS:20.7 WIN:98.2

        (I'm far too lazy to clean up the rest)

        % of variable declarations with global scope 0.36 0.19 1.02 1.86
        % of variable operands with global scope 3.3 0.5 1.3 2.3
        % of identifiers with wrongly global scope 0.28 0.17 1.51 3.53
    • Re:Is it just me? (Score:4, Interesting)

      by Bazman (4849) on Friday May 16, @12:12PM (#23435076) Journal
      Well, it's not just you, but probably millions like you. Plenty of the summary is comprehensible, but I get the fear that it's really just a slashvertisement for his book (third link in summary).

    • Re:Is it just me? (Score:5, Insightful)

      by raddan (519638) on Friday May 16, @12:18PM (#23435184)
      It's not a very good summary, but the paper is well-written, which is interesting considering that the author is the one who submitted the summary to Slashdot. I suspect that he assumes we have more familiarity with the subject than we actually do.
    • Re:Is it just me? (Score:5, Insightful)

      by Diomidis Spinellis (661697) on Friday May 16, @12:50PM (#23435794) Homepage
      I didn't write the last part when I submitted the story, and, yes, the summary given here is comprehensible, because it appears out of context. What the sentence '..the structure and internal quality attributes of a working, non-trivial software artifact will represent first and foremost the engineering requirements of its construction, with the influence of process being marginal, if any.' means is that when you build something complex and demanding, say a dam or an operating system kernel, the end result will have a specific level of quality, no matter how you build it. For this reason the differences in the software built with a tightly-controlled proprietary software process and that built using an open-source process are not that big.
      • Re:Is it just me? (Score:5, Insightful)

        by legutierr (1199887) on Friday May 16, @01:37PM (#23436570)
        How useful is it to write something about computers that needs to be translated for the slashdot audience? Jargon is a great way to provide specialized information to insiders quickly and efficiently, but this is slashdot. If slashdot readers need for you to restate your description of a problem or observation related to the Linux kernel (even if that description is taken out of context), could it be that the paper could be written in a more open manner? The quote you provided from your paper seems to speak to a narrow audience; how narrow must your audience be, however, if it excludes a good portion of slashdot's readers?

        If I seem overly critical, I do not mean to, it is only that I hate to see good, useful research made less accessible to non-academics by the use of academic language.
  • by Number6.2 (71553) * on Friday May 16, @12:03PM (#23434922) Homepage Journal
    That you have neither capitalized on your shared synergies, nor have you recovered your cherished paradigms.

    Oh. Wait. This is about propeller-head stuff rather than management stuff. Lemme get my "Handbook of postmodern buzz words"...
  • Not that surprising (Score:5, Interesting)

    by abigor (540274) on Friday May 16, @12:05PM (#23434958)
    Final line in the paper: "Therefore, the most we can read from the overall balance of marks is that open source development approaches do not produce software of markedly higher quality than proprietary software development."

    Interesting, but not shocking for those who have worked with disciplined commercial teams. I wonder what the results would be in less critical areas than the kernel, say certain types of applications.
    • by ivan256 (17499) on Friday May 16, @12:25PM (#23435320)
      It's obvious what the results would be.

      Half completed, unpolished commercial software usually stays unreleased and safe from this sort of scrutiny. However many of the same types of projects get left out in the open and easily visible to everybody when developed as open source. The low code quality of these projects would drag down the average for open source projects as a whole.

      On the lighter side, you could say that you'd only consider software that was "out of beta" or version 1.0 or greater, but that would leave out most open source projects and commercial "Web 2.0" products....
  • by wellingtonsteve (892855) on Friday May 16, @12:05PM (#23434962)
    ..that Open Source code is of quality, but at least the point of things like the GPL is that you have the power to change that, and improve that code..
  • CScout Compilation (Score:5, Insightful)

    by allenw (33234) on Friday May 16, @12:08PM (#23435028) Journal
    "The OpenSolaris kernel was a welcomed surprise: it was the only body of source code that did not require any extensions to CScout in order to compile."

    Given that the Solaris kernel has been compiled by two very different compilers (Sun Studio, of course, and gcc), it isn't that surprising. Because of the compiler issues, it is likely the most ANSI compliant of the bunch.
  • If I am understanding correctly, you were looking for 'winners' and 'losers' (weasel words in and of themselves, but anyway...) in terms of 'quality' (another semi-subjective term that could make someone go crazy and drive motorcycles across the country for the rest of their lives).

    You found that '..the structure and internal quality attributes of a working, non-trivial software artifact will represent first and foremost the engineering requirements of its construction, with the influence of process being marginal, if any.' -- or in plain English: "the app specs had a much bigger influence when compared to internal efficiencies".

    I would wonder if you're just seeing a statistical wash-out. Are you dealing with data sets (tens of millions of lines and thousands of functions) that are so large, that patterns simply get washed out in the analysis?

    Oh dear, my post is no more clear than the summary...
  • The 99% Solution (Score:5, Interesting)

    by SuperKendall (25149) on Friday May 16, @12:21PM (#23435232)
    So while looking at the data collected, I had to wonder if some of the conclusions reached were not something of a matter of weighting - I saw some things pretty troubling about the WRK. Among the top of my list was a 99.8% global function count!!!

    This would explain some things like lower LOC count - after all, if you just have a bunch of global functions there's no need for a lot of API wrapping, you just call away.

    I do hate to lean on LOC as any kind of metric but - even besides that, the far lower count of Windows made me wonder how much there, is there. Is the Windows kernel so much tighter or is it just doing less? That one metric would seem to make further conclusions hard to reach since it's such a different style.

    Also, on a side note I would say another conclusion you could reach is that open source would tend to be more readable, with the WRK having a 33.30% adherence to code style and the others being 77-83%. That meshes with my experience working on corporate code, where over time coding styles change on more of a whim whereas in an open source project, it's more important to keep a common look to the code for maintainability. (That's important for corporate code too - it's just that there's usually no-one assigned to care about that).

    • Re:The 99% Solution (Score:4, Informative)

      by Diomidis Spinellis (661697) on Friday May 16, @01:03PM (#23436024) Homepage

      So while looking at the data collected, I had to wonder if some of the conclusions reached were not something of a matter of weighting - I saw some things pretty troubling about the WRK. Among the top of my list was a 99.8% global function count!!!
      I guess Microsoft uses a non-C linker-specific mechanism to isolate their functions, for instance by linking their code into modules. But yes, this is a troubling number.

      This would explain some things like lower LOC count - after all, if you just have a bunch of global functions there's no need for a lot of API wrapping, you just call away.
      The lower LOC comes from the fact that WRK is s subset of Windows. It does not include device drivers, and the plug-and-play, power management, and virtual DOS subsystems.

      Also, on a side note I would say another conclusion you could reach is that open source would tend to be more readable, with the WRK having a 33.30% adherence to code style and the others being 77-83%. That meshes with my experience working on corporate code, where over time coding styles change on more of a whim whereas in an open source project, it's more important to keep a common look to the code for maintainability. (That's important for corporate code too - it's just that there's usually no-one assigned to care about that).
      About 15 years ago I chanced upon code in a device driver that Microsoft distributed with something like a DDK that had comments written in Spanish. The situation in WRK is markedly better, but keep in mind that Microsoft distributes WRK for research and teaching.
  • KLOCs? (Score:5, Insightful)

    by Baavgai (598847) on Friday May 16, @12:28PM (#23435380) Homepage
    If good code and bad code were a simple automated analysis away, don't you think everyone would be doing it? What methodolgy could possibly give a quantitative weighting for "quality"?

    "To my surprise there was no clear winner or loser..." Not really a surprise at all, actually.

  • by SixDimensionalArray (604334) on Friday May 16, @12:43PM (#23435680)
    I haven't seen anybody else comment on the fact that the statement that the quality of the code had more to do with the engineering than the process through which the code was developed is quite interesting.

    From my personal experiences, it typically seems code is written to solve a specific need. Said another way, in the pursuit of solving a given problem, whatever engineering is required to solve the problem must be accomplished - if existing solutions to problems can be recognized, they can be used (for example, Gang of Four/GOF patterns), otherwise, the problem must have a new solution engineered.

    Seeing as how there are teams successfully developing projects (with both good, and bad code quality) using traditional OO/UML modeling, the software development life-cycle, capability maturity model, scrum, agile, XP/pair programming, and a myriad of other methods, it would seem to be that what the author is saying is, it didn't necessarily matter which method was used, it was how the solution was actually built (the.. robustness of the engineering) that mattered.

    Further clarification on the difference between engineering and "process" would strengthen this paper.

    I went to a Microsoft user group event some time ago - and the presenter described what they believed the process of development of code quality looked like. They suggested the progression of code quality was something like:
    crap -> slightly less crappy -> decent quality -> elegant code.

    Sometimes, your first solution at a given problem is elegant.. sometimes, it's just crap.

    Anyways, just my two cents. Maybe two cents too many.. ;)

    SixD
  • Stupid metrics (Score:4, Interesting)

    by Animats (122034) on Friday May 16, @12:54PM (#23435880) Homepage

    The metrics used in this paper are lame. They're things like "number of #define statements outside header files" and such.

    Modern code quality evaluation involves running code through something like Purify, which actually has some understanding of C and its bugs. There are many such tools. [wikipedia.org] This paper is way behind current analysis technology.

  • by mlwmohawk (801821) on Friday May 16, @12:59PM (#23435968)
    Sorry, I've been in the business for over 25 years and had to hear one pin head after another spout about code quality or productivity. Its all subjective at best.

    The worst looking piece of spaghetti code could have fewer bugs, be more efficient, and be easier to maintain than the most modular object oriented code.

    What is the "real" measure of quality or productivity? Is it LOC? No. Is it overall structure? no. Is it the number of "globals?" maybe not.

    The only real measure of code is the pure and simple darwinian test of survival. If it lasts and works, its good code. If it is constantly being rewritten or is tossed, it is bad code.

    I currently HATE (with a passion) the current interpretation of the bridge design pattern so popular these days. Yea, it means well, but it fails in implementation by making implementation harder and increasing the LOC benchmark. The core idea is correct, but it has been taken to absurd levels.

    I have code that is over 15 years old, almost untouched, and still being used in programs today. Is it pretty? Not always. Is it "object oriented" conceptually, yes, but not necessarily. Think the "fopen,"fread," file operations. Conceptually, the FILE pointer is an object, but it is a pure C convention.

    In summation:
    Code that works -- good.
    Code that does not -- bad.
    • Well, you lose your bet, been over five minutes and no anti Microsoft screeds let alone spelling it with a $.

      Just so everyone understands, the tactic used here is known as "Poisoning the well." [wikipedia.org] The idea is the discredit an argument's source before the argument is presented. Here, our AC friend is trying to ward off criticism of Microsoft by insinuating that anyone who does so is a 13 year old "Slashbot."

      The fallacy is in the fact that even someone who is 13 and often goes along with the Slashdot zeitgeist may still have legitimate criticisms of Microsoft, such as the fact that Microsoft sucks giant hairy donkey balls.
    • Re:Closed Source? (Score:5, Interesting)

      by zeromorph (1009305) on Friday May 16, @12:33PM (#23435470)

      The WRK is under the Microsoft Windows Research Kernel Source Code License [microsoft.com]. I'm not sure that this license conforms with anyones definition of open source, but it's reasonably free for reasearch.

      But PP addresses a crucial point, if something really is closed source there is no reviewable way to compare and present this code. So if the WRK would be total crap they could always say: yes that's only the WRK, not the real kernel.

      Only statements about open source code are directly verifiable/falsifiable. One of the reasons, why the FOSS approach is superior from a scientific as well as technical point of view.