Programming IT Technology

Statistics On Free Software projects 93

GenericBoy writes: "The first edition of The Orbiten Free Software Survey is out online. Some of the stats are number of authors and projects, the top 10 contributing authors, how many MB are in all of the free software projects put together (!) and a bunch more." Now, as they themselves point out in their Scope and Method, the methodology is crude, and I don't think Orbiten could quite submit it to Nature yet or anything, but it's an interesting bunch of stats.
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    Is it fair to mention 'Gordon Matzigkeit' at all? He only appears in the list because his name appears in hundreds of acinclude.m4's. This nicely proves that the statistics are complete nonsense.
  • No, it turns out Gordon is only human after all. To quote from this post on Advogato [advogato.org]

    Well, if you recognize Gordon's name, you'll remember what project he is perhaps best known for: libtool. Now, packages that use libtool happen to include some rather long (autogenerated) files in them that have Gordon's name attached. So for every package that uses libtool, Gordon gets credited with about 8 thousand lines of code. What a sweet deal!

  • by Anonymous Coward
    Rock & Troll: a form of music best played from a Beowulf cluster.

    Troller Derby: a skating game in which everyone skates around screaming, "First Score" even if they are the 10th.

    Cinnamon Trolls: tasty flavored grits poured down one's pants.

    Troll Call: all participants stand in a line and appeal for Natalie Portman's nubile body.

    On a Troll: when some loudmouth who cannot read an actual article does nothing but disparage slashdot submissions incessantly.

    Con-Troll: a miscreant poster who just escaped prison.

    Dave Troll: leader of the band called the Foobar Fighters.

    Bridge Troll: offtopic poster interested in card games.

    Pet-Troll: (1) impudent poster used as fuel in the UK; (2) a troll belonging to another, as a pet.

    Trolley: conveyance used to transport numerous trolls in San Francisco.

    Trollkin: the family of a troll.

    Trollop: a female poster of ill repute.

    Trollanthropy: the rare act of a wiseass poster giving someone or something its due.
  • Is it just me, or have these stats left out some fairly large projects such as Jakarta [apache.org] and Mozilla [mozilla.org]?

    Admittedly I didn't look through everything, but I don't see Jakarta mentioned under the apache author page, nor do I see mozilla under jwz or Netscape's author pages. Am I blind, or are they? :)

    And if they did miss these two, (Mozilla alone is a somewhat massive sum of source code) what else are they leaving out?
  • "Lies, damn lies, and autogenerated reports." -- Peter Baylies, 5/9/00

    Or, if you don't believe me, just remember that
    "united states government as represented by the" is responsible for 305,338 lines of code, 200k in the Linux Kernel, 100k in OSKit, and 10% of the Linux Surfboard Driver. Go, US!

    ...and bow down and worship Gordon Matzigkeit. One day, every child in America will be able to spell his last name, and recognize him as the unsung hero of the free software revolution...
    ---
    pb Reply or e-mail; don't vaguely moderate [ncsu.edu].
  • Man, go away. Posting the results here is just not right. I could see it being helpful if, say, the site was ./'ed, but it isn't.


    Wooohoo! My server survived its first slashdotting. Without any particular preparation either (I didn't notice it had made slashdot till a friend told me), and while running all my nice eye-candy too. Kudos to apache...

    Adrian.
  • I think "copyresponsibility" would be better than "copywrong". Who really cares, though?
  • I noticed on the PostgreSQL Hackers list that Thomas Lane said this was very bogus because it appears to re-include his libjpeg as many times as it is used by something else.

    Yep. I came to the same conclusion. The authors of the survey do a brute force analysis and count whatever name shows up.

    So if you manage to show up on some file that gets included in a lot of projects, like the C/C++ libraries, you will score very high. That is what put Ulrich Drepper on number 8.
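One way to soften the double counting described above would be to credit each distinct file only once, however many packages bundle it. The following is a minimal sketch, assuming the survey input can be reduced to (project, author, file contents) records. The function name `credit_unique_files`, the record layout, and the sample data are all hypothetical and are not taken from the survey's own tooling.

```python
import hashlib
from collections import defaultdict

def credit_unique_files(records):
    """Sum bytes per author, counting each distinct file content once.

    records: iterable of (project, author, content_bytes) tuples.
    This record layout is an assumption made for illustration.
    """
    seen = set()
    totals = defaultdict(int)
    for _project, author, content in records:
        digest = hashlib.sha256(content).hexdigest()
        if (author, digest) in seen:
            continue  # identical file already credited via another package
        seen.add((author, digest))
        totals[author] += len(content)
    return dict(totals)

# Hypothetical sample: one generated file bundled by three packages.
bundled = b"# generated support code\n" * 100
own_code = b"int main(void) { return 0; }\n"
records = [
    ("pkg-a", "library-author", bundled),
    ("pkg-b", "library-author", bundled),
    ("pkg-c", "library-author", bundled),
    ("pkg-a", "app-author", own_code),
]
totals = credit_unique_files(records)
```

With this approach the library author is credited for the bundled file once rather than three times. It only catches byte-identical copies; slightly modified copies would still be counted separately.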

    On the contrary I was not able to spot a lot of hard working folks from the BSD crowd. So the authors of the survey did not scan through a FreeBSD, OpenBSD or NetBSD tree. Even giants, like Donald E. Knuth (DEK) did not show up. So TeX was not included either.

    What to think of it?
    The basic idea is nice, the equivalent of an Open Source top ten. It could appeal to the same people who try to score high on distributed.net or Seti. (But those projects in particular had people show up who increased their scores by illegal methods.)

    I do, however, like the idea of being able, a few years from now, to look up what stuff I worked on. But I guess this will need a much improved system.

    My conclusion is that these guys had the right idea: the existing body of free code screams to be analyzed. So let's forget that they did it poorly, and let's try to improve things.

    First they should extend their input. An easy way is to scan the contents of the former Walnut Creek ftp server, as it covers a lot of free software. However, one would need to add a lot of different servers too, adding the major free systems, commercial stuff like mozilla, and projects from science (there is a lot of free Fortran out there too!).

    If anyone is interested in setting up a better attempt, please contact me.

  • When I showed this URL to my family, the reaction was "wait a sec! Bottomfeeders? Isn't that a bit derogative?". It took quite some explaining to make it clear that it was the culmination of what I've done over the years: I've joined the hordes of folks who, by submitting small patches, fixes, bits of functionality, have made the difference between making Open Source a hobby of a select few, and making it a (possibly) useful tool.

    Yep. The author credited is usually the person who wrote the first version of a particular file. This neglects the maintainer and the many people who might advance the state with their patches. All of them, plus web masters, documenters, release and source code repository engineers (maybe I forget a couple of important folks too) deserve credit!

    If done properly, patch submitters should be noted in the CVS logs. Some projects (like FreeBSD) route those comments into commit logs too.

    Ergo: scan the cvs trees and not the release packages.

  • by Otter ( 3800 ) on Tuesday May 09, 2000 @11:16AM (#1082787) Journal
    Might I propose that from now on, Slashdot posters saying:

    • Oh, yeah? You have the source. Write it yourself, you moron!
    • QT/GTK is for idiots.
    • Apple is so stupid. If they open-sourced everything we'd fix it for them.
    • M$ code is terrible.
    • Why isn't Company X open-sourcing their product? Proprietary software is evil!
    • Free software project X sucks.
    or such things, be expected to link to this site showing exactly how much they've contributed.

    Although, given that the study has managed to overlook my insignificant but non-zero contributions, maybe I shouldn't propose that.
  • Yeah, I'm on that list! Right at position 771 AND 772!

    What!? They counted me TWICE? Once as tord.jansson@swipnet and then later as tord.jansson... hm... 248447 bytes for each of them... Hm, seems like they somehow counted me twice but with the SAME value or maybe they somehow split it in half.

    Let's click on my name and see what projects they have mentioned me participating in, should be just BladeEnc... What!? makeMP3.codd!!! What the heck is THAT program!? Hm, I see... got to be some kind of frontend that has included the BladeEnc code...

    Feels a bit odd getting credited for a program I don't know anything about, but still kind of okay... :)

    On the other hand, I wonder how they came up with 248447 bytes, the BladeEnc code is about 1.5 meg :-/

    But then again, it wouldn't be fair to credit me for more anyway since BladeEnc is so heavily based on the original ISO code and the other BladeEnc contributors haven't gotten any credits since they're just mentioned on the homepage. :(

    Guess this shows how far from precise this study is. A good attempt to measure something quite immeasurable though. Kudos to all the people who must have put an awful lot of work into this, and I hope you can get something useful out of the big picture, although the small details are terribly wrong.

    Tord Jansson
    BladeEnc Creator
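The duplicate entries described above ("tord.jansson@swipnet" and "tord.jansson") suggest that credit strings were not normalized before counting. Below is a minimal sketch of one possible normalization, assuming credits arrive as (raw name, bytes) pairs. The heuristic of dropping everything after "@" and the optional `aliases` table are assumptions for illustration, not the survey's method.

```python
from collections import defaultdict

def canonical_author(raw, aliases=None):
    """Map a raw credit string to one canonical key.

    Heuristic (an assumption): lowercase, trim whitespace, apply any
    explicit alias, then drop everything after '@' so that
    'name@host' and 'name' collapse into the same key.
    """
    key = raw.strip().lower()
    if aliases and key in aliases:
        key = aliases[key]
    return key.split("@", 1)[0]

def merge_credits(credits, aliases=None):
    """Sum byte counts over entries that share a canonical author."""
    merged = defaultdict(int)
    for raw, nbytes in credits:
        merged[canonical_author(raw, aliases)] += nbytes
    return dict(merged)

# The two entries and byte counts quoted in the comment above.
credits = [("tord.jansson@swipnet", 248447), ("tord.jansson", 248447)]
merged = merge_credits(credits)
```

This merges the two rows into one author. Whether the byte counts should be summed or treated as the same bytes counted twice cannot be decided from the names alone, and stripping the host part could wrongly merge two different people who share a local name.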
  • Money.

    Microsoft has hundreds of full-time programmers on Windows, more than enough to swamp the efforts of 13000 part-time hackers and students. IIRC, Windows, measured in man-hours, is the single greatest engineering project in the history of humanity.

  • I think this would be quite different if they did the survey on something like debian.
    --
  • A lot of unix coders put the starting brace of while and for loops on the same line. I think windows code generally puts it on the next line.

    I know it is only one line, but a lot of unix code I've seen does "} else {". That's three lines of windows code. It adds up! :)


    -- Thrakkerzog
  • Given the nature of this community, I suspect this is more of a "tip of the iceberg" sample, and has a high error rate. There's alot of projects that helped create (enable for you buzzword people!) more projects - I doubt many people would have gotten in on the free software scene if not for the GNU C Compiler. Comparing authors by quantity instead of quality is a poor way of judging progress. So take this report with a grain of salt - they make no claims of this being comprehensive or telling, and neither should you. Already I see people proclaiming that this is the metric by which contributors should be judged. Sigh.

    Secondly, most of this community, by its very nature, is distributed, decentralized, and hard to account for. That's not a coincidence - many of us like remaining anonymous.. the man behind the scenes. As anecotal(sp?) evidence look at the .sig blocks on slashdot - how many famous people note their OSS accomplishments in their sig? Very few. And as Linus himself said.. it's not like girls are throwing their underwear at him. Many people don't *want* to be counted.. an anonymous patch here and there is sufficient.. "I just want it to work".

    So before people start using this report as a metric of people's contributions, remember two things: Even small contributions count, and this is an inclusive rather than exclusive community - you are welcome here whether you contribute source or not. People who write documentation, help the newbies, and convince management to put their company printers on linux (3Com anyone?) ought to be commended too. There's alot more here than code!

  • by Rich ( 9681 ) on Tuesday May 09, 2000 @11:42AM (#1082793) Homepage
    I checked out the stats for some apps I've written and I found they are way out. For example the analysis of kgui gives me 52.789% of the code despite the fact I am the sole author!

    In general the handling of large packages such as KDE seems fairly poor. For example, KDE apparently has no authors according to the by-project listing. I think this is a great idea, but it needs a cleaner source of data. For example, Coolo has been able to give some very interesting and detailed figures by running scripts on the KDE CVS repository. Perhaps this is the sort of thing they need to be using as the initial data set from which they make their analysis.

    Rich.

  • As one of the authors of a similar but more focused report on the developer community [unc.edu], let me point out a few of the problems with this piece of work.
    • pooling of very unlike data - that is, mixing apples and oranges of communities in such a way that individual creators of smaller projects are mixed with sophisticated complex projects like Apache and the kernel
    • inconsistent data gathering - as pointed out in other messages, whilst claiming to represent everything, a collection of over 4K projects is missing (LSM projects, which we looked at)
    • gross analysis of data - that is, not trying to understand which data means what, so that licenses are mixed with authors
    • more is more fallacy - that is, saying that "we counted a lot, so we learned a lot". smart and focused sampling is always better and tells you more
    • gotta read more to tell you more


    on the other hand, the collection of the data -- if it can be arranged in some meaningful manner and then processed in a reasonable way that will yield thoughtful conclusions -- is no small task and rishab and his associates should be applauded for the hard work they did on that portion of the project. i, for one, would be glad to work with them to try to pull out some meaningful reports from their well-meaning but, i think, misfiring project.


    Paul Jones [mailto]

  • But are they including SCSL code in Sun's count?

  • Actually, the Great Wall of China has more in common with Windows than you might think - it didn't work.
  • by BJH ( 11355 )

    Well, of the graphs they provided, the 3D piegraph was definitely superfluous, and everything except the last pie graph could have been done using free software (take a look at gnuplot; it may have been able to do the pie graph as well, I'm not sure), so yes, I'd say he has a point.
  • Losing key staff is no longer the exclusive realm of corporations. It sort of surprises me to see this argument brought up in the context of open software! :-)

    Absolutely! What is more, losing "key staff" in an open-source project is generally much less devastating than it is in a closed-source context, as open-source by its very nature tends to distribute expertise on a given project much more widely.

    For example, early in the Linux Years (pre 1.0) the guy (I forget his name) who did a lot of the early networking work abandoned Linux to its own devices, largely due to being flamed for not having written the perfect, most elegant implementation in his first iteration. Another took over that aspect, the kernel lived on, development moved forward, and Linux is now a raging success. The loss of a very key developer caused hardly a hiccup in development (though an awful lot of discussion, flamage, and doomsday saying).

    kNFS was abandoned for almost a year, which caused myself and others a number of headaches in dealing with Linux NFS (and is probably the reason why Linux NFS lags behind the BSDs and commercial UNIXen in performance). That having been said, it was picked up, is being actively developed, with NFS V 3 support in the 2.4-pre kernels. This is probably the best "worst case" or at least "very bad case" example of an open source project being abandoned one can find, at least in the Linux area of endeavor.

    Abandonment of a project can lead to some delay (as with NFS), but as often as not the delay is minimal (gimp, Linux networking) as another active developer takes over. I would submit that delays in closed-source commercial applications are much more common and typically much more lengthy.

    Finally, with open source the project will always be picked up and continued by someone, as long as there is any interest. Contrast this to many closed-source products which are orphaned, leaving developers and users in a serious bind which they can do nothing about, other than remapping their entire engineering or corporate strategy to a completely new, competing product, at great cost in time and money. In the worst case open-source scenario, such a customer would have to finance and perform ongoing development and maintenance themselves, which would often be a less expensive solution than the alternatives. Having said that, I do not know of a single open-source project where anyone was compelled to do this. I do know of a number of orphaned, closed-source products which left consumers in a terrible bind, from bitter, personal experience.

    Our solution, which has to date saved us tens of thousands of dollars and hundreds of developer hours in cost, was to move to an open source platform (Linux and FreeBSD) and require open source libraries to be used wherever possible, limiting our exposure to orphanage of closed-source products.
  • check out the enlightenment stats [orbiten.org]

    granted, good ole raster is a huge part of the project, but i was surprised to see him mentioned at least three times ("the rasterman", "carsten haitzler", and "raster@zip.com") as was mandrake...duno if this should be attributed to their data collection methods or to messy credits files (understandable in the case of raster's typing ;P)


    -dk
  • You mean, it would have even more FSF stuff on it, right?
  • by Carl ( 12719 ) on Tuesday May 09, 2000 @11:11AM (#1082801) Homepage
    This was already discussed [advogato.org] on Advogato [advogato.org] yesterday.

    The discussion points out some interesting facts about why some individuals are listed as big contributors (such as the author of libtool. Duh.) and why some aren't listed at all. They even have some comments from the developers of the survey.

    And I just love the comment of Havoc Pennington:

    It shows me as a major contributor to "gnuclear" and nothing else - I don't even know what gnuclear is. ;-)
  • IIRC, Windows, measured in man-hours, is the single greatest engineering project in the history of humanity.

    That or the Apollo space program ... or pick your favorite big project. Get a sense of proportion, please.

  • Only 25 million lines? Only 3,149 open source projects? Where's comp.sources.unix? UNC metalab? Phhpth. Next they promise to include CPAN? Whoopee.

    Hate to say it, but they made their mistake in thinking freshmeat.net was comprehensive. freshmeat.net is a very small part of the open source out there.

    RocketAware [rocketaware.com] already lists much more than freshmeat (and is way easier to use, if you are a programmer looking to reuse code, eh?)

  • Then why are you posting as AC?

    Bowie J. Poag
  • Good to see something like this. However, I have to admit, it's a little bit of a letdown. I've got 10MB worth of gear in Red Hat 6.1, but my name didn't show up anywhere. Yes, yes, I know, it's not code, Bowie..Heh


    Bowie J. Poag
  • Business must be pretty slow at VA for you to be able to spend your day trolling on Slashdot. Given VA's recent stock price, I can't say I'm all that surprised.

    FYI, I wasn't whining, dippy. I just find it interesting that this study ignored non-code based contributions to Linux.

    Go back to work, goon.

    Bowie J. Poag
  • Looks like a resounding victory for the FSF. But respect to Sun, who, despite being a big ole commercial company, still have managed a huge input.

    --Remove SPAM from my address to mail me
  • Random made up statistic to prove point:

    Let's say they looked at 10 million lines of code. Well, 0.139% is 13900 lines of code. Not insignificant.

    Duh.

    --Remove SPAM from my address to mail me
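The arithmetic in the comment above can be checked directly. This is a small sketch using exact fractions so that no floating-point rounding creeps in; the 10 million total is the commenter's own made-up figure, not survey data.

```python
from fractions import Fraction

total_lines = 10_000_000          # the commenter's made-up total
share = Fraction(139, 100_000)    # 0.139 % expressed exactly
lines = total_lines * share       # lines attributable to that share
```

The result agrees with the 13900 lines quoted in the comment.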
  • Wonderful point - and I hope folks that are in the less than 1% crowd don't quit either! Even finding and fixing one line of code is a blessing.

    Heck, as I sit here now I have found three lines of code I need to put in this program I am writing where I did not clean up my linked list. Argh! No wonder the original app has had a tendency to crash over the past 3 years.

    The small stuff is as big as the big stuff.

  • > So for every package that uses libtool, Gordon gets credited with about 8 thousand lines of code. What a sweet deal!

    If this ever gets as popular as karma whoring, OSS is in for some serious bloat! "Oh yea? Well my tool inserts eight million lines of code, nyeh, nyeh, nyeh!"

    --
  • As suspect as the data is, it would be nice if people were inspired to develop more free software and pay as much attention to their position on this list as they do to their seti@home rank.

    Well, maybe not quite that much attention. We don't need kiddies who wouldn't know C++ from Excel macros checking in millions of lines of garbage into any open CVS.

  • Actually, it's because they didn't run it over enough stuff - Debian potato alone has around 218 million lines of code (compare to slink's 70 million).

    As for number of projects, potato has 4376 packages, not all of those are separate projects (some are from multi-binary source, some are task packages), but I'm rather sure more than 3149 of them are :)

  • Did you have a chance to have a look at the stats for the biggest individual contributor, namely gordon matzigkeit ?

    He succeeded in writing the exact same size of code in numerous projects:

    • 35489 bytes of code in 70 different projects (zzplayer xpdf XCGI qbrew pilot-link outguess lxandria981105 lmemory lletters libjpeg-6b LAPACK_D ky kwintv kwebwatch kvoicecontrol kvoctrain kvncviewer kvideogen ktimeclock ksniff ksnes9x ksnapshot kshow ksendmail kreglo kprima kplot3d kpl kpilot-3.1b9 kpasman kover komba knetstart knetdump knc kmud kmp3info kmp3 kmol kmodem kmap kluach klm KKinit kishido kircpoker kinst khotkeys khealthcare kgui kfstab kfibs keasyisdn keasycd kdiskcat kdict kcmpgp kblinsel kbind kBeroFTPD jukebox3.2-pre6 jpegsrc.v6b harnmaker gsynth gpgp gettext gdbm freetype cgicq arts).
    • 52144 bytes in 32 different projects (no list, you understand the idea).
    • 54697 bytes in 31 different projects
    • 45401 bytes in 29 different projects
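The pattern in the list above, one byte count repeated across dozens of projects, is easy to flag mechanically. Here is a minimal sketch, assuming one author's credits are available as (project, bytes) pairs. The function name `repeated_sizes` is hypothetical; the first three sample rows use project names and the size quoted above, and the remaining rows are invented placeholders.

```python
from collections import defaultdict

def repeated_sizes(credits, min_projects=2):
    """Group one author's per-project byte counts by size.

    Returns {size: [projects]} for every size that occurs in at least
    min_projects projects, a hint that one file was bundled many times.
    """
    by_size = defaultdict(list)
    for project, nbytes in credits:
        by_size[nbytes].append(project)
    return {size: names for size, names in by_size.items()
            if len(names) >= min_projects}

credits = [
    ("zzplayer", 35489), ("xpdf", 35489), ("qbrew", 35489),
    ("project-x", 52144), ("project-y", 52144),   # hypothetical names
    ("project-z", 1200),                          # hypothetical entry
]
suspects = repeated_sizes(credits)
```

Identical sizes are only a hint, since unrelated files can happen to share a length; comparing file contents would be the stricter check.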
  • Interesting stuff I thought at first. Very interesting indeed.

    Then I started to check into details. Being the author and participator of at least five projects listed on freshmeat (all of them included in this "report") I checked them up to see what they had to say about me and the projects I've contributed to.

    They had no clue at all. Lots of people got credited with code submissions they surely never made, while it was very obvious that some of the major authors were not credited with nearly as much as they have contributed to the projects. Many names were very confusing and mixed up.

    Seeing how badly wrong they are on the few projects I have in-depth knowledge about, how can I trust any conclusions they make in general on the whole context?

    I say scrap the whole thing, do it all from the start. This is not the truth.
  • Interesting results, and certainly the numbers involving lines of code per project are probably accurate.

    However, glancing through a project that I'm the primary author on shows me as the 24th on the list of developers for it, having written 585 bytes. I suspect I've written a few more than that.

    The top of the list was dominated by a mailing list address that isn't even correct. The second name on the list was the UCRegents, who own the copyright (but certainly their lawyers didn't write the code).

    And judging by the other comments, I suspect that the majority of their data is similarly way off. I wonder if they even tested the tool they developed on a few randomly selected projects to see how accurate the results were. They didn't even perform the most obvious data collection method I can think of: "cvs annotate".

    I like the study, but I'd sure like to see it done better.
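As a rough illustration of the "cvs annotate" idea mentioned above, the sketch below tallies lines per author from annotate output. It assumes the usual line layout of revision, then "(author date):", then the source line, and it parses a hand-written sample rather than invoking cvs. The sample authors and the function name `lines_per_author` are hypothetical.

```python
import re
from collections import Counter

# Assumed layout: "<revision>   (<author>   <date>): <source line>"
ANNOTATE_LINE = re.compile(r"^(\d+(?:\.\d+)+)\s+\((\S+)\s+(\S+)\):")

def lines_per_author(annotate_output):
    """Count how many current lines each author last touched."""
    counts = Counter()
    for line in annotate_output.splitlines():
        match = ANNOTATE_LINE.match(line)
        if match:
            counts[match.group(2)] += 1
    return counts

# Hand-written sample in the assumed format, not real project data.
sample = """\
1.1          (alice    09-May-00): #include <stdio.h>
1.3          (bob      10-May-00): int main(void) {
1.1          (alice    09-May-00):     return 0;
1.1          (alice    09-May-00): }
"""
counts = lines_per_author(sample)
```

This credits whoever last changed each line, so it still misses patch submitters whose changes were committed by someone else.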
  • by SwissPope ( 33213 ) on Tuesday May 09, 2000 @04:35PM (#1082816)
    I looked at the algorithm [orbiten.org] used to determine how they collected the names of contributors. They grepped e-mail addresses, rcs ids, and copyright info from various files. I don't think that's the best way to draw any useful conclusions in regards to Open Source software. The only real conclusion found here is that Open Source projects include a lot of code written by other people. That's trivial. This study fails to make a distinction between an active contributor and someone whose code was simply borrowed. This is an important distinction to make! For instance, what if I were to take 1000 physics homework assignments and search for "F=ma" in them. I can't assume that the appearance of "F=ma" on your paper means that Newton helped you with your homework. I can only assume that you used Newton's second law of motion to help you solve the problem.

    Similarly, if you wanted to determine who the most prolific scientific researcher is in a field, would you gather data by simply grepping for names in the texts of papers? No, you'll skew the data by counting the names who appear in the paper's "References" when you should just be counting the actual investigators who are listed as the authors of the paper!

    I would like to see this study repeated but making the distinction between an active contributor to a project and someone whose code was simply included. Only then would a top-heavy distribution suggest anything meaningful in regards to OSS authorship.

    If anyone has looked at the CODD algorithms/code and can show me if they used a more sophisticated method to filter out authors with no active involvement in a project, please post. It's a difficult problem to infer who actively and who passively contributed to a project with just a perl script.
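To make the active-versus-included distinction concrete, here is a minimal sketch of pattern-based credit extraction with a crude filter for bundled files. The regular expressions, the `BUNDLED` file list, and the function names are all assumptions for illustration; they are not taken from the CODD code.

```python
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
COPYRIGHT = re.compile(
    r"Copyright\s+(?:\(C\)\s+)?(?:\d{4}[,\s-]*)+(.+)", re.IGNORECASE)

# File names treated as bundled or generated; this list is an assumption.
BUNDLED = {"ltmain.sh", "ltconfig", "acinclude.m4", "aclocal.m4", "configure"}

def credits_in(text):
    """Return (emails, copyright holders) found by plain pattern matching."""
    emails = sorted(set(EMAIL.findall(text)))
    holders = sorted({m.group(1).strip().rstrip(".")
                      for m in COPYRIGHT.finditer(text)})
    return emails, holders

def is_bundled(path):
    """True if the file name suggests code included from elsewhere."""
    return path.rsplit("/", 1)[-1] in BUNDLED

# Hypothetical file header, not taken from any real package.
text = (
    "# Copyright (C) 1999 Example Author\n"
    "# Report bugs to maintainer@example.org\n"
)
emails, holders = credits_in(text)
```

A survey could then skip, or count separately, any file for which `is_bundled` is true. A fixed name list is a blunt instrument, and a real study would need something better to tell borrowed code from active contribution.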
  • Well no, actually my wife's name is Heather and my Son's name is Max and it just sort of happened :-)

  • I noticed on the PostgreSQL Hackers list that Thomas Lane said this was very bogus because it appears to re-include his libjpeg as many times as it is used by something else.

    Also, is FSF an Author? Is BSD an Author?

  • Man, go away. Posting the results here is just not right. I could see it being helpful if, say, the site was ./'ed, but it isn't.
  • "alot" is a verb.

    Dictionary.com doesn't even give [dictionary.com] it that much credit. It's an acronym.
  • by El Volio ( 40489 ) on Tuesday May 09, 2000 @11:11AM (#1082821) Homepage
    Yeah, the FSF came out way on top, with Sun and the UCB regents not far behind. OK, but is it really fair to compare them to individuals like Gordon Matzigkeit, et al? I'm not familiar with any of the individuals, but it would seem to me that each of them deserves far more credit.

    OTOH, it's nice to see some sort of a start at studying the free software community...

  • "Windows, measured in man-hours, is the single greatest engineering project in the history of humanity."

    hmmm... I wonder how many man-hours went into the pyramids and the great wall... Any of you engineers wanna venture an estimate on the G.W.? I think the ancient Chinese beat MS hands down. ;)
  • Ten million lines of code?

    Let's say this were Windows NT .. the person who wrote the 13900 lines of code would have written the code to blast you with the "You must reboot for changes to take effect" dialog box that pops up whenever you dare move the mouse. A worthy contribution, indeed! :]

  • by dcs ( 42578 )
    I'm completely aghast that they did not include a single OS beyond a Linux distribution. I'm happy to see they'll include OpenBSD in their next study, though I wonder why they chose OpenBSD instead of NetBSD, which is larger. And I wonder why not include FreeBSD too, whose developer base is quite different from that of Open and NetBSD.
  • IIRC, Windows, measured in man-hours, is the single greatest engineering project in the history of humanity.

    It probably depends on your definition of "single". But I reckon the pyramids would beat windows, given that they were done by hand millennia ago.
  • you are a wanker.

    a linux wanker.

    Did the original poster even *mention* Linux? Linux is not the same thing as Open Source.


    it's people like you that prevent open-source software from being adopted for serious purposes because you're constantly advocating it even when it is not a rational choice.

    Free software was not a "rational choice" in 1984, if by rational you mean The Best Tool For The Job. If everyone only cared about using the best toolset, gcc would not have been written and none of this open-source explosion would have happened. Your use of the word "rational" suggests the original poster's view is crazy. Well, remember that this whole shebang has been made possible by a man who is "crazy", in the sense of not always wanting to use the short-term best tool for the job.


    I agree with your point, that the use of Excel does not detract from this study at all. You're also right about misuse of the word "ironic". Please don't misuse the word "rational".

  • by divec ( 48748 ) on Tuesday May 09, 2000 @02:29PM (#1082827) Homepage

    They list their sources as follows:


    • RedHat Linux v6.1 source rpms
    • Linux kernel sources version 2.2.14
    • Munitions cryptography/security archive
    • An un-random half of Freshmeat

    Debian would have been a more sensible distro to use, because it is overflowing with (packages|crap). Red Hat (presumably) just ship the ones which it makes commercial sense to ship, whereas Debian has everything that anyone's bothered to include whether it's useful or not. For example, Cooledit (my favourite text editor) is missing from the survey. The only problem with Debian would be stuff missing because it is not DFSG-free. Such stuff is available in the non-free/ directory but it's probably not as comprehensive as the main/ directory is.


    Having said that, it's very interesting to see what they have got. I didn't know Andrew Tridgell did all that stuff, for example. This could be a good tool for the community to get to know people better.

  • ESR had a colloquium at Cornell a while ago and I brought up Nikolai Bezroukov's critique of his CatB, which he loudly discredited. I wish this survey had come up earlier...I would like to ask him to comment on these statements:

    "The top 1271 authors, 10% of the total, accounted for 72.3% of the total code base. The top 10 authors alone (0.08% of the total) are credited for 19.8% of the code base. Free software development may be distributed, but it is most certainly very top heavy."

    "Our conclusion: Free software development is less a bazaar of several developers involved in several projects, more a collation of projects developed single mindedly by a large number of authors."

    The question from Bezroukov's paper I didn't bring up was that open source projects look much more cathedralesque and hierarchical as one moves up. E.g., not just anybody gets patches put right in to the Linux or *BSD kernel.
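The "top heavy" figures quoted above are concentration ratios: the share of all code credited to the largest fraction of authors. As a consistency check, 1271 authors being 10% of the total implies roughly 12,710 authors, of which 10 is about 0.08%, matching the quote. Below is a minimal sketch of how such a share can be computed; the function name `top_share` and the byte counts are hypothetical, not survey data.

```python
def top_share(sizes, fraction):
    """Share of the total held by the largest `fraction` of authors."""
    ranked = sorted(sizes, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # at least one author
    return sum(ranked[:k]) / sum(ranked)

# Hypothetical byte counts for ten authors, not survey data.
sizes = [500, 200, 100, 50, 50, 40, 30, 15, 10, 5]
share_top_10_percent = top_share(sizes, 0.10)
share_top_30_percent = top_share(sizes, 0.30)
```

Any such ratio is only as good as the attribution behind it: if bundled files inflate a few names, the distribution will look more top heavy than it really is.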
  • by konstant ( 63560 ) on Tuesday May 09, 2000 @11:29AM (#1082829)
    What I find most interesting by far is the composition of the contributions when viewed by project. In nearly every project I viewed, there are two or three elite "key contributors" who provide something on the order of 1/3 to 7/10 or more of the code, with the remainder provided in a slew of sub-1% coders.

    This relates an interesting story. It appears that, while the real strength of OSS is incremental improvement over time, few projects can exist without a guiding intellect or a handful of ambitious coders on the core team.

    Presenting this data to employers who are concerned about losing control of their code may help assuage their fears of open source. Clearly projects that are "owned" by no one are rarities. A corporation *can* have its cake and eat it too.

    -konstant
    Yes! We are all individuals! I'm not!
  • Well, there is one way that the OpenSource community can take over and Lead the way over the networking protocols.

    Come up with our own protocol.

    I have had this Idea in my head for a while, but I am only a network support tech, not a programmer, so I couldn't do it myself. I have some great ideas, but no way of implementing them.
  • Now I feel like an a**, I replied to the wrong article! Sorry, my bad.
  • Wonderful point - and I hope folks that are in the less than 1% crowd don't quit either! Even finding and fixing one line of code is a blessing.


    I fully agree. And there is an important point that shouldn't be missed. The top author, FSF, is not only not a single person, but because of copyright assignments, it isn't even really a single organization. The FSF has been a valuable member of the free software community for a long time. In fact, arguably, free software might not exist as a viable force today without it. But that doesn't make the FSF a single contributor.

    I know that there are some files out there with an FSF copyright on them that I wrote. I don't begrudge them the copyright assignment. They have taken the stewardship of the projects that I contributed those files to. For the sub one percent group, of which I am one, don't ever forget that our strength lies in both numbers and diversity. Jon Bentley quoted someone in his Programming Pearls chapter entitled Bumper Sticker Computer Science:

    Each new user discovers a new class of bugs.


    It would be easy enough to expand that to cover all of the relevant things that a new set of eyes bring to a free software project: new hardware configurations, a new language, new data.... But the original quote stands alone quite well.

    To each and every contributor of code, bug reports, feature requests, reviews, documentation, translations, or anything else, I offer my thanks. The most obvious evidence that you are needed is that you made a contribution. You did what no one else did.
  • Funny, but dead on..

    Perhaps we should help them with a more intelligent 'author filter', and a better FM source snagger. It's obvious that Mr. Matzigkeit didn't belong that high up on the list, and other entities like UCB are overrepresented as well. Most everything *BSD carries the Berkeley name, regardless of author!!
  • Everybody hates Powerpoint, except harried meatball mid-level managers.

    "When I'm singing a ballad and a pair of underwear lands on my head, I hate that. It really kills the mood."

  • Although, given that the study has managed to overlook my insignificant but non-zero contributions, maybe I shouldn't propose that.

    Yeah, same here. Not like I do a whole lot, just the occasional patch or bug report. But the fact that I seem to have absolutely nothing implies either something is wrong with their methods or people haven't been crediting me in their changelogs. :)

    Of course once I actually get around to releasing the projects I've been working on I'll have some stuff on there.
  • However, this also shows a weakness of open-source projects - if the major person of the project abandons it for any reason, the project will be stalled at the very least.

    How do you tell a company that the guy maintaining the program they're using just isn't interested in it any more? I guess the solution is more people who are paid and actually have responsibility for the projects (obviously projects under the FSF won't have this kind of problem).

  • by El ( 94934 )
    And what's wrong with using the most conveniently available tool for the job? A rational, non-bigoted person wouldn't see anything wrong with using a tool that you already had available and were experienced in using. I also don't see anybody mentioning any Open Source applications that would have been better suited to the task... does StarOffice have this capability?
  • By that same criterion, I wouldn't call Windows an engineering project either. "Whilst elaborate, the actual engineering would have been fairly minimal." Yep, sounds like Windows to me!
  • by El ( 94934 ) on Tuesday May 09, 2000 @11:36AM (#1082839)
    12706 developers working several years on 3149 projects, and they've still produced fewer lines of code than a single release of Win2K... is this because Open Source is more efficient, less feature-rich, or because it doesn't carry the burden of backwards compatibility with DOS 1.0?
  • While I think that it's good someone is performing some stats on open software development (if only to show others that stuff is actually being done) I think this could contribute to some BIG problems if people start to compete for the highest ranking.

    There is a good story about IBM in the late 70s about how they measured a research lab's performance based on KLOCs (1000s of lines of code). Surprisingly, the lab at Boca was winning most of the time. Then someone figured out that they were unrolling all of their loops in order to increase the line count...

    Proves it can/does happen...
  • so what??? I wouldn't blame them for using it, Excel is a good product. I use good products. I use Linux at home, and I use Excel at work. If it wasn't a good product, I wouldn't use it. If Linux wasn't a good product, how many of us would use it? Personally, I'm a little tired of people bashing products because they're made by MS...bash them for their bugs -- fine -- but not just because they're made by MS.

  • Seeing as this survey has highlighted code re-use in the Open Source community (Gordon Who?), do you reckon that OSS proves a good model for that Holy Grail, efficient code re-use (libtool et al)?
    Is there anybody doing studies on code re-use in OS sw or closed source sw?
    McC
  • Actually that quote is variously attributed to Mark Twain or the British Prime Minister Disraeli.

    Anyway, the point is that stats can be used to lie, but equally they can be used to extract the truth. For example, much of modern materials science is based on statistics. Likewise economic forecasting techniques. Stats aren't always bad, it's just that they can be misused.

  • When watching any sporting event this rings true. I love hearing an ex-sports player turned commentator talk about how "3 out of the last 4 times these teams have met on a full moon during the month of July...."

    Statistics are the tool of the devil.
  • I don't see how they come to their conclusion. They say that because most contributors contribute only to 1 project, OSS development doesn't work as a bazaar. Shouldn't they be looking instead at how many people contribute per project? They would get 4 on average, and obviously the more successful projects will have more contributors.

    Other than that, they are sampling a very small (and non-representative, I would guess) number of projects. There are a hell of a lot more than 3000 projects listed on Freshmeat alone. And god knows how many developers are missed. It's a start, but no more than that.

  • I think it's just that there is probably no other good spreadsheet package around. I tried using StarOffice for writing one of my papers that had a lot of charts in it, but I was just disgusted by how you have to coerce the thing into making even a simple chart. Also embedding of a chart in a text document was just plain buggy and crude. It took me hours to do something that would take minutes in Excel and in the end I had to settle for less than perfect charts. I know that there are other spreadsheets around (like the Corel or Lotus one) but they are also closed source commercial products. Also Excel is probably the best one around, even if it's by Microsoft, etc. I don't blame Star (yet), because I was using a beta, but they are very far from the object embedding that Office does. I liked their equation editor though...
  • The fact that the person who wrote libtool (which is used by just about everything) got the most credit and other bogus stats reported in other posts demonstrates one thing: statistics lie. One of my math profs has this book called "How To Lie With Statistics"--the title says it all.

    And finally, a word from Harry Truman: "There are three types of lies: lies, damn lies, and statistics!"

  • Actually, 0.139% is not much of a contribution to a project, which is what the original poster questioned. Orbiten looked at approximately 25 million lines of code, and 3149 identifiable projects. That's 7939 lines per project on average, and thus 0.139% is only 11 lines of code.

    However, those 11 lines may have been the most important 11 lines in the project!
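
    The arithmetic above is easy to reproduce. A minimal sketch, assuming the rounded figures quoted in this comment (about 25 million lines, 3149 projects, a 0.139% share); the variable names are only for illustration.

```python
# Rounded figures quoted in the comment above.
total_lines = 25_000_000   # "approximately 25 million lines of code"
projects = 3149            # identifiable projects
share_percent = 0.139      # the contributor share being discussed

# Average project size, then the slice that a 0.139% share represents.
avg_lines_per_project = total_lines / projects
lines_contributed = avg_lines_per_project * share_percent / 100

print(f"average lines per project: {avg_lines_per_project:.0f}")
print(f"lines in a {share_percent}% share: {lines_contributed:.1f}")
```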

  • Think of it this way, 11 lines of code could be a bug fix. So the 0.139% of a project could be someone fixing a bug. It is a useful contribution.
  • Statistically speaking, someone was bound to say that.
  • Eric Raymond has a pretty neat description of a process for resolving this. He calls it homesteading. The basic idea is: someone abandons a project, someone else thinks it's cool, announces his intent to pick it up, and hey presto: there's a new project owner.

    Losing key staff is no longer the exclusive realm of corporations. It sort of surprises me to see this argument brought up in the context of open software! :-)

  • Yeah, the statistics are poor across the board. According to this site, I contributed a total of 5502 bytes of code to Open Source projects. Well, make that Ethereal [zing.org]. I couldn't find Mozilla at all! This is weird, because Mozilla, though not being the most important and by a large margin not the first open source project, is the project that made the term Open Source a household (and boardroom) name.

    I pride myself on contributing lots of humble changes or fixes to lots of projects. Still, I'm not in the business of getting my name in the AUTHORS file of every project under the sun (even though it is a nice side effect of a hobby that exploded :-) My motivation is to make my life easier and more fun, while contributing to the public good.

    The most flattering thing that was ever said about my contributions was hidden in the URL of an interview by Feed Magazine [feedmag.com]. When I showed this URL to my family, the reaction was "wait a sec! Bottomfeeders? Isn't that a bit derogative?". It took quite some explaining to make it clear that it was the culmination of what I've done over the years: I've joined the hordes of folks who, by submitting small patches, fixes, bits of functionality, have made the difference between making Open Source a hobby of a select few, and making it a (possibly) useful tool.

    Oh well. I hope the folks at Orbiten will improve the situation (I'm sure their mailboxes will suffer the slashdot effect), and make the relative merit of their measuring methodology more clear. It is gratifying to see that someone picked up the odious task of trying to quantify what Open Source has to offer.

    As a side note, I lost my previous (very well written, thanks for noticing!) reply to this message because of accidentally clicking on a banner ad on slashdot. Oh, for the irony!

  • Similarly, if you wanted to determine who the most prolific scientific researcher is in a field, would you gather data by simply grepping for names in the texts of papers?

    Hmmm, this reminds me of the infamous Quotation Index used in the scientific world. Back when I studied sociology, a professor of mine would spend five minutes each lecture blasting the practice. As it turned out, a number of his colleagues were quoting each other, thereby bumping each other's ratings. "On the effects of offering free ballpoints to interviewees", being referenced by an article on "A critical review of free ballpoints", referenced by the rebuttal, ad nauseam.

    Doesn't it strike a familiar note in a forum driven by mechanically established karma?

  • Geez, Microsoft doesn't have the time or the will to keep their products backwards compatible with CP/M 86, errr DOS 1.0. They're keeping with the times, they only carry the burden of backwards compatibility to DOS 3.3. I mean MS has got better things to do with its time than make sure DOS 1.0 stuff works, they're busy adding the much needed Auto-remove-all-vowels-from-words-directly-following-a-spelling-error feature in Word 201.
  • by foo22 ( 154205 )
    The ironic thing is that I believe that those graphs were made in Excel. I doubt you could get less open source or free than Excel.
  • I don't see how that is being bigoted or not rational, I simply pointed out the irony in the fact that as a study on Free Software you would think that they would use free software. I'm not saying that I wouldn't have done it the same way but I feel (maybe just me) that they are sticking their collective feet in their collective mouths.
  • ...and highschool teachers :(

    --
  • According to the survey, they analyzed one thousand megabytes of programs...doesn't that seem a bit small? Meaning that all the open source software that they could find can only fit into an old style P-1 computer? Or is this just supposed to be a representative sample, and, if so, how did they choose this particular sample?

    These projects would average about 300 K each...what are they? Drivers? Application programs? Pac-Man clones?

  • Is it just me or did anyone else notice the distinct MS Excel look about those graphs?

    Devilish

    --------Irc.destructor.net--------
    --------The Geek Network--------
  • Still Sun is a BIG company. Props to those who make the top 10 as individuals. -A
  • I wish I had time to sit around and contribute to a ton of open source projects... alas, I have to make a living.

    My question is, after viewing some of the profiles of the top contributors, is 0.139% really much of a contribution to a project?

  • Hmm, this gives me an interesting idea... for another Slashdot poll suggestion, of course :-)

    Why does Win2K have more lines of code than all the open source projects combined?

    1. Because open source projects are lean and mean, and pack a lot of punch; not spongy and flabby like M$ bloatware :-)
    2. Because open source programmers don't like their programs to have any features. Features are for M$ spoon-fed victims. (sarcasm)
    3. Because Win2K actually does something, unlike open source software which merely rides on hype (I mean, it takes a lot of effort to cause Linux kernel panic whereas under Win2K it's so easy that sometimes it's even spontaneous -- obviously M$ understands, unlike OSS fanatics, the need for an easy way to crash!)
    4. Because Open Source is just hype, and cannot produce anything close to a real system.
    5. Face it, people, M$ knows what it's doing and ain't a bunch of loud-mouthed teenagers shouting Long Live Open Source without knowing how the real world works.
    6. Because ... how else would there be enough room for all those 64000+ bugs to hide?!
    7. Because that's how M$ programmers avoid getting laid off: Pad every source file with lots of newlines and useless comments (not to mention the occasional bug) so that their employee record shows a high count of number-of-lines-of-code they wrote.
    8. Because Win2K is written in a verbose language known as VB.
  • The Free Software I have contributed over the last 10 years comes to 7 projects with 10,586,400 bytes of code, which would put me at number 6 on the overall list. That I didn't even get a mention anywhere leads me to conclude that these statistics are nowhere near accurate.
  • I'd like to see what they would get if they did it on LinuxOne

  • hit the submit button instead of preview... but an impressive list anyhow
  • This would be a nice feature
  • perhaps we should have put a disclaimer on the survey. on the other hand, given that almost nobody here seems to have read the text accompanying the survey, it may not be worth it.

    the text clearly lists the limitations of the survey including the small code base used; the algorithm to identify and credit authors is clearly documented - and the source code is available on the site FWIW. of course, the survey is full of errors, some of which i've commented on here, on advogato [advogato.org] and elsewhere (e.g. gordon matzigkeit).

    the main problem is naturally that this is impossible to do by hand and has to be automated; we did want to look at authorship at a file level (the lowest level of granularity available); and author credits are in no fixed format. they're not even there much of the time, which is why copyright holders such as the FSF get a lot of credit too. the only alternative to listing them as they are is to have a huge "uncredited" portion - at least until authors start consistently claiming credit, using the same name or e-mail address in each file they write.

    incidentally it is not possible for us to guess which of many contributors to a single file are more important; as documented, the credit is currently split equally among them.
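
    for anyone wondering what "split equally" means in practice, here is a rough illustration. it is not the survey's actual code (that is on the site); the function name `split_credit`, the input layout and the "(uncredited)" bucket are all made up for this sketch.

```python
from collections import defaultdict


def split_credit(files):
    """Divide each file's size equally among the authors named in it.

    `files` is a list of (size_in_bytes, [author names]) pairs. This is a
    hypothetical input format chosen for the sketch, not the survey's own.
    """
    credit = defaultdict(float)
    for size_bytes, authors in files:
        # Files with no recognizable credit go to a placeholder bucket
        # (an assumption of this sketch).
        names = authors or ["(uncredited)"]
        share = size_bytes / len(names)
        for name in names:
            credit[name] += share
    return dict(credit)


# Made-up example data: three files with different credit lines.
example = [
    (9000, ["alice", "bob", "carol"]),  # three names, 3000 bytes each
    (4000, ["alice"]),                  # sole author gets the whole file
    (500, []),                          # no credit found
]
totals = split_credit(example)
print(totals)
```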

    finally, this is just a start. while we intend to continue working on this, the algorithm source code is available as are all the code bases, so nothing stops you from doing it too.

  • Since M$ isn't capable of intelligent programming, and M$ is (not) innovative at all, they must have contributed something -- oh, yeah, "Fatware"! How would I miss that?

    Of course for those enthusiasts out there willing to develop on four platforms (Win 9x, Win NT, Win 2000, and Win CE, where Win 9x keeps DoS 1.0 compatibility), take it from the former richest man in the world -- code "Fatware" and acquire a mono^H^H^H^H er, successful company.

    Win 2000 will work on any computer! Any computer that is as fast as AMD's Athlon can be overclocked! That is the success of "Fatware" -- make deals with OEMs, and when the hardware companies get more revenue because of your "Fatware", you get a cut! How can you go wrong? Unless you have a DOJ lawsuit hanging over your head, you can't!

    Of course coding excessive lines can be rather time-consuming. The fun part is having more backdoors in your software than the White House. After making a bunch of companies squirm over the backdoors, write a simple patch and charge a fortune. ;) But it's not a "bug fix" as those open-source people would say... nah, it's a "Service Pack".

    Ah, maybe M$ should have compatibility with DoS 1.0 -- the later versions didn't improve the OS much. :)

  • Since the study seems fuzzy on exactly what was counted and seemed to imply that documentation was looked at as well, I have to wonder how much of the FSF/Sun/UCB contribution consisted merely of copyright (or copyleft) text rather than actually unique code.
  • A long long time ago I can still remember
    how those trollers used to make me smile
    And I knew that if I had my chance
    I could make Slashdot:ers dance
    And maybe they'd be happy for a while.

    Did you write the stuff that matters
    And do you have faith in CmdrTaco
    If the /. tells you so
    Now do you believe in rock 'n troll
    And can moderation save their mortal soul
    And can you teach me how to troll real low

    Well I know that you're in love with her
    'Cause I saw you posting in the forum
    Pouring hot grits down her pants
    Man, I dig those flamebait rants

    I was a lonely teenage broncin' h4x3r
    With a pink iMac and a beowulf cluster
    But I knew that I was out of my mind
    The day The Slashdot died

    I started singin'

    Bye-bye, Miss Petrified
    Surfed my IE to the forum but the forum was dry.
    And good old trollers (were) drinkin' whiskey and rye
    Singin' this'll be the day that I die
    This'll be the day that I die

    I met a geekgirl who sang the blues
    And I asked her for some nerdy news
    She stole my coke and turned away
    Well I surfed down to the sacred forum
    Where I'd seen the posts years before
    But the 404 said the posts wouldn't come

    Well now on the street the trollers screamed
    The l33t3rs cried, and the h4x0rs dreamed
    But not a word was posted
    The Taco Bells all were broken (OT?)
    And the free man I admire the most
    OOG with the Holy Post
    He sent the last post to the host
    The day The Slashdot died

    We started singin'

    ||: Bye-bye, Miss Petrified
    Surfed my browser to the forum but the forum was dry.
    And good old trollers (were) drinkin' whiskey and rye
    Singin' this'll be the day that I die
    This'll be the day that I die :||
    (repeat plz)

    (TTL 4) We started Pingin'
    ____________________________________________
    By: TACO TROLL of the Troll Liberation Lobby
