Talk To a Successful Free Software Project Leader

Nagios (formerly known as NetSaint) is a GPL-licensed network monitoring project that's been getting a lot of buzz lately among *Nix sysadmins. Nagios is unquestionably a free software success story, even if it's not as high-profile as Apache or Linux. Ethan Galstad leads the project. Perhaps he can tell us why Nagios has done so well, so that other free software projects can enjoy similar success. Usual Slashdot interview rules: post your question below; we'll email 10 of the highest-moderated questions to Ethan about 24 hours after this post appears, and publish his answers soon after he gets them back to us.
  • I'd like to know (Score:2, Interesting)

    by CrazyDwarf ( 529428 )
    What would you say the biggest challenge you have faced is, and how did you handle it?
  • by SHEENmaster ( 581283 ) <travis@utk . e du> on Monday December 23, 2002 @01:09PM (#4945111) Homepage Journal
    Nagios® is a host and service monitor designed to inform you of network problems before your clients, end-users or managers do. It has been designed to run under the Linux operating system, but works fine under most *NIX variants as well. The monitoring daemon runs intermittent checks on hosts and services you specify using external "plugins" which return status information to Nagios. When problems are encountered, the daemon can send notifications out to administrative contacts in a variety of different ways (email, instant message, SMS, etc.). Current status information, historical logs, and reports can all be accessed via a web browser.

    Does that mean it can predict when a Windows system tries to use my network before the end user gets a bluescreen? Whoa; that's impressive.
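The quoted description says checks are run by external plugins that return status information to Nagios. As far as I know, the plugin convention is that the exit code carries the state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and the first line of output becomes the status text. Here is a minimal sketch of a disk-space plugin written to that convention. The function names and the 80%/90% thresholds are illustrative assumptions and do not come from any shipped plugin.

```python
import shutil

# Plugin exit codes; Nagios maps these to the service state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk(path, warn_pct=80.0, crit_pct=90.0):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # Could not even measure: report UNKNOWN rather than guessing.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    used_pct = 100.0 * usage.used / usage.total
    if used_pct >= crit_pct:
        code, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn_pct:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, f"DISK {label} - {path} is {used_pct:.1f}% full"


def main(argv):
    """Print the status line and return the exit code for the caller."""
    target = argv[1] if len(argv) > 1 else "/"
    code, message = check_disk(target)
    print(message)  # the first output line becomes the status text
    return code
```

When run as a script, it would end with `sys.exit(main(sys.argv))` so that Nagios sees the exit code.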
    • Current status information, historical logs, and reports can all be accessed via a web browser.

      That's great for interactive use, but Nagios (along with Big Brother, and most other monitoring packages) doesn't seem to cater well to automating report generation from outside of a web browser. We need to generate weekly reports on the number of outages, etc., and would like to be able to schedule a cron job every Sunday night to say "get me the uptime stats for abc services, so I can put them into xyz reporting package". We need to take the raw data and calculate rolling averages, etc., to give to customers (we're contractually obliged to do so). I.e., the sort of reports we need are typically more complex than is reasonable to expect Nagios to do internally. Was the interactive bias a deliberate decision, or did it just evolve that way? More importantly, are there any plans to improve things in this area?
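The rolling-average step described above can live entirely outside Nagios. Here is a minimal sketch. It assumes the raw data has already been extracted as total outage minutes per week for one service; how you obtain those numbers is left open. The helper names and the 4-week window are hypothetical.

```python
from collections import deque

MINUTES_PER_WEEK = 7 * 24 * 60  # 10080


def weekly_uptime(outage_minutes):
    """Convert one week's total outage minutes into an uptime percentage."""
    return 100.0 * (1 - outage_minutes / MINUTES_PER_WEEK)


def rolling_average(values, window=4):
    """Yield the mean of the last `window` values (fewer at the start)."""
    recent = deque(maxlen=window)
    for value in values:
        recent.append(value)
        yield sum(recent) / len(recent)


def uptime_report(weekly_outage_minutes, window=4):
    """Pair each week's uptime with its rolling average."""
    uptimes = [weekly_uptime(m) for m in weekly_outage_minutes]
    return list(zip(uptimes, rolling_average(uptimes, window)))
```

A Sunday-night cron entry could run this and hand the pairs to the reporting package.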

      • at my summer job.

        There are several free services that do that. As for writing a report, just modify one of the cgi scripts to include your company name and junk and add a wget command to the cron script.

        use it like this:
        %wget -O report.html
        => `report.html'
        Resolving flame... done.
        Connecting to flame[]:3128... connected.
        Proxy request sent, awaiting response... 200 OK
        Length: unspecified [text/html] 45.34M/s

        12:36:22 (45.34 MB/s) - `report.html' saved [47540]

        I have a proxy server, and downloaded the start page for my site, but the usage will be similar for your script. I also had to remove some 'junk characters'; damn you, lameness filter! Be sure to redirect the output to /dev/null so the cron daemon doesn't email you every week.

        I might be writing some PHP scripts to monitor uptime; email me if you would like a copy when they are complete.
      • Nagios...doesn't seem to cater well to automating report generation from outside of a web browser.

        Out of the box (binary distributions), you're right, it doesn't. However, Nagios has an extension to store its logging information in a relational database (MySQL or PostgreSQL). It requires you to run configure and build from the sources. Once that's done, it should be a heckuva lot easier to generate reports using Perl DBI or PHP or something similar to extract the data from the rows. Here's the skinny on how to do this (from the "Advanced Topics" section of the Nagios Documentation).
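Once the data is in a database, a weekly outage count takes a single SQL query. This minimal sketch uses Python's built-in sqlite3 so it runs standalone. The table name `service_state_changes` and its columns are hypothetical, because the real schema depends on the Nagios database extension you build. With MySQL or PostgreSQL you would use the matching driver and that database's date functions in place of SQLite's `strftime`.

```python
import sqlite3

# Hypothetical schema; adjust names to match your actual installation.
SCHEMA = """
CREATE TABLE service_state_changes (
    host       TEXT NOT NULL,
    service    TEXT NOT NULL,
    state      TEXT NOT NULL,  -- e.g. 'OK', 'WARNING', 'CRITICAL'
    changed_at TEXT NOT NULL   -- ISO-8601 timestamp
);
"""

# Count transitions into CRITICAL per host/service per week of the year.
WEEKLY_OUTAGES = """
SELECT host, service, strftime('%Y-%W', changed_at) AS week, COUNT(*) AS outages
FROM service_state_changes
WHERE state = 'CRITICAL'
GROUP BY host, service, week
ORDER BY host, service, week;
"""


def open_db(rows=()):
    """Create an in-memory database with the schema and optional sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO service_state_changes VALUES (?, ?, ?, ?)", rows)
    return conn


def weekly_outages(conn):
    """Return a list of (host, service, week, outages) tuples."""
    return conn.execute(WEEKLY_OUTAGES).fetchall()
```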
  • In your opinion.. (Score:2, Interesting)

    by WPIDalamar ( 122110 )
    what's the WORST security practice/vulnerability/annoyance that's come out in the past year?
    • heh heh "Use the Preview Button!"

      That should have been...

      "In your opinion, what's the WORST security practice/vulnerability/annoyance that's come out in the past year?"

      • You know that NetSaint/Nagios and SAINT/SATAN are not the same thing, don't you? Apparently not. NetSaint/Nagios is a host monitoring system; SAINT/SATAN is a vulnerability scanner.
    • Stop moderating this up... as another poster pointed out, NetSaint is not the intrusion detection tool I thought it was, so this is a pretty dumb question.
  • How does your product compare with similar commercial solutions?
    • by Thinko ( 615319 ) on Monday December 23, 2002 @01:27PM (#4945261) Homepage
      Specifically, how does Nagios compare to recent commercial offerings like Microsoft's MOM and Novell's ManageWise/ZenWorks? Will Nagios have the depth of intelligence when it comes to reporting, and to tracking similar (or related) events as a single, more critical super-event?

      Other items of note for comparison are features like XML output. I see that XML status data is planned for version 3; what depth of information will be available to query or report on through XML?
      • A couple more thoughts I had:

        Which features of products like MOM will be implemented in Nagios in the future? Do you see any specific roles currently covered by SMS/MOM/OpenView/etc. that will eventually be handled by Nagios?
  • by mrblah ( 229865 ) on Monday December 23, 2002 @01:19PM (#4945194)
    It seems that most open source projects rely heavily on word-of-mouth and perhaps a few announcement sites, like Freshmeat, that have geek-appeal. But with open source trying to break into the mainstream, what do you think open source projects should do to effectively market themselves to non-geeks?
  • why the name change? (Score:4, Interesting)

    by sgtron ( 35704 ) on Monday December 23, 2002 @01:19PM (#4945200)
    NetSaint was such a cool name.. why change it to Nagios.. just doesn't have the same ring.
    • I'm assuming it has something to do with this:

      NetSaint is not affiliated with World Wide Digital Security, Inc. (WWDSI); Richard S. Carson and Associates, Inc; and the marks WEB SAINT, SAINT, SAINTWRITER, SAINTEXPRESS, and SAINTBASIC owned by Richard S. Carson and Associates, Inc.

      Looks like SAINT is a little too close to some security-related trademarks; the owners probably threatened the group when they saw the name.
    • Ya know I was thinking the same thing. Especially for a program that does what it does in helping site integrity, NetSaint seems like the perfect name.
  • Direction (Score:5, Interesting)

    by FreeLinux ( 555387 ) on Monday December 23, 2002 @01:23PM (#4945236)
    Nagios is an outstanding project, not only in terms of its success but also in terms of its power and broad scope. Looking at Nagios today, it is increasingly apparent that its functionality is starting to approach that of HP OpenView and CA Unicenter TNG.

    My twofold question is: what has determined Nagios's direction thus far? Was it modeled after OpenView and TNG, or something else? Also, where is Nagios going in the future: will it continue to develop the features of OpenView and TNG, or is it going somewhere else?
  • by gelfling ( 6534 ) on Monday December 23, 2002 @01:25PM (#4945245) Homepage Journal
    That is, how do you know you're doing the right thing, and how do you know you're doing it the right way, toward the right conclusion?
  • by Anonymous Coward on Monday December 23, 2002 @01:26PM (#4945253)
    Your monitor appears to use a model where it polls a pre-defined list of conditions. In other words, if there are 28 things that could go wrong, there are 28 pre-defined items that change color from green to yellow to red.

    In my experience, an event-based model, where monitors determine the problem and severity, works better. The central event manager would just receive the events and handle display and

    Can your product handle this sort of model? For example, could I write a monitor that watched a database log file, and have it send events like this?

    severity category host message
    high database myhost database memory shortage
    medium os myhost fs /db1 is over 90% full
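Here is a minimal sketch of the receiving side of the event model described above. It assumes one event per line in the post's "severity category host message" layout, where the message is everything after the third field. The `low` severity level and all function names are assumptions made for illustration.

```python
from dataclasses import dataclass

# Only 'high' and 'medium' appear in the post; 'low' is an assumption.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2}


@dataclass
class Event:
    severity: str
    category: str
    host: str
    message: str


def parse_event(line):
    """Split 'severity category host message...' into an Event."""
    parts = line.split(maxsplit=3)  # the message keeps its internal spaces
    if len(parts) != 4 or parts[0] not in SEVERITY_RANK:
        raise ValueError(f"malformed event line: {line!r}")
    return Event(*parts)


def most_severe_first(events):
    """Order events so the central manager handles the worst ones first."""
    return sorted(events, key=lambda e: SEVERITY_RANK[e.severity], reverse=True)
```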

    • AC,
      This was clearly a design decision and if you prefer this style of monitoring, then I'd suggest Big Brother. For my environment, Nagios made the correct choice. If you are monitoring many applications (many > 100), then with a model that pushes events to the monitoring system, you will (probably) end up with a distributed configuration nightmare.

      That said, I think you could probably hack a Nagios setup to do what you want with its distributed monitoring features. I.e., you could write your custom monitoring app to implement the interface that Nagios uses for satellite monitoring instances and then configure Nagios to use your custom monitoring app as a satellite. But I have not tried/done this, so I could be wrong, wrong, wrong.
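One way to feed results from a custom monitor into Nagios is to submit them as passive check results through its external command file. That file is set by `command_file` in nagios.cfg, and its location varies by installation. This minimal sketch formats a `PROCESS_SERVICE_CHECK_RESULT` command. The host and service names are hypothetical. Before relying on it, check the external-command documentation for your Nagios version and confirm that passive checks are enabled.

```python
import time

# Service states as numeric return codes in a passive check result.
STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def passive_result(host, service, state, output, now=None):
    """Format one passive service check result as an external command line."""
    timestamp = int(now if now is not None else time.time())
    # Fields are ';'-separated; replacing any in the free text is a cautious
    # choice, and newlines are flattened so the command stays on one line.
    safe_output = output.replace(";", ",").replace("\n", " ")
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{STATE_CODES[state]};{safe_output}\n")


def submit(command_file, line):
    """Append the command to the external command file (usually a named pipe)."""
    with open(command_file, "a") as pipe:
        pipe.write(line)
```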

      • As far as network management is concerned, SNMP was designed with the philosophy that the management app would poll for status, since that scales, but would also support events, since that provides a more timely response. UDP was chosen as the transport protocol, so that events could (and usually would) be transparently dropped when there were network problems. "The Simple Book" provides more details; suffice it to say that I agree with the arguments made therein.

        The arguments are weaker if you are monitoring things above the network layer, but I think that they still hold a lot of water.

        Nagios apparently uses the polling model, which is good, but seems to use TCP, which is bad. It also seems to have support for so-called mid-level managers (MLMs), that watch subsections of a network and aggregate the results for higher levels. This is a good thing. In order to scale, MLMs should not report a lot of detail unless directly queried. I don't know how well Nagios supports the MLM model. Can anyone tell me more?
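The mid-level manager idea can be sketched in a few lines. The MLM keeps the latest state for each host in its segment and sends only a compact summary upward; it returns per-host detail only when queried. This is a hypothetical class meant to illustrate the design. It does not describe how Nagios's distributed monitoring is implemented.

```python
from collections import Counter


class MidLevelManager:
    """Hypothetical MLM: polls one segment, reports a summary upward."""

    def __init__(self, segment):
        self.segment = segment
        self.latest = {}  # host -> last polled state

    def record(self, host, state):
        self.latest[host] = state

    def summary(self):
        """Compact report for the top-level manager: counts per state."""
        counts = Counter(self.latest.values())
        worst = ("CRITICAL" if counts["CRITICAL"] else
                 "WARNING" if counts["WARNING"] else "OK")
        return {"segment": self.segment, "worst": worst, "counts": dict(counts)}

    def detail(self, state=None):
        """Full per-host detail, sent only when the top level asks for it."""
        return {h: s for h, s in self.latest.items() if state is None or s == state}
```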

  • by feldsteins ( 313201 ) <> on Monday December 23, 2002 @01:27PM (#4945265) Homepage
    How can the success of geeky sysadmin software be translated into open source projects aimed at a wider audience? Put simply, can the open source model work beyond nerdy sysadmin widgets and spill into the world of mass-appeal software?
  • Was there a make or break moment when it could have all ended? If so what pulled the project back on track?
  • my question (Score:5, Interesting)

    by greechneb ( 574646 ) on Monday December 23, 2002 @01:30PM (#4945292) Journal
    I'm sure people often send you feedback about your software. What I would like to know is if you have any feedback that stands out. Mainly what is the most unusual/unique use someone has had for netsaint that you have heard of?
  • Free Software (Score:5, Interesting)

    by Natchswing ( 588534 ) on Monday December 23, 2002 @01:33PM (#4945309)
    Since your software is so successful, have you thought about charging money for it?
  • I've been using NetSaint for a couple of years now, and it's a really nice monitoring package: a pretty easy learning curve with the well-commented config files, easy to extend if you want to write a little Perl or C, and, best of all, it understands hierarchy. If you lose a major link in your network, instead of complaining about all of the hosts on the other side of the outage, it just reports the link failure and warns that the other nodes are unreachable.
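The hierarchy behaviour described above can be sketched as follows. A failed host whose parent is itself down or unreachable is classified UNREACHABLE rather than DOWN, so only the real failure point raises an alert. This simplified sketch assumes each host has at most one parent and that the parent map has no cycles. Nagios itself allows several parents per host.

```python
def classify(parents, failed):
    """Return {host: 'UP' | 'DOWN' | 'UNREACHABLE'}.

    parents: host -> parent host, or None for hosts next to the monitor.
    failed:  set of hosts whose own check failed.
    Assumes the parent map is acyclic.
    """
    states = {}

    def state_of(host):
        if host not in states:
            parent = parents.get(host)
            if host not in failed:
                states[host] = "UP"
            elif parent is not None and state_of(parent) != "UP":
                states[host] = "UNREACHABLE"  # blame the upstream failure
            else:
                states[host] = "DOWN"  # this host is the failure point
        return states[host]

    for host in set(parents) | set(failed):
        state_of(host)
    return states


def hosts_to_alert(states):
    """Only DOWN hosts page anyone; UNREACHABLE ones are just reported."""
    return sorted(h for h, s in states.items() if s == "DOWN")
```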

    I have to agree with the others that have posted - why drop a perfectly good (and recognized) name like Netsaint for something we can't even pronounce?

  • propriety... (Score:5, Interesting)

    by bhsx ( 458600 ) on Monday December 23, 2002 @01:41PM (#4945380)
    If a company came along and asked to market a version of Nagios that includes unpublished changes to the codebase, what would your response be? For example, would you:
    A. give them a relicensed version that allows them to do whatever they want to it.
    B. incorporate any changes they may want on your own and make sure the changes make their way to the GPL codebase.
    C. tell them to get bent.
    D. make proprietary changes that you leave out of the GPL codebase in order to sell those changes yourself or to other potential clients
    E. Some combination of the above.
    F. Some other direction I didn't think of

    I feel that making proprietary changes to GPL code that you keep (at least temporarily) proprietary is a great business model for certain projects, possibly the best model for certain things. Some projects that come to mind are things like's Secure iXplorer, which has a GPL "lite" version which only supports ssh/scp and a "full" version that also supports sftp. and Star Office seem to be of the same ilk... If you need the extra functionality of Star Office, such as the better .doc filters and database functions, then you pay for that.
    I'm also curious if you have been approached by anyone for this sort of thing.
  • How did it start? (Score:5, Interesting)

    by SupahVee ( 146778 ) <> on Monday December 23, 2002 @01:42PM (#4945383) Journal
    Did NetSaint/Nagios start small, i.e. just a small shell script doing some minimal network testing, or was it designed from the ground up as a massive network tester to replace such overpriced products as HP OpenView, etc.?

    I know there was a serious code revision between Netsaint 0.0.7 and Nagios 1.0, which was phenomenal, btw, great job. But after using Netsaint (I still call it that, old habits die hard) for almost 2 full years now, I've always been very impressed with how well everything runs and scales.
  • by sys$manager ( 25156 ) on Monday December 23, 2002 @01:43PM (#4945388)
    I am running Nagios and having a major problem with one of the plugins that is severe enough to make me throw out the software if I can't get it working.

    I've asked on the two nagios mailing lists and received no answer. How do I, working for a major corporation, promote this software package if there's nobody that can help me fix it? Where do I look for support for a free product?
    • Figure out what it is worth to you to have software that does what Nagios does and how much you're willing to spend on using it. Now you have a budget to spend on Nagios consultants to train your IT folks.
    • You're looking in the right places, but the fact is that the support doesn't yet exist. This doesn't happen with software you pay for unless it comes from lame companies. The nature of OSS makes it possible for non-programmers to use the software before it's ready, and there are problems associated with that. Since you don't have the time to contribute, you may want to stick with OSS projects with versions over 1.0 since that is typically what signifies a project's readiness for public consumption.
    • Ah, you pose a good question, and as always the responses are typically emotionally motivated. Although there is no guarantee that the following solution will work, it is certainly the most likely option for success.

      You state that you are threatened with dumping Nagios because of the issue you have with the plugin. Assuming that your organization requires a network monitoring system, it seems only logical that you would have to replace Nagios with a commercial system, a system that will likely cost a great deal of money.

      Could you not get some funds allocated to contact the writer of the plugin directly and hire them as a consultant to fix the bug or implement the feature you need? I suspect that for a couple of thousand dollars you could have the actual writer of the plugin address your needs directly. Surely this would be far cheaper than the likely hundreds of thousands of dollars that would be necessary to completely replace Nagios with a commercial system. Further, releasing your fix/enhancement to the open source community would advance the entire project that much more.
    • What's the issue? If it's the problem with SMTP servers saying they're down when they're not, you need to edit the check_smtp.c file and remove the check for CRLF in the plugins source as described here [].

      Funny, someone answered me quickly when I asked about it. If you didn't give any more details than the post I'm replying to, I can see why you didn't get an answer.

  • Prioritization (Score:5, Interesting)

    by 10-20-JT ( 628170 ) on Monday December 23, 2002 @01:45PM (#4945410)
    I assume there is a long list of "features" which your users and program staff have come up with for desired future components. How do you prioritize those in the development queue? Is there any method at all? Squeaky wheel? Most requests? Interest of particular developers? Donations with particular requests?
  • by FreeLinux ( 555387 ) on Monday December 23, 2002 @01:58PM (#4945505)
    Nagios's present event handling performs a prescribed action based on a state change in a monitored service. This is an excellent feature that pushes Nagios beyond a simple monitoring application into a true management application. In CA Unicenter, event handling goes a step further, allowing you to configure any action based on ANY message that appears in the event log. This, in my opinion, is one of Unicenter's strongest features, though there are many.

    Will Nagios be implementing similar event handling functionality, or will using utilities such as Swatch remain necessary? And if Nagios will not gain this flexibility, why do you feel that this functionality is unnecessary?
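The "any message can trigger any action" idea amounts to a table of patterns and actions applied to every log line, which is roughly what tools like Swatch do. Here is a minimal sketch. The patterns and action names are hypothetical placeholders, and hooking an action up to a real notification is left out.

```python
import re

# Hypothetical rule table: (compiled pattern, action name).
RULES = [
    (re.compile(r"ORA-\d+"), "page-dba"),
    (re.compile(r"kernel: .*I/O error"), "page-sysadmin"),
    (re.compile(r"authentication failure"), "log-security"),
]


def actions_for(line, rules=RULES):
    """Return every action whose pattern matches anywhere in the log line."""
    return [action for pattern, action in rules if pattern.search(line)]


def scan(lines, rules=RULES):
    """Yield (line, actions) for each line that triggers at least one rule."""
    for line in lines:
        matched = actions_for(line, rules)
        if matched:
            yield line, matched
```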
  • Funding (Score:3, Interesting)

    by Alethes ( 533985 ) on Monday December 23, 2002 @02:00PM (#4945525)
    We often see jokes posted on here such as:

    1) License product under GPL
    2) ???
    3) Profit!

    What is #2 for you, or more generally, how do you support your project financially? What do you see as the most sustainable model for supporting Free Software?
  • This isn't really a question for the author so please don't mod this up.

    Does this software scale to monitoring thousands of servers? The only other reasonably mature open monitoring solution I investigated is mon, and it wasn't close to scaling to an environment of any size.
    • I wrote a network monitor (the daemon part, anyway) that could do this: hundreds of checks per second. It did HTTP(S) and FTP. You have to make it multi-threaded, including the host lookups. The big problem with doing that many is that a network outage will generate a lot of events, which will kill most event managers, so you need a good front-end event filter, which it sounds like Nagios has.

      Monitoring at that high a rate is also good if you have an SLA that's pretty tight.

      Another good thing to have is good built-in forensic diagnostics so you don't get paged by operations at 3 am to explain that spurious down event.
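The front-end event filter mentioned above can be as simple as dropping repeats. It forwards an event when its state changes and otherwise forwards it at most once per hold-off window. Here is a minimal sketch. The class name and the 300-second default are assumptions, and timestamps are passed in explicitly so the behaviour is easy to test.

```python
class EventFilter:
    """Forward an event only on a state change or after the hold-off expires."""

    def __init__(self, holdoff_seconds=300):
        self.holdoff = holdoff_seconds
        self.last = {}  # (host, check) -> (state, time last forwarded)

    def accept(self, host, check, state, now):
        key = (host, check)
        previous = self.last.get(key)
        if previous is not None:
            prev_state, sent_at = previous
            if prev_state == state and now - sent_at < self.holdoff:
                return False  # duplicate inside the hold-off window
        self.last[key] = (state, now)
        return True
```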

    • I've seen someone report on the sage mailing list that they were using nagios to monitor 500 hosts, with a total of around 1800 services.
    • nagios is capable of monitoring thousands of servers yes, but it has had some issues in the past with these very large networks.

      people on the nagios mailing list are doing it though, it just takes tuning.

  • Why Nagios? (Score:1, Interesting)

    by Anonymous Coward
    There are many Open Source alternatives around. Big Brother, MRTG and Zabbix come to mind.
    What makes Nagios unique? Thanks.
    • not documented well on the site, last I checked, but the reason is:

      It reacts to things professionally:

      It keeps track of downtimes. It lets you SCHEDULE downtime (for specific time windows). It has access controls by user. It has limited views by user. It has notification windows per user.

      Stuff like that. Big Brother doesn't come close, and MRTG has a completely different design goal, as far as I understand it.

      nagios is designed to be a cheap man's replacement for full on HP OpenView, in a true 24x7 NOC.
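The scheduled-downtime and per-user notification-window features listed above come down to two checks before anything is sent. Here is a minimal sketch of that decision. The data classes are hypothetical, and notification windows that cross midnight are not handled.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Contact:
    name: str
    notify_from: time   # start of this contact's daily notification window
    notify_until: time  # end of the window; must be later than notify_from


@dataclass
class Downtime:
    host: str
    start: datetime
    end: datetime


def in_downtime(host, when, downtimes):
    """True if `when` falls inside a scheduled downtime for `host`."""
    return any(d.host == host and d.start <= when < d.end for d in downtimes)


def should_notify(host, contact, when, downtimes):
    """Suppress alerts during downtime or outside the contact's window."""
    if in_downtime(host, when, downtimes):
        return False
    return contact.notify_from <= when.time() < contact.notify_until
```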

  • by Brendan Byrd ( 105387 ) <SineSwiper-slash ...> on Monday December 23, 2002 @02:23PM (#4945702) Homepage Journal
    One of the biggest problems with GNU projects is getting other people to help you out with your code. The code may be freely available, but that doesn't mean that people will freely code your project. At what point does a GNU project turn from one person coding his/her work into several/many people working regularly on the project?
  • When developing non-trivial software, it's hard to resist the temptation to add a lot of features that take time to implement and are not necessarily central to the software you're building. Given that you don't have the budget limitations of commercial software ("we won't do this because not enough customers ask for it, so it's not worth doing"), how do you decide which features to include? Is it the popularity of the feature request? The time it takes to implement? How central it is? Or something else?

  • It's clear that there are different driving forces behind Open Source projects and paid commercial projects.

    Open Source projects are driven by people who enjoy coding in their spare time, people who want to contribute something to the community, or people who have a need for a particular piece of software or functionality.

    Commercial projects are driven by the need to produce a product on-time and under-budget in order to sell it to make profit.

    In your experience, how similar is managing an Open Source project to managing a commercial one? What sort of challenges would you face in an Open Source project that you wouldn't come across in a commercial one? Where do the skill sets required for each differ?
  • by Sj0 ( 472011 ) on Monday December 23, 2002 @02:39PM (#4945818) Journal
    What are your thoughts on arm-chair project leads? How do you deal with maintaining the hierarchy when such a person starts challenging your decisions?
  • by CountJoe ( 466631 ) on Monday December 23, 2002 @02:43PM (#4945851) Homepage
    I am a project manager for several open source projects and have had a great deal of trouble finding developers that will actually help with development. How do you find reliable developers that make a real contribution to your project?
  • My intranet hosts a number of web applications for internal use. Netsaint is one of those, and it has been a fantastic asset for us.

    Other handy web apps we love include Mantis (bug tracker), CVSWeb and Chora, phpMyAdmin, phpPgAdmin, SquirrelMail and so on. There are lots of great web apps out there these days that can provide web based access to some cool functionality.

    One major hassle, though, is that every one of them handles authentication and authorization differently. Setting up one login, or hacking them together into some sort of common framework is a giant hassle. Do you have any thoughts on how to get web applications to work well together?

    - H
    • My personal take on this:

      Standards are Good.

      HTTP auth is a standard. Nagios uses it. This is Good.

      I recently merged three web applications we have, one of them being Nagios, to use a single htpasswd file, and control access to the different areas by htgroup.

      Bug all free web software writers to support HTTP auth as an option, at minimum.
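Apache does the real work here, through AuthUserFile and AuthGroupFile plus a `Require group` line for each protected location. The sketch below only illustrates the data model behind the merged setup: one group file in the `group: user1 user2` format, and a hypothetical mapping from each application's URL prefix to the group allowed to use it. The prefixes and group names are invented for the example.

```python
def parse_group_file(text):
    """Parse group-file lines of the form 'group: user1 user2 ...'.

    Blank lines and lines starting with '#' are skipped by this sketch.
    """
    groups = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, _, members = line.partition(":")
        groups.setdefault(name.strip(), set()).update(members.split())
    return groups


# Hypothetical URL prefixes and the group allowed to use each web app.
AREAS = {
    "/nagios/": "nagios",
    "/mantis/": "developers",
    "/phpmyadmin/": "dba",
}


def allowed(user, path, groups, areas=AREAS):
    """True if `user` belongs to the group guarding the app at `path`."""
    for prefix, group in areas.items():
        if path.startswith(prefix):
            return user in groups.get(group, set())
    return False  # unknown areas are denied by default
```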
      • Yes, I think this is a reasonable start. You can supplement it with mod_auth_mysql or mod_auth_pgsql to help tie things together as well. Still, it tends to be a lot of work.

        Also, once you figure out authentication and authorization, what about making web apps fit into the rest of the site? Not simply the "look" of things, but the navigational scheme, the general arrangement of elements on your site that makes things consistent and navigable.

        It's an inordinate amount of work. Ever try to fit someone else's forum app into your site? Oy veh. Faster to write your own.

        Web apps are easy, cross-platform on the server (to an extent) and wonderfully cross platform on the client side. They have so much potential. But I think interoperability is the major failing right now.

        - H

  • by jenkin sear ( 28765 ) on Monday December 23, 2002 @03:15PM (#4946092) Homepage Journal
    Nagios depends on a wide variety of plugins to do its job (in a way, like nessus). To what degree do you find outside developers contributing patches to the main codebase, vs. contributing plugins? Is there a path where developers add plugins, and then "graduate" to core patches? I think I see a similar path in both Linux and Apache, where one might write modules and then get involved in some of the deeper magic- and I wonder if that architectural decision may be a key to the project's long-term success.
  • People issues? (Score:5, Interesting)

    by dmuth ( 14143 ) <.moc.liamg. .ta. .todhsals+htum.guod.> on Monday December 23, 2002 @03:31PM (#4946213) Homepage Journal
    Have you ever had to deal with any developers who um, had issues? For example, someone who refused to comment their code, or someone who would volunteer to implement a feature and then "not get around to it" which forced the project as a whole to suffer?

    If so, how did you deal with those people? Did you ever find yourself forced to burn any bridges as a result of dealing with such people?
  • One of the shortcomings I keep running into with NetSaint comes from the fact that I can monitor pretty much anything I want with it. In certain situations the problem becomes one of having many items to drop into a single container. Assume I have a bandwidth threshold check, or a trap container for a device. Any number of items can be triggered on one device: a theoretical serial3/1 interface might trigger a bandwidth threshold at the same time as an hssi1/0 on the same router. When those results are sent into NetSaint, the last one in wins. Or take a similar example with a device sending traps: it could send any number of traps corresponding to different interfaces, etc. Is there any way of incorporating into NetSaint/Nagios a dynamically sized container, so that the last one in doesn't necessarily eliminate the previous result? Could a framework be incorporated into future releases where a new argument, perhaps an interface id, gets passed to send_nsca? If an existing interface id matches, that entry is cleared or stays the same; if there is no matching id, the container resizes and adds the new one on top of the stack.

    I can dream. One thing I must say is that netsaint is a wonderful wonderful piece of software!

    Thanks so much.
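The keyed container requested above is easy to sketch outside Nagios. Keep one result per interface id under each host/service pair, let an OK result clear only that interface, and submit the worst state together with a combined message. This is a hypothetical helper and not an existing send_nsca option; the interface id would have to travel in the check output or through a wrapper script.

```python
class InterfaceResults:
    """Hypothetical store: one result per interface instead of last-one-wins."""

    # Ranking used to pick the worst state across interfaces (an assumption).
    SEVERITY = {"OK": 0, "UNKNOWN": 1, "WARNING": 2, "CRITICAL": 3}

    def __init__(self):
        self.entries = {}  # (host, service) -> {interface_id: (state, output)}

    def update(self, host, service, interface_id, state, output):
        slots = self.entries.setdefault((host, service), {})
        if state == "OK":
            slots.pop(interface_id, None)  # recovery clears only this id
        else:
            slots[interface_id] = (state, output)  # an unseen id grows the container

    def summary(self, host, service):
        """Return (worst_state, combined_output) for one host/service."""
        slots = self.entries.get((host, service), {})
        if not slots:
            return "OK", "all interfaces OK"
        worst = max((s for s, _ in slots.values()), key=self.SEVERITY.__getitem__)
        text = "; ".join(f"{iface}: {out}" for iface, (_, out) in sorted(slots.items()))
        return worst, text
```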
  • I know of at least one large ISP who have bought a commercial, closed source monitoring system. However, this works so badly that the sysadmins have installed Nagios to run alongside it (presumably without official permission). Do you know of any other instances like this, and how do you think it impacts on Nagios usage and development? For instance, is it hard to get people to publish bug fixes and new features if their employers have commited the company's resources to a competing product?
    • What the admins need to do is let the system fail, fix it, then include a line like "Nagios, a *free*, *no-cost* software product, began detecting/predicting/correcting/whatever this in version foo, which was released on bar" in their status reports.

  • I used to be quite the open-source advocate, until I started paying more attention to many of the successful open-source projects out there like say Gnome and KDE.

    Let me focus the rest of my response on GUI development...

    The problem I see with many projects like these is that they fail to innovate as much as they copy. If this world was 100% open source, we'd probably see more GUI fragmentation than we could stand. Going from one platform to another would be a very irritating process (more than it already is anyway).

    So honestly, without companies like Apple and Microsoft spending millions a year on user interface research, we wouldn't have seen the tremendous WIMP evolution that we have over the past ten years.

    In short, without closed source companies spending their own time and money to advance their products, the open-source competition wouldn't be near as advanced.

  • Hi,
    We have a network of over fifty servers all monitored by Nagios and it has served us well.

    My question is this:

    Your software came with the option of the new "Object" model which you are switching to. When you have over fifty servers each with multiple services this creates a *huge* object file that sysadmins have to create.

    I wrote a PHP application just to manage all of these issues and generate the object files from a database I created. The main Nagios server connects to the central DB-server PHP page and wgets a fresh object file for its child servers once each hour or so to pick up changes. But I digress. My question is: "How come no tools like this were released with Nagios?" And would you be interested in my publishing the source for these programs, or are you going to change this object file format at a later date?

    This is a project I worked on a while ago and honestly have not looked recently but I remember sitting and scratching my head for a while wondering why this had not been implemented with the release.

    Thanks for your time.

    -Joel De Gan
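Generating the object files from database rows, as described above, mostly means rendering `define host{...}` and `define service{...}` blocks. Here is a minimal sketch. The row keys and the template names `generic-host` and `generic-service` are assumptions, so substitute whatever templates your own configuration defines. Real definitions usually need more directives than shown here.

```python
def define(object_type, directives):
    """Render one Nagios object definition block with aligned values."""
    width = max(len(key) for key in directives) + 2
    body = "\n".join(f"    {key:<{width}}{value}" for key, value in directives.items())
    return f"define {object_type}{{\n{body}\n    }}\n"


def render_config(hosts, services):
    """hosts/services: lists of dicts, e.g. rows fetched from a database."""
    blocks = []
    for h in hosts:
        blocks.append(define("host", {
            "use": "generic-host",  # assumed template name
            "host_name": h["name"],
            "alias": h["alias"],
            "address": h["address"],
        }))
    for s in services:
        blocks.append(define("service", {
            "use": "generic-service",  # assumed template name
            "host_name": s["host"],
            "service_description": s["description"],
            "check_command": s["command"],
        }))
    return "\n".join(blocks)
```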

  • I don't have any, and I didn't take any time to look into who you are before I asked this question. I currently have one project I'm working on which I have released under the GPL, and several more which I intend to release the same way (but I'm holding off until they are a little more finished). But my projects don't pay the rent, so although I'm looking to use my own code to profit from services in the future, my workload puts me in a situation where I just don't have time to push my "brainchildren".

    Maybe your living arrangements were fertile soil for NetSaint, or perhaps you were in a position to put all of your out-of-work hours into it? Did an early embrace from the community help give it momentum?

    I'm sorry, I don't even know if you're the original author or inherited it.

    Ah well - back to work
  • I used to be quite the closed-source advocate, until I started paying more attention to many of the successful open-source projects out there like say Gnome and KDE.

    Let me focus the rest of my response on GUI development...

    The problem I see with many closed-source projects like these is that they fail to innovate as much as they copy. If this world was 100% open source, we'd probably see more code re-use. Going from one platform to another would be a very easy process (more than it already is).

    So honestly, without companies like Apple and Microsoft stealing innovations from open-source authors, we wouldn't have seen the tremendous WIMP evolution that we have over the past ten years.

    In short, without open-source projects innovating to advance their products, the closed-source competition wouldn't be near as advanced.