
Answers From a Successful Free Software Project Leader

It's time to crank up the Slashdot Interviews for 2003, starting with answers to your questions for Nagios developer Ethan Galstad. He went far beyond and above the call of duty here to give you what amounts to a veritable "Free Software Project Leader's FAQ" that anyone who has ever thought about starting his or her own project ought to read. Thanks, Ethan!
1. Marketing & publicity

By mrblah

It seems that most open source projects rely heavily on word-of-mouth and perhaps a few announcement sites, like Freshmeat, that have geek-appeal. But with open source trying to break into the mainstream, what do you think open source projects should do to effectively market themselves to non-geeks?

That's a good question. However, I can't say much about this, since I really haven't had to deal with it. Nagios is targeted towards sysadmins, so they hear about it by word-of-mouth, Freshmeat, Google, etc. Most OSS projects operate with no ($$$) budget, so traditional marketing methods are probably out of the question. Having a project included in a popular OS/application distro would be ideal, although this would require that the project first become popular "enough" (whatever that means to the distro producers) through word-of-mouth, web search ranking, etc.

2. Direction

By FreeLinux

Nagios is an outstanding project, not only in terms of its success but also in terms of its power and broad scope. Looking at Nagios today, it is increasingly apparent that its functionality is starting to approach that of HP OpenView and CA Unicenter TNG.

My twofold question is: what has determined Nagios' direction thus far? Was it modeled after OpenView and TNG or something else? Also, where is Nagios going in the future? Will it continue to develop the features of OpenView and TNG, or is it going somewhere else?

The basic features of Nagios were modeled after things found in other similar projects (mon, Angel, spong, etc.), with my own twists to satisfy what I thought was missing from those projects. Many of the features that have been added over the past few years have come about because of suggestions/complaints from users. Other features (like flap detection) have been thrown in to "stay ahead of the competition", as well as provide something useful.

I've only had a cursory look at TNG and OpenView, but I think it's safe to say that they will always do more than Nagios. That's okay though - they'll cost you a bit more than Nagios will too. I have no intention of trying to make Nagios a "one app for everything" type of project. The focus of Nagios is on monitoring and alerting and it always will be. And while many might assume that everything that can be done in regards to monitoring has already been done, I don't think that's the case (at least in free software). The lack of good failure prediction (using AI) in regards to asynchronous events like host/service failures is a huge feature that's missing from most (if not all) free network monitoring software. That's one of the things that I'll be attempting to tackle and integrate with Nagios down the road. Other things like expanded reporting capabilities, increased scalability and efficiency are top priorities as well.

I guess the direction Nagios is going is towards being an "enterprise" application, however you might define that. When I first started Nagios/NetSaint, I assumed it would only be used on small LANs. Over the years it's been adopted by ISPs (local and global) and Fortune 500 companies with substantial networks. Making Nagios work well in these larger environments has been the big challenge, but that's where the development is leading me.

3. Predefined alerts vs. dynamic events

By an Anonymous Coward

Your monitor appears to use a model where it polls a pre-defined list of conditions. In other words, if there are 28 things that could go wrong, there are 28 pre-defined items that change color from green to yellow, to red.

In my experience, an event based model, where monitors determine the problem and severity, works better. The central event manager would just receive the events and handle display and notification.

Can your product handle this sort of model? For example, could I write a monitor that watched a database log file, and have it send events like this?

severity  category  host    message
high      database  myhost  database memory shortage
medium    os        myhost  fs /db1 is over 90% full

What someone determines as "better" is up for debate, but yes, Nagios supports both active and passive checks. Active checks are performed by the monitoring process and allow the admin to centralize check configuration/execution, while passive checks are submitted by third-party scripts and allow flexibility in integrating Nagios with custom/proprietary sources of monitoring information.
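As a rough illustration of the passive-check path, the sketch below formats events like the ones in the question as external command lines. The `PROCESS_SERVICE_CHECK_RESULT` line layout is the one Nagios documents for passive service results; the severity-to-return-code mapping, the use of the event category as the service description, and the command file path are assumptions made for this example, and the service would still need to be defined on the central server.

```python
import time

# Assumed mapping from the question's severity words to plugin-style
# return codes (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN).
SEVERITY_TO_CODE = {"ok": 0, "medium": 1, "high": 2}


def passive_result_line(severity, category, host, message, now=None):
    """Format one event as a passive service check result command."""
    timestamp = int(time.time() if now is None else now)
    code = SEVERITY_TO_CODE.get(severity, 3)  # unrecognized words become UNKNOWN
    # Commands are one per line, so a newline in the message must be flattened.
    output = message.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s" % (
        timestamp, host, category, code, output)


def submit(line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Append the command to the external command file (path is an assumption)."""
    with open(command_file, "a") as handle:
        handle.write(line + "\n")
```

A log-watching monitor could call `submit(passive_result_line("high", "database", "myhost", "database memory shortage"))` each time it spots a problem.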

Implementing a monitoring app that relies solely on event-based data passed from external sources is an extremely poor design choice, IMHO. What happens when the remote host or process (whatever reports those events) dies a horrible death? Nothing - unless you have some logic in the central monitoring process (which Nagios does) that accounts for these types of problems. There are also issues of whether or not the event data sent from a remote source is credible or not. That is to say, can it be trusted? This is a security issue, as well as one of data integrity. These issues have hopefully been addressed in Nagios by using several different mechanisms: requiring that services (for which event data is submitted) are configured on the central monitoring server, security restrictions on the external command interface (restricting which local users/processes can submit data to Nagios), and the NSCA addon which uses encryption to ensure that data received from remote hosts can be trusted (i.e. it is from a "blessed" user/process).
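The silent-failure case described above can only be caught if the central process keeps track of when each passive result last arrived. A minimal sketch of that idea follows; the function name, the data layout, and the age threshold are hypothetical and are not taken from the Nagios source.

```python
def find_stale_services(last_seen, now, max_age_seconds):
    """Return the (host, service) keys whose last passive result is too old.

    last_seen maps (host, service) -> Unix timestamp of the latest result.
    """
    return sorted(
        key for key, seen_at in last_seen.items()
        if now - seen_at > max_age_seconds
    )
```

Anything returned here would be treated as a problem in its own right, since a reporter that has gone quiet says nothing about whether the service is healthy.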

4. Mass-appeal software

By feldsteins

How can the success of geeky sysadmin software be translated into open source projects aimed at a wider audience? Put simply, can the open source model work beyond nerdy sysadmin widgets and spill into the world of mass-appeal software?

This is similar in nature to question 1. Basically, my answer is "I don't know - I haven't had to deal with that". :-)

5. Feedback

By greechneb

I'm sure people often send you feedback about your software. What I would like to know is if you have any feedback that stands out. Mainly what is the most unusual/unique use someone has had for netsaint that you have heard of?

I guess the most common feedback I get would be "Nagios rocks", which makes me feel all warm and fuzzy inside. There are a lot of different people/organizations that use Nagios for a variety of purposes. One of the most unusual was someone who used NetSaint (Nagios), joyd, and some hardware hacks to monitor on/off air time at a radio station as mentioned here. Another one that gave me a few laughs is someone who configured Nagios to generate audible alerts over the company's PA system when things went awry. Turns out things went bad and Nagios started "talking" over the PA when they were in the NOC alone one night, which scared the bejeezus out of them. Hehe.

6. Free software

By Natchswing

Since your software is so successful, have you thought about charging money for it?

Not for more than a few seconds when I imagined myself moving down to the Virgin Islands on a permanent vacation. :-) No, Nagios will always be free (as in beer and speech). I have considered developing additional (software) addons and tutorial-type material which would be for sale, but I keep putting these off to spend time on Nagios itself. Working on this project isn't really about money, or I would have stopped working on it a long time ago.

7. Propriety...

By bhsx

If a company came along and asked to market a version of Nagios that includes unpublished changes to the codebase, what would your response be? For example, would you:

  1. give them a relicensed version that allows them to do whatever they want to it.
  2. incorporate any changes they may want on your own and make sure the changes make their way to the GPL codebase.
  3. tell them to get bent.
  4. make proprietary changes that you leave out of the GPL codebase in order to sell those changes yourself or to other potential clients
  5. Some combination of the above.
  6. Some other direction I didn't think of

I feel that making proprietary changes to GPL code that you keep (at least temporarily) proprietary is a great business model for certain projects, possibly the best model for certain things. Some projects that come to mind are things like i-tree.org's Secure iXplorer, which has a GPL "lite" version which only supports ssh/scp and a "full" version that also supports sftp. OpenOffice.org and Star Office seem to be of the same ilk... If you need the extra functionality of Star Office, such as the better .doc filters and database functions, then you pay for that. I'm also curious if you have been approached by anyone for this sort of thing.

I would be willing to do work for a company that wanted unpublished changes made to Nagios if the final product was marketed in a way that didn't violate the GPL. An ASP that might use the modified app to sell a service rather than the actual software itself would be a good example of this.

I have been contacted by three or four companies in the past two years that wanted me to do this type of work for them. I turned them all down. Why? Two big reasons:

  1. They wanted me to sign NDAs
  2. Potential conflicts of interest

I hate NDAs. I can understand why companies feel the need for them, but since I didn't need the work, I decided to pass. Also, the changes they wanted me to make were closely related to things that I wanted to include in future releases of NetSaint/Nagios. If I made custom mods for them under an NDA (or even without), I might be locked out of making similar changes to Nagios under the GPL (with work for hire copyright issues, etc.).

I guess I'd rather spend my time developing additional software/documentation to sell on my own rather than screw myself and this project in the long run by doing this type of work for a company (i.e. competitor).

8. How did it start?

By SupahVee

Did Netsaint/Nagios start small, i.e. just a small shell script that was doing some minimal network testing, or was it designed from the ground up as a massive network tester to replace such overpriced products as HP OpenView, etc?

I know there was a serious code revision between Netsaint 0.0.7 and Nagios 1.0, which was phenomenal, btw, great job. But after using Netsaint (I still call it that, old habits die hard) for almost 2 full years now, I've always been very impressed with how well everything runs and scales.

I actually started to work on Nagios because a friend and I had talked about starting a part-time business to provide monitoring services to local businesses. I didn't like what I saw in the other monitoring apps, so I decided to write my own. Nagios was originally intended to be used to monitor small LANs - 20 or 30 servers max. Ironic that I initially started an OSS project to start a business and now I'm so busy with it that I don't have any time to actually do it. :-) Perhaps in the future.

Nagios didn't start as a set of scripts or a cronjob - it was designed from the beginning as a standalone app that relied on external apps/scripts to do the specifics of monitoring. It has come a long way in the past 4 years, but the basic logic is still the same. Much of the work that has been done is the result of trying to make Nagios scale well to larger environments.

9. How is a project like this supported?

By sys$manager

I am running Nagios and having a major problem with one of the plugins that is severe enough to make me throw out the software if I can't get it working.

I've asked on the two nagios mailing lists and received no answer. How do I, working for a major corporation, promote this software package if there's nobody that can help me fix it? Where do I look for support for a free product?

Money talks. If there's no money, people might not talk. Reports like this are not uncommon, so I put together this list of companies and individuals who have indicated that they provide consulting services and support contracts for Nagios. Spend a few (or more) of the corporation's dollars and see if you can hire someone to help get Nagios up and running. Otherwise you are at the mercy of the generosity and availability of the people on the mailing lists.

10. Prioritization

By 10-20-JT

I assume there is a long list of "features" which your users and program staff have come up with for desired future components. How do you prioritize those in the development queue? Is there any method at all? Squeaky wheel? Most requests? Interest of particular developers? Donations with particular requests?

I've never received donations in return for a particular feature being added, but bribery wouldn't hurt I guess. It's hard to say how I prioritize feature requests. Sometimes it's the squeaky wheel. Other times I'll get a suggestion from a lone user and I'll implement that feature because I see it has good potential.

Another factor is whether or not people contribute code for implementing the feature. Most people just make suggestions because they're not coders, and I'm left to implement that suggestion. That's usually fine by me, except when I'm short on time and either don't see it as being of great value or don't think I'll be able to implement that feature in a "reasonable" time frame. "Reasonable time frame" can range from 1 day to 1 year depending on how important I think that feature is.

11. Nagios event handling

By FreeLinux

Nagios' present event handling performs a prescribed action based on a state change in a monitored service. This is an excellent feature that pushes Nagios beyond a simple monitoring application into a true management application. In CA Unicenter, event handling goes a step further, allowing you to configure any action based on ANY message that appears in the event log. This, in my opinion, is one of Unicenter's strongest features, though there are many.

Will Nagios be implementing similar event handling functionality or will using utilities such as Swatch remain necessary? And if Nagios will not gain this flexibility, why would you feel that this functionality is unnecessary?

Nagios is designed to handle monitored hosts and services in a very abstract way. It relies on individual plugins (check_disk, check_ping, etc.) or external apps (swatch, nmap, portsentry, etc.) to determine what is important as far as monitoring is concerned. If you remove that layer of abstraction, you can generally do more as far as monitoring is concerned, but you're also limited as to what custom data/services/devices you can monitor. As I see it, there is no real need to break that layer of abstraction.
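To make the abstraction layer concrete, here is a minimal sketch of what a plugin looks like from the monitoring process's side: a program that prints one line of text and reports its result through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). This is not the real check_disk; the function names and the default thresholds are assumptions chosen for illustration.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate_usage(percent_used, warn=80.0, crit=90.0):
    """Map a disk usage percentage to (return_code, one-line output)."""
    if percent_used >= crit:
        return CRITICAL, "DISK CRITICAL - %.1f%% used" % percent_used
    if percent_used >= warn:
        return WARNING, "DISK WARNING - %.1f%% used" % percent_used
    return OK, "DISK OK - %.1f%% used" % percent_used


def check_path(path):
    """Measure a filesystem and evaluate it; report UNKNOWN on errors."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as error:
        return UNKNOWN, "DISK UNKNOWN - %s" % error
    return evaluate_usage(100.0 * usage.used / usage.total)


def main(argv):
    """Print the status line and return the exit code for the caller."""
    code, text = check_path(argv[1] if len(argv) > 1 else "/")
    print(text)
    return code
```

A script would finish with `sys.exit(main(sys.argv))`. The monitoring process only needs the exit code and the printed line, which is why the same core can sit in front of disks, pings, or anything else a plugin can measure.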

As a side note (and more to the point of your question), event handlers can be designed to react not only to the state of a monitored service, but also to the output that was generated by the service check (i.e. the plugin). This would allow you to craft an event handler that reacted differently based on what message appeared in an event log.
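A handler of that kind might be sketched as follows. It assumes the handler receives the service state, the state type (soft or hard), and the plugin output as plain strings; how those values are passed in depends on the command definition and is not shown. The patterns and action names are hypothetical.

```python
import re

# Hypothetical rules: (pattern to look for in the plugin output, action name).
OUTPUT_RULES = [
    (re.compile(r"memory shortage", re.IGNORECASE), "restart_database"),
    (re.compile(r"over \d+% full", re.IGNORECASE), "rotate_logs"),
]


def choose_action(state, state_type, output):
    """Pick an action from the service state and the text the check produced."""
    if state == "OK":
        return None  # nothing to do once the service has recovered
    if state_type != "HARD":
        return None  # wait until retries confirm the problem
    for pattern, action in OUTPUT_RULES:
        if pattern.search(output):
            return action
    return "notify_only"  # a problem we have no specific remedy for
```

The state decides whether to act at all, and the output text decides which action to take.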

12. People issues?

By dmuth

Have you ever had to deal with any developers who um, had issues? For example, someone who refused to comment their code, or someone who would volunteer to implement a feature and then "not get around to it" which forced the project as a whole to suffer?

If so, how did you deal with those people? Did you ever find yourself forced to burn any bridges as a result of dealing with such people?

As far as contributing to the core Nagios application, everything has to come through me. If someone doesn't contribute code for a feature, I will (if I have time and think it's worthy). Since I rarely (if ever) apply a patch directly, I have a chance to look every line of code over before I integrate it with the main codebase. I tend to over-comment my code and have my own coding style, so I generally re-comment/reformat patches to fit my whim. Doing this also gives me a chance to make sure their patch doesn't have any unintended side effects. If someone submits a patch that I can't understand (or learn) and I don't hear back from them, the patch doesn't get applied. I think that's a reasonable approach considering the fact that they may not be around in 6 months and I'll have to maintain the code for who knows how long. All that being said, I really haven't had too many problems along this line.

13. What it is

By Tet

Current status information, historical logs, and reports can all be accessed via a web browser.

That's great for interactive use, but Nagios (along with Big Brother, and most other monitoring packages) doesn't seem to cater well to automating report generation from outside of a web browser. We need to generate weekly reports on the number of outages, etc., and would like to be able to schedule a cron job every Sunday night to say "get me the uptime stats for abc services, so I can put them into xyz reporting package". We need to take the raw data and calculate rolling averages, etc, to give to customers (we're contractually obliged to do so). I.e., the sort of reports we need are typically more complex than is reasonable to expect Nagios to do internally. Was the interactive bias a deliberate decision, or did it just evolve that way? More importantly, are there any plans to improve things in this area?

Nagios was initially designed for smaller environments where reporting might not be as big of an issue as it is elsewhere. Also, I wanted most all data to be available via a web browser, as that is a fairly ubiquitous access tool. Better reporting will be coming in the future, but I make no guarantees as to when. I haven't really had any reporting code contributed by users, so if you want better reporting soon, step up and contribute. That's how OSS projects work.
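For readers facing the same reporting need, the post-processing the question describes is small once the raw outage totals have been extracted. The sketch below assumes the weekly outage durations are already available as a list of seconds; getting them out of the logs is not shown, and the function names are hypothetical.

```python
WEEK_SECONDS = 7 * 24 * 3600


def weekly_availability(outage_seconds_per_week):
    """Convert total outage seconds per week into availability percentages."""
    return [
        100.0 * (WEEK_SECONDS - min(outage, WEEK_SECONDS)) / WEEK_SECONDS
        for outage in outage_seconds_per_week
    ]


def rolling_average(values, window):
    """Average each value with up to window - 1 values before it."""
    averages = []
    for index in range(len(values)):
        chunk = values[max(0, index - window + 1): index + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages
```

A Sunday-night cron job could feed the last few weeks of outage totals through `weekly_availability` and then `rolling_average` before handing the numbers to the reporting package.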

14. Versus other commercial apps

By Thinko

Specifically, how does Nagios compare to recent commercial offerings like Microsoft's MOM and Novell's ManageWise / ZenWorks? Will Nagios have the depth of intelligence when it comes to reporting, and tracking similar (or related) events as a single more-critical super-event?

Other items of note for comparison are issues like XML output. I see that XML status data is planned for Version 3; what depth of information will be able to be queried/reported with XML?

I haven't looked at MOM or ManageWise, so I can't say how they compare. Monitoring apps produced by OS vendors always have an edge when it comes to monitoring their particular OS(es), but they can't always be easily integrated into a heterogeneous environment. Tracking super-events basically involves event correlation. Since event correlation is a necessary part of decent failure prediction, I'll probably be adding this to Nagios in the future.

XML will be used for current status data and configuration information, as well as archived log data. Hopefully that will make it easy for other apps to process the data for reporting purposes, custom interfaces, etc. I doubt this data will be stored natively in XML by the application. Instead, scripts will be provided to convert the native data format into XML.
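A conversion script of the kind described could be quite short. The sketch below is only an illustration: the semicolon-delimited `host;service;status;output` record layout and the XML element names are assumptions made for the example, not the actual native format or a planned schema.

```python
import xml.etree.ElementTree as ET


def status_lines_to_xml(lines):
    """Build an XML document from 'host;service;status;output' records."""
    root = ET.Element("status")
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first three separators only, so the output text
        # may itself contain semicolons.
        host, service, state, output = line.split(";", 3)
        entry = ET.SubElement(root, "service", host=host, name=service)
        ET.SubElement(entry, "state").text = state
        ET.SubElement(entry, "output").text = output
    return ET.tostring(root, encoding="unicode")
```

Keeping the conversion outside the core application means other tools can consume the XML without the monitoring process having to store its data that way.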

15. Why the name change?

By sgtron

NetSaint was such a cool name.. why change it to Nagios.. just doesn't have the same ring.

I changed the name to protect myself against future legal hassles. It seems that "NetSaint" was thought by some lawyers to be a potentially confusing term in relation to "Saint", which was trademarked. The way things shook out, I wouldn't have had to change the name, but I decided to anyway. I didn't want to wake up one morning and find that the netsaint.org domain was yanked from me in the name of trademark protection. This is also the reason why I filed for a trademark for Nagios® in the first place.

16. Raking in the coders...

By Brendan Byrd

One of the biggest problems with GNU projects is getting other people to help you out with your code. The code may be freely available, but that doesn't mean that people will freely code your project. At what point does a GNU project turn from one person coding his/her work, to several/many people working regularly on the project?

For me it happened a few months after the project got started. Feature requests were coming in faster than I could handle on my own. Luckily people stepped up and contributed code for the core app and plugins. I suppose this was due to the fact that Nagios is targeted at sysadmins - people who are probably more likely to be coders than your average Joe. Without help from others, there's no way Nagios would be where it is right now.

17. Finding developers that stick

By CountJoe

I am a project manager for several open source projects and have had a great deal of trouble finding developers that will actually help with development. How do you find reliable developers that make a real contribution to your project?

I got very lucky, plain and simple. A number of people have popped up to contribute code over the past four years, but most do not stick around for a lengthy period of time. Thankfully I have two developers who maintain the plugins, which allows me to concentrate on Nagios itself. Karl DeBisschop and Subhendu Ghosh are the main plugin developers/maintainers and have been critical to the advancement and survival of Nagios. Karl has actually been around since the project started, so he's been able to contribute a lot in terms of the main application, as well as the plugins.

18. Plug-in vs. monolithic work?

By jenkin sear

Nagios depends on a wide variety of plugins to do its job (in a way, like nessus). To what degree do you find outside developers contributing patches to the main codebase, vs. contributing plugins? Is there a path where developers add plugins, and then "graduate" to core patches? I think I see a similar path in both Linux and Apache, where one might write modules and then get involved in some of the deeper magic- and I wonder if that architectural decision may be a key to the project's long-term success.

There isn't necessarily any correlation between people who submit patches for the plugins versus the main codebase. Each patch is judged on its own merits, regardless of the contributor's previous involvement in either project. Patches for the plugins go through Karl and Subhendu, while patches to the main codebase go through me. Plugins generally get more patches than the main code - probably due to the fact that they're smaller and easier to understand for most people. From that standpoint, it would make sense that people start out making smaller/easier patches for the plugins rather than larger/more extensive patches for the main code.

19. Did the brown stuff ever hit the cooling thing?

By del_ctrl_alt

Was there a make or break moment when it could have all ended? If so what pulled the project back on track?

This question was modded fairly low, but I felt it was a good question to answer, so I did. Maybe my experiences will help others...

Yes, there were at least two times when I seriously considered dumping the whole project for good. One came when my personal life was going through some rough spots and the other came when the trademark mess popped up. Both times I ended up deciding to continue the project, but only after several months of "downtime". I felt that I had invested too much time in the project to simply let it die off. I enjoy working on Nagios and think I would have felt a sense of personal failure had I decided to quit when things got rough. There have been a number of other times when I've thought about ditching the project, but I've come to realize that they are just part of my natural development cycle and will pass with time. I've found that my normal cycle works something like this:

  1. Spend time mulling over and planning new features
  2. Code, debug, document
  3. Detest everything about this project and do nothing at all
  4. Rinse and repeat...

When I feel I've had enough and can't stand the project anymore, I just stop answering email, stop coding, and stop thinking about the project. This can last for a week or four months. When I've had enough time away from everything, I can get started again. This period of disgust is also a time when I start formulating ideas on what needs to be changed or added. I've come to accept and expect this period of downtime and, as a result, am now much happier with the project. Anyway, if you're thinking about starting an OSS project of your own, it's something to think about.

  • by Pike65 ( 454932 ) on Thursday January 09, 2003 @01:07PM (#5048152) Homepage
    "What is your favourite dip?"

    Oh, Nagios !

    Sorry, I was reading that wrong . . .
  • by Amsterdam Vallon ( 639622 ) <amsterdamvallon2003@yahoo.com> on Thursday January 09, 2003 @01:10PM (#5048174) Homepage
    ... to hear from a programmer who still has a job!
  • Sarcasm (Score:5, Funny)

    by Natchswing ( 588534 ) on Thursday January 09, 2003 @01:11PM (#5048184)
    --
    6. Free software
    By Natchswing

    Since your software is so successful, have you thought about charging money for it?
    --

    Actually, that was meant to be a sarcastic joke aimed at making a few people laugh, not a serious question that actually got sent.

    Successful free software... charge money for it...

    *sigh*

  • by Anonymous Coward on Thursday January 09, 2003 @01:31PM (#5048311)
    That was definitely a good interview. It made me all warm and fuzzy inside to know that somewhere in the distance are a group of people, probably gnomes, who work on software projects and don't require payment. Just the love of the game.

    If only I could find someone to do this in real life for my house. Cooking, cleaning up, etc. I tried to get someone to do it once but the cops eventually broke it up and said slavery is illegal. Nobody WANTS to do that shit I guess, even my wife.

  • by PunchMonkey ( 261983 ) on Thursday January 09, 2003 @01:46PM (#5048420) Homepage
    Most people consider money a part of success.

    Oh my god.... I've never realized it, but that explains a lot. You see, for the past 10 years, I've been investing thousands upon thousands of dollars into these two kids. They're cute and all, but the little buggers have never made me a *dime*.

    I think it's time I ditch this project, because from a financial standpoint, it certainly isn't successful.

    If you hear a story about the bodies of a 6 year old girl and 9 year old boy being found in the woods of Northern Maine, just do me a favour and keep quiet, ok? The "man" still thinks that there's more to kids than financial reward. Pffft.
  • by Dannon ( 142147 ) on Thursday January 09, 2003 @02:14PM (#5048602) Journal
    for the past 10 years

    There's your problem! Your investment hasn't reached maturity yet! Kids are proven to be a net loss investment over 10- or 20-year periods. If your investment is self-sustaining in less than 25 years, you're doing good. By the time this investment reaches the 40- or 50-year mark, and you're ready to retire, it should be able to support you comfortably. ;-)
