
Debugging

dwheeler writes "It's not often you find a classic, but I think I've found a new classic for software and computer hardware developers. It's David J. Agans' Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems." Read on for the rest.
Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems
author David J. Agans
pages 192
publisher Amacom
rating 9
reviewer David A. Wheeler
ISBN 0814471684
summary A classic book on debugging principles

Debugging explains the fundamentals of finding and fixing bugs (once a bug has been detected), rather than any particular technology. It's best for developers who are novices or who are only moderately experienced, but even old pros will find helpful reminders of things they know they should do but forget in the rush of the moment. This book will help you fix those inevitable bugs, particularly if you're not a pro at debugging. It's hard to bottle experience; this book does a good job of it. This is a book I expect to find useful many, many years from now.

The entire book revolves around the "nine rules." After the typical introduction and list of the rules, there's one chapter for each rule. Each of these chapters describes the rule, explains why it's a rule, and includes several "sub-rules" that explain how to apply the rule. Most importantly, there are lots of "war stories" that are both fun to read and good illustrations of how to put the rule into practice.

Since the whole book revolves around the nine rules, it might help to understand the book by skimming the rules and their sub-rules:

  1. Understand the system: Read the manual, read everything in depth, know the fundamentals, know the road map, understand your tools, and look up the details.
  2. Make it fail: Do it again, start at the beginning, stimulate the failure, don't simulate the failure, find the uncontrolled condition that makes it intermittent, record everything and find the signature of intermittent bugs, don't trust statistics too much, know that "that" can happen, and never throw away a debugging tool.
  3. Quit thinking and look (get data first, don't just do complicated repairs based on guessing): See the failure, see the details, build instrumentation in, add instrumentation on, don't be afraid to dive in, watch out for Heisenberg, and guess only to focus the search.
  4. Divide and conquer: Narrow the search with successive approximation, get the range, determine which side of the bug you're on, use easy-to-spot test patterns, start with the bad, fix the bugs you know about, and fix the noise first.
  5. Change one thing at a time: Isolate the key factor, grab the brass bar with both hands (understand what's wrong before fixing), change one test at a time, compare it with a good one, and determine what you changed since the last time it worked.
  6. Keep an audit trail: Write down what you did in what order and what happened as a result, understand that any detail could be the important one, correlate events, understand that audit trails for design are also good for testing, and write it down!
  7. Check the plug: Question your assumptions, start at the beginning, and test the tool.
  8. Get a fresh view: Ask for fresh insights, tap expertise, listen to the voice of experience, know that help is all around you, don't be proud, report symptoms (not theories), and realize that you don't have to be sure.
  9. If you didn't fix it, it ain't fixed: Check that it's really fixed, check that it's really your fix that fixed it, know that it never just goes away by itself, fix the cause, and fix the process.

This list by itself looks dry, but the detailed explanations and war stories make the entire book come alive. Many of the war stories jump deeply into technical details; some might find the details overwhelming, but I found that they were excellent in helping the principles come alive in a practical way. Many war stories were about obsolete technology, but since the principle is the point, that isn't a problem. Not all the war stories are about computing; there's a funny story involving house wiring, for example. But if you don't know anything about computer hardware and software, you won't be able to follow many of the examples.

After detailed explanations of the rules, the rest of the book has a single story showing all the rules in action, a set of "easy exercises for the reader," tips for help desks, and closing remarks.

There are lots of good points here. One that particularly stands out is "quit thinking and look." Too many developers try to "fix" things based on a guess instead of gathering and observing data to prove or disprove a hypothesis. Another principle that stands out is "if you didn't fix it, it ain't fixed"; there are several vendors I'd like to give that advice to. The whole "stimulate the failure, don't simulate the failure" discussion is not as clearly explained as most of the book, but it's a valid point worth understanding.

I particularly appreciated Agans' discussions on intermittent problems (particularly in "Make it Fail"). Intermittent problems are usually the hardest to deal with, and the author gives straightforward advice on how to deal with them. One odd thing is that although he mentions Heisenberg, he never mentions the term "Heisenbug," a common jargon term in software development (a Heisenbug is a bug that disappears or alters its behavior when one attempts to probe or isolate it). At least a note would've been appropriate.

The back cover includes a number of endorsements, including one from somebody named Rob Malda. But don't worry, the book's good anyway :-).

It's important to note that this is a book on fundamentals, and different from most other books related to debugging. There are many other books on debugging, such as Richard Stallman et al.'s Debugging with GDB: The GNU Source-Level Debugger. But these other texts usually concentrate primarily on a specific technology and/or on explaining tool commands. A few (like Norman Matloff's guide to faster, less-frustrating debugging) have a few more general suggestions on debugging, but are nothing like Agans' book. There are many books on testing, like Boris Beizer's Software Testing Techniques, but they tend to emphasize how to create tests to detect bugs, and less on how to fix a bug once it's been detected. Agans' book concentrates on the big picture of debugging; these other books are complementary to it.

Debugging has an accompanying website at debuggingrules.com, where you can find various little extras and links to related information. In particular, the website has an amusing poster of the nine rules you can download and print.

No book's perfect, so here are my gripes and wishes:

  1. The sub-rules are really important for understanding the rules, but there's no "master list" in the book or website that shows all the rules and sub-rules on one page. The end of the chapter about a given rule summarizes the sub-rules for that one rule, but it'd sure be easier to have them all in one place. So, print out the list of sub-rules above after you've read the book.
  2. The book left me wishing for more detailed suggestions about specific common technology. This is probably unfair, since the author is trying to give timeless advice rather than a "how to use tool X" tutorial. But it'd be very useful to give good general advice, specific suggestions, and examples of what approaches to take for common types of tools (like symbolic debuggers, digital logic probes, etc.), specific widely-used tools (like ddd on gdb), and common problems. Even after the specific tools are gone, such advice can help you use later ones. A little of this is hinted at in the "know your tools" section, but I'd like to have seen much more of it. Vendors often crow about what their tools can do, but rarely explain their weaknesses or how to apply them in a broader context.
  3. There's probably a need for another book that takes the same rules, but broadens them to solving arbitrary problems. Frankly, the rules apply to many situations beyond computing, but the war stories are far too technical for the non-computer person to understand.

But as you can tell, I think this is a great book. In some sense, what it says is "obvious," but it's only obvious as all fundamentals are obvious. Many sports teams know the fundamentals, but fail to consistently apply them - and fail because of it. Novices need to learn the fundamentals, and pros need occasional reminders of them; this book is a good way to learn or be reminded of them. Get this book.


If you like this review, feel free to see Wheeler's home page, including his book on developing secure programs and his paper on quantitative analysis of open source software / Free Software. You can purchase Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

  • by freeze128 ( 544774 ) on Tuesday February 24, 2004 @03:45PM (#8376786)
    I think the term you want is TROUBLESHOOTING.
  • Heisenbugs... (Score:5, Informative)

    by Aardpig ( 622459 ) on Tuesday February 24, 2004 @03:48PM (#8376829)

    ...are always the worst: bugs which disappear when you look for them. Insert a print statement? The bug disappears. Use a debugger? The bug reappears, but in a different place.

    Heisenbugs are almost always caused by buffer overflows. They can often be prevented (at least in Fortran 77/90/95/03) by enabling array-bounds checking at compile time; but before I knew about this, I had a hell of a time tracking them down.
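    C has no language-level counterpart to the array-bounds checking described above, but the same idea can be sketched by hand. What follows is only an illustration, not something from the book or the comment: the names checked_array, ca_set, ca_get, and CA_CAPACITY are hypothetical, and the fixed capacity of 16 is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CA_CAPACITY 16  /* arbitrary capacity chosen for this sketch */

/* Hypothetical array type that carries its own length. */
typedef struct {
    double data[CA_CAPACITY];
    size_t len;  /* number of valid elements, at most CA_CAPACITY */
} checked_array;

/* Store value at index i. Returns false instead of writing out of bounds. */
bool ca_set(checked_array *a, size_t i, double value)
{
    assert(a != NULL);  /* a NULL array is a programming error */
    if (i >= a->len || i >= CA_CAPACITY) {
        return false;
    }
    a->data[i] = value;
    return true;
}

/* Read index i into *out. Returns false instead of reading out of bounds. */
bool ca_get(const checked_array *a, size_t i, double *out)
{
    assert(a != NULL && out != NULL);
    if (i >= a->len || i >= CA_CAPACITY) {
        return false;
    }
    *out = a->data[i];
    return true;
}
```

    A caller would check the return value, for example if (!ca_set(&a, i, x)) { report the bad index }, so an overflow shows up at the faulty access rather than as a moving target later on.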

  • I'd agree (Score:5, Informative)

    by scatterbrained ( 144748 ) on Tuesday February 24, 2004 @03:49PM (#8376839) Journal
    I've read it and it's a good book, but I would
    just borrow it from the library and then print
    out the poster to remember the 'rules'.

    There's not enough meat to keep it on my
    precious shelf space.
  • by scatterbrained ( 144748 ) on Tuesday February 24, 2004 @03:52PM (#8376888) Journal
    There's a distinction, both in real life and
    in the book, between troubleshooting something
    that's supposed to work (think TV repair) and
    debugging something that's never been made
    before (hardware design).

    Troubleshooting lends itself more to scripted
    debugging, and "real debugging" is a bit more
    free-form.

  • by TheCrayfish ( 73892 ) on Tuesday February 24, 2004 @03:54PM (#8376915) Homepage
    You can read a sample chapter from the Debugging Rules book in PDF format by going here [debuggingrules.com]. (Requires the free Adobe reader [adobe.com].)
  • by SamiousHaze ( 212418 ) on Tuesday February 24, 2004 @03:56PM (#8376940)
    Actually,
    the first computer "bug" was a hardware bug, as it was a moth that flew into a relay and jammed it. Removing the bug physically was debugging. http://www.maxmon.com/1945ad.htm is a reference.

    Besides, when you are building a machine and dealing with logic gates, it's the same type of debugging as with software logic.
  • Re:Sonuvabitch! (Score:4, Informative)

    by Aardpig ( 622459 ) on Tuesday February 24, 2004 @04:07PM (#8377072)

    I have hated fortran for years, having written a single program in it, based on this.

    Fortunately, things have changed a lot since then. With the introduction of modules and array arithmetic in Fortran 90/95, situations where routines are called with the wrong arguments, or arrays are subscripted incorrectly, are much less frequent. I haven't been bitten by a Heisenbug for a couple of years now; and when I am, switching on checking at compile and run time usually reveals the problem pretty quickly.

  • my review... (Score:2, Informative)

    by chmod_localhost ( 718125 ) on Tuesday February 24, 2004 @04:13PM (#8377153) Journal
    Mr. Agans' book presents real-life experiences (or, as he calls them, war stories) along with humor-filled comments and anecdotes.

    I found myself chuckling and giggling along while reading this book. Some of what he said brought back my own memories of debugging my own software bugs, or other people's bugs that I had somehow 'inherited' because they left the company or were too busy on other projects to debug their own code. I like the metaphors that he uses to explain ideas or concepts that seem a bit too complicated to understand.

    Mr. Agans makes this very clear at the beginning of his book: it is not a cover-it-all book, it is a general-concept book on how to isolate, find, and debug something that has gone wrong. The principles presented by Mr. Agans can be applied to situations in everyday life. He presents examples involving a well pump, a light bulb, and so on.

    More experienced software/hardware engineers or more experienced problem solvers who read this book might find it covering bases that they already know, but the humor makes it worthwhile.
  • by BinBoy ( 164798 ) on Tuesday February 24, 2004 @04:13PM (#8377156) Homepage
    4. Divide and conquer: Narrow the search with successive approximation, get the range, determine which side of the bug you're on, use easy-to-spot test patterns, start with the bad, fix the bugs you know about, and fix the noise first.

    That's a very useful rule. In nearly 20 years of programming I haven't found any tool or technique that works better than printf / std::cout / MessageBox and logging.

    Logging is especially important if your users aren't conveniently in the same building as you. When a customer has a problem I've never seen before, I usually tell them to run the program with the -log switch and send me the log. Nearly always this leads to the problem and I can fix the bug within minutes.

    Add logging to your app and you'll increase the number of hours you can sleep.
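    As a rough sketch of what such opt-in logging might look like in C: the function names wants_log, log_open, log_msg, and log_close are hypothetical, and the only detail taken from the comment above is the "-log" switch itself. The log file path is whatever the application chooses.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static FILE *log_file = NULL;  /* NULL means logging is disabled */

/* Returns true if any command-line argument is exactly "-log". */
bool wants_log(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-log") == 0) {
            return true;
        }
    }
    return false;
}

/* Opens the log in append mode. Returns false if the file cannot be opened. */
bool log_open(const char *path)
{
    log_file = fopen(path, "a");
    return log_file != NULL;
}

/* Closes the log if it is open; logging is disabled afterwards. */
void log_close(void)
{
    if (log_file != NULL) {
        fclose(log_file);
        log_file = NULL;
    }
}

/* printf-style logging. Returns the vfprintf result, or 0 when disabled. */
int log_msg(const char *fmt, ...)
{
    assert(fmt != NULL);
    if (log_file == NULL) {
        return 0;
    }
    va_list args;
    va_start(args, fmt);
    int written = vfprintf(log_file, fmt, args);
    va_end(args);
    fputc('\n', log_file);
    fflush(log_file);  /* flush each line so the log survives a crash */
    return written;
}
```

    In main, one would call log_open only when wants_log(argc, argv) is true, then sprinkle log_msg calls freely; when logging is off, each call costs little more than a pointer comparison.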

  • Re:Heisenbugs... (Score:3, Informative)

    by JWW ( 79176 ) on Tuesday February 24, 2004 @04:26PM (#8377301)
    Wow, I didn't even know there WAS a Fortran 03.

    I've only ever used 77. I knew 90 existed, but I thought it must've died off after that (I haven't used Fortran in almost 15 years).

  • Re:Heisenbugs... (Score:3, Informative)

    by Aardpig ( 622459 ) on Tuesday February 24, 2004 @04:37PM (#8377430)

    Wow, I didn't even know there WAS a Fortran 03.

    Strictly speaking, there isn't -- yet. It is currently in draft form, and will be formally released later on this year. Fortran itself is still being used extensively for numerical modelling, since it remains the leader performance-wise for such problems.

  • Re:Heisenbugs... (Score:4, Informative)

    by Marvin_OScribbley ( 50553 ) on Tuesday February 24, 2004 @04:51PM (#8377576) Homepage Journal
    Heisenbugs are almost always caused by buffer overflows.

    In my experience with embedded systems, a Heisenbug is almost always caused by uninitialized data. You wind up assuming a particular value whereas you originally didn't plan on doing that. What value the data actually turns out to be is highly dependent on things like where in memory the code loads, how big the executable is, and so forth. Adding debugging statements will shift all the code after it up in memory and often make the bug go away or behave differently.

    Another interesting bug that is unrelated to the Heisenbug is when you port (for example) ANSI C code from one platform to another and code that originally worked starts doing weird things. For example, the C compiler under a BSD would allow modulo 0 and produce a zero result, which was incidentally what was wanted. Moved the code to Linux and started getting core dumps, because modulo 0 was considered dividing by zero. Some problems like this actually turn out to be Heisenbugs, for example due to differences in the way memory is malloc-ed on different systems. For example, suppose you accidentally malloc a pointer rather than its contents. On one OS you wind up allocating more memory than you need, but have no problems because addresses start fairly low in memory. On another OS memory addresses start somewhere else and you start getting weird errors due to lack of memory.
  • by Anonymous Coward on Tuesday February 24, 2004 @05:21PM (#8377913)

    My favorite quote on the subject of debugging:

    As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs.


    -- Maurice Wilkes, 1949

    55 years later, programmers are still spending a large part of their lives finding bugs and fixing them...

  • by Anonymous Coward on Tuesday February 24, 2004 @05:47PM (#8378250)
    Use Purify (or an equivalent) before you deliver your code.

    At about $1200 a seat for WinXX, that's about a single day's worth of productivity from a coder. (I'm counting all overhead here).

    How many times have you spent days or even weeks looking for that one elusive memory overwrite causing your Heisenbug? Memory-checking tools like Purify find those bugs before they cause failures!!!

  • Re:Heisenbugs... (Score:3, Informative)

    by composer777 ( 175489 ) on Tuesday February 24, 2004 @05:59PM (#8378433)
    Buffer overflows tend to be less obvious than passing a pointer to locally allocated data outside the scope in which that data was created. In fact, I've never seen a bug caused by passing back a pointer to locally allocated data outside the block (or function) in which it was created. In other words, stack-based Heisenbugs seem easy to avoid. I think that this kind of bug indicates that a programmer is completely clueless about how machine code is generated. Buffer overflows, however, can be much less obvious, since the size of the buffer can be determined at run time and can vary. For example:
    double * d = NULL; /*...later...*/
    d = malloc(i*sizeof(double));
    memset(d, 0x00, i*sizeof(double));
    where i could be anything.
    In this case if i is ten, and j is 11, then the code below could trigger an exception in some cases, but not every case:
    double m = d[j];

    The code above is not necessarily incorrect; it all depends on the values of i and j. However, passing back a pointer to any locally declared variable outside the scope of the block within which it is created is always wrong. In fact, you don't need to return it: any method of referencing locally allocated memory outside of its scope is incorrect. The address of a local variable always points into the local data for that particular block of code, which is (usually) kept as an offset from a frame pointer. This block of memory (known as a stack frame) is deallocated after that particular block of code is left. (Note that passing back a pointer to a block of memory that is malloced inside a function is not incorrect. That memory comes from the heap, not from the function's stack frame.)

    So for example:

    int a = 0;
    int *b = NULL;
    int **c = NULL;
    int *d = NULL;
    {
        int x = 0;
        int *y;
        y = malloc(sizeof(int));
        a = x;  /* this is fine, we're just passing data */
        b = y;  /* this is OK too, y's block of memory is
                   allocated out of the heap */
        c = &y; /* WRONG!! y is allocated locally. Remember,
                   we're talking about the address of y, not
                   the address y is pointing to. In this case,
                   the address of y, like the address of x,
                   is referencing locally allocated data */
        d = &x; /* this is also wrong, since the address of
                   the data containing x is pointing to
                   memory that is pushed on the function
                   stack */
    }

    If you've been paying attention, you'll notice that any time you see a '&' before a variable on the right-hand side, it should get your attention. I would wince right away if I saw that. My first instinct is to figure out where that variable is created.
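    For contrast, here is a small sketch of two patterns that hand data back to a caller without letting the address of a local escape. The names make_int and fill_int are hypothetical, and the wrong pattern appears only inside a comment so that the example stays well defined.

```c
#include <assert.h>
#include <stdlib.h>

/* WRONG, shown only as a comment:
 *     int *bad(void) { int x = 42; return &x; }
 * x lives in bad()'s stack frame, which is gone once bad() returns. */

/* Heap storage outlives the function that allocated it.
 * Returns NULL if allocation fails; the caller must free() the result. */
int *make_int(int value)
{
    int *p = malloc(sizeof *p);
    if (p != NULL) {
        *p = value;
    }
    return p;
}

/* Alternative: the caller supplies the storage, so the callee never
 * hands out the address of one of its own locals. */
void fill_int(int *out, int value)
{
    assert(out != NULL);
    *out = value;
}
```

    Note that passing the address of a local down into a call, as in fill_int(&local, 7), is fine, because the local outlives the call. It is only passing such an address up or out of its scope that goes wrong.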

    To read more about the basics of assembly programming and machine code, which admittedly I'm not an expert on (I do know something about C/C++), you can go here:
    http://www.microsoft.com/msj/0298/hood0298.aspx
  • Re:Good read (Score:3, Informative)

    by monique ( 10006 ) on Tuesday February 24, 2004 @06:06PM (#8378526) Journal
    This is a great example of where version control systems can really save your butt. Even if you *have* changed multiple things, at least you have some idea of what changed between when you started hacking around to find the bug and when you found it.
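    Once every change is recorded as a revision, the book's "divide and conquer" rule can be applied to the history itself. The sketch below is an illustration under stated assumptions: revisions are numbered with plain integers, the bug stays present in every revision after the one that introduced it, and is_bad stands for whatever test reproduces the failure. The names first_bad_revision, is_bad_fn, and demo_bug_since_37 are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: returns true if revision rev shows the bug. */
typedef bool (*is_bad_fn)(int rev);

/* Binary search for the first bad revision.
 * Preconditions: good < bad, revision good works, revision bad fails,
 * and every revision after the first bad one also fails. */
int first_bad_revision(int good, int bad, is_bad_fn is_bad)
{
    assert(good < bad);
    assert(is_bad != NULL);
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;
        if (is_bad(mid)) {
            bad = mid;   /* failure already present: look earlier */
        } else {
            good = mid;  /* still working: look later */
        }
    }
    return bad;
}

/* Stand-in predicate for demonstration: pretends revision 37 broke things. */
bool demo_bug_since_37(int rev)
{
    return rev >= 37;
}
```

    Each pass halves the range, so about seven test runs are enough to pin down one revision out of a hundred, and only the changes in that single revision need to be read closely.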
