
Debugging

dwheeler writes "It's not often you find a classic, but I think I've found a new classic for software and computer hardware developers. It's David J. Agans' Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems." Read on for the rest.
Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems
Author: David J. Agans
Pages: 192
Publisher: Amacom
Rating: 9
Reviewer: David A. Wheeler
ISBN: 0814471684
Summary: A classic book on debugging principles

Debugging explains the fundamentals of finding and fixing bugs (once a bug has been detected), rather than any particular technology. It's best for developers who are novices or only moderately experienced, but even old pros will find helpful reminders of things they know they should do but forget in the rush of the moment. This book will help you fix those inevitable bugs, particularly if you're not a pro at debugging. It's hard to bottle experience; this book does a good job. This is a book I expect to find useful many, many years from now.

The entire book revolves around the "nine rules." After the typical introduction and list of the rules, there's one chapter for each rule. Each of these chapters describes the rule, explains why it's a rule, and includes several "sub-rules" that explain how to apply the rule. Most importantly, there are lots of "war stories" that are both fun to read and good illustrations of how to put the rule into practice.

Since the whole book revolves around the nine rules, it might help to understand the book by skimming the rules and their sub-rules:

  1. Understand the system: Read the manual, read everything in depth, know the fundamentals, know the road map, understand your tools, and look up the details.
  2. Make it fail: Do it again, start at the beginning, stimulate the failure, don't simulate the failure, find the uncontrolled condition that makes it intermittent, record everything and find the signature of intermittent bugs, don't trust statistics too much, know that "that" can happen, and never throw away a debugging tool.
  3. Quit thinking and look (get data first, don't just do complicated repairs based on guessing): See the failure, see the details, build instrumentation in, add instrumentation on, don't be afraid to dive in, watch out for Heisenberg, and guess only to focus the search.
  4. Divide and conquer: Narrow the search with successive approximation, get the range, determine which side of the bug you're on, use easy-to-spot test patterns, start with the bad, fix the bugs you know about, and fix the noise first.
  5. Change one thing at a time: Isolate the key factor, grab the brass bar with both hands (understand what's wrong before fixing), change one test at a time, compare it with a good one, and determine what you changed since the last time it worked.
  6. Keep an audit trail: Write down what you did in what order and what happened as a result, understand that any detail could be the important one, correlate events, understand that audit trails for design are also good for testing, and write it down!
  7. Check the plug: Question your assumptions, start at the beginning, and test the tool.
  8. Get a fresh view: Ask for fresh insights, tap expertise, listen to the voice of experience, know that help is all around you, don't be proud, report symptoms (not theories), and realize that you don't have to be sure.
  9. If you didn't fix it, it ain't fixed: Check that it's really fixed, check that it's really your fix that fixed it, know that it never just goes away by itself, fix the cause, and fix the process.
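Rule 4's "narrow the search with successive approximation" is, at bottom, binary search over wherever the bug can hide. Here is a minimal sketch of that idea (mine, not the book's) applied to finding the first bad revision in a range, the strategy that tools like git bisect automate:

```python
def first_bad(revisions, is_bad):
    """Binary-search for the first revision where is_bad(rev) is True.

    Assumes the history is monotonic: once a revision is bad, every later
    one is too. (If that assumption fails, the answer is garbage, which is
    itself a useful debugging signal.)
    """
    lo, hi = 0, len(revisions) - 1
    assert is_bad(revisions[hi]), "last revision must reproduce the failure"
    if is_bad(revisions[lo]):
        return revisions[lo]          # broken from the start of the range
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid                  # bug is at mid or earlier
        else:
            lo = mid                  # bug is after mid
    return revisions[hi]

# Toy example: revisions 0..99, bug introduced at revision 42.
print(first_bad(list(range(100)), lambda r: r >= 42))  # -> 42
```

Each probe halves the suspect range, so even a hundred revisions need only about seven tests, which is the whole appeal of "get the range, then determine which side of the bug you're on."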

This list by itself looks dry, but the detailed explanations and war stories make the entire book come alive. Many of the war stories jump deeply into technical details; some might find the details overwhelming, but I found they were excellent at bringing the principles alive in a practical way. Many war stories involve obsolete technology, but since the principle is the point, that isn't a problem. Not all the war stories are about computing; there's a funny story involving house wiring, for example. But if you don't know anything about computer hardware and software, you won't be able to follow many of the examples.

After detailed explanations of the rules, the rest of the book has a single story showing all the rules in action, a set of "easy exercises for the reader," tips for help desks, and closing remarks.

There are lots of good points here. One that particularly stands out is "quit thinking and look." Too many developers try to "fix" things based on a guess instead of gathering and observing data to prove or disprove a hypothesis. Another principle that stands out is "if you didn't fix it, it ain't fixed"; there are several vendors I'd like to give that advice to. The whole "stimulate the failure, don't simulate the failure" discussion is not as clearly explained as most of the book, but it's a valid point worth understanding.

I particularly appreciated Agans' discussions on intermittent problems (particularly in "Make it Fail"). Intermittent problems are usually the hardest to deal with, and the author gives straightforward advice on how to deal with them. One odd thing is that although he mentions Heisenberg, he never mentions the term "Heisenbug," a common jargon term in software development (a Heisenbug is a bug that disappears or alters its behavior when one attempts to probe or isolate it). At least a note would've been appropriate.

The back cover includes a number of endorsements, including one from somebody named Rob Malda. But don't worry, the book's good anyway :-).

It's important to note that this is a book on fundamentals, and different from most other books related to debugging. There are many other books on debugging, such as Richard Stallman et al.'s Debugging with GDB: The GNU Source-Level Debugger. But these other texts usually concentrate primarily on a specific technology and/or on explaining tool commands. A few (like Norman Matloff's guide to faster, less-frustrating debugging) have a few more general suggestions on debugging, but are nothing like Agans' book. There are many books on testing, like Boris Beizer's Software Testing Techniques, but they tend to emphasize how to create tests to detect bugs, and less how to fix a bug once it's been detected. Agans' book concentrates on the big picture of debugging; these other books are complementary to it.

Debugging has an accompanying website at debuggingrules.com, where you can find various little extras and links to related information. In particular, the website has an amusing poster of the nine rules you can download and print.

No book's perfect, so here are my gripes and wishes:

  1. The sub-rules are really important for understanding the rules, but there's no "master list" in the book or website that shows all the rules and sub-rules on one page. The end of the chapter about a given rule summarizes the sub-rules for that one rule, but it'd sure be easier to have them all in one place. So, print out the list of sub-rules above after you've read the book.
  2. The book left me wishing for more detailed suggestions about specific common technology. This is probably unfair, since the author is trying to give timeless advice rather than a "how to use tool X" tutorial. But it'd be very useful to give good general advice, specific suggestions, and examples of what approaches to take for common types of tools (like symbolic debuggers, digital logic probes, etc.), specific widely-used tools (like ddd on gdb), and common problems. Even after the specific tools are gone, such advice can help you use later ones. A little of this is hinted at in the "know your tools" section, but I'd like to have seen much more of it. Vendors often crow about what their tools can do, but rarely explain their weaknesses or how to apply them in a broader context.
  3. There's probably a need for another book that takes the same rules, but broadens them to solving arbitrary problems. Frankly, the rules apply to many situations beyond computing, but the war stories are far too technical for the non-computer person to understand.

But as you can tell, I think this is a great book. In some sense, what it says is "obvious," but it's only obvious as all fundamentals are obvious. Many sports teams know the fundamentals, but fail to consistently apply them - and fail because of it. Novices need to learn the fundamentals, and pros need occasional reminders of them; this book is a good way to learn or be reminded of them. Get this book.


If you like this review, feel free to see Wheeler's home page, including his book on developing secure programs and his paper on quantitative analysis of open source software / Free Software. You can purchase Debugging: The 9 Indispensable Rules for Finding Even the Most Elusive Software and Hardware Problems from bn.com. Slashdot welcomes readers' book reviews -- to see your own review here, read the book review guidelines, then visit the submission page.

Comments:
  • by stuffduff ( 681819 ) on Tuesday February 24, 2004 @03:52PM (#8376882) Journal
    Soul of a New Machine [amazon.com] by Tracy Kidder [bookbrowse.com] (book teaser) [businessweek.com]. My favorite chapter was "The Case of the Missing NAND Gate."
  • Sounds interesting (Score:5, Interesting)

    by pcraven ( 191172 ) <paul.cravenfamily@com> on Tuesday February 24, 2004 @03:53PM (#8376907) Homepage
    Teaching people how to debug isn't that easy. It takes some experience before they get the hang of it.

    I'm a stickler for labeling code often and tracking changes released to production. Because of this, I often seem to be a stick in the mud when it comes to refactoring.

    Heavy refactoring makes your code nicer. But when you have to do a lot of debugging on something that worked before refactoring, you start to appreciate that keeping the change set manageable is a 'good thing'. (I do financial apps, so this may not work for everyone.)

    The thing I see people fail at most is the ability to 'bracket' the problem: go between code that works and code that doesn't, filtering the problem down to something simple.

    The second thing is the inability of some people to go 'deep' in their debugging: decompile the Java/C#/whatever code, trace through the library calls, whatever.

    It's nice to see another good book on the market that seems to cover these topics.
  • Number one (Score:2, Interesting)

    by Jooly Rodney ( 100912 ) on Tuesday February 24, 2004 @03:58PM (#8376977)
    Okay, I haven't read the book, and I guess dwheeler is distilling the rules down to a sound bite, but isn't #1 the most important and difficult part of debugging? I mean, if I knew system Foo ver. Bar had such-and-such an idiosyncrasy, I could code around it, but Googling for hours to find the one message board post that lets you Understand The System can be aneurysm-inducing. It's not even always the idiosyncrasies of a system -- the sheer volume of stuff you have to learn about I/O conventions, operating systems, etc., in order to write a useful program in a non-toy language boggles the mind. I'm surprised people are able to write programs in the first place.
  • Sonuvabitch! (Score:3, Interesting)

    by Anonymous Coward on Tuesday February 24, 2004 @03:59PM (#8376985)
    Like 15 years ago, in my intro CSE class, my first Fortran program, which found "edges" in a text file filled with numbers, did this. Everything looked good. It would compile. But it wouldn't print out its little thing. So I insert statements to print out the status of where it is, and it works! I take out the statements and it doesn't. In/out, in/out. So I go ask the TA for help. He says it's one of the damnedest things he's seen, sorry, Fortran isn't something he's really an expert at.

    I have hated Fortran for years, having written a single program in it, based on this.
  • Re: Heisenbugs... (Score:4, Interesting)

    by gidds ( 56397 ) <slashdot.gidds@me@uk> on Tuesday February 24, 2004 @04:01PM (#8376998) Homepage
    You're describing bugs which are reproducible, but only on the unchanged code.

    Worse even than those are bugs which aren't reproducible at all, where there's no way to determine the conditions that caused them, or to be sure you've fixed them. The only way to handle them is to fill the code with assertions and defensive code, and hope that at some point it'll catch something for you...
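    The "assertions and defensive code" tactic can be sketched concretely: cheap checks that fail loudly at the first sign of bad state, plus a ring buffer of recent events so the one-in-a-million failure arrives with context. A made-up example (function and message names invented):

```python
import collections

# Ring buffer of recent events, dumped when an assertion finally trips.
_recent = collections.deque(maxlen=50)

def trace(event):
    _recent.append(event)

def withdraw(balance, amount):
    trace(f"withdraw(balance={balance}, amount={amount})")
    # Defensive checks: fail loudly at the first sign of bad state,
    # carrying enough recent history to reconstruct how we got here.
    assert amount >= 0, f"negative amount; recent events: {list(_recent)}"
    assert amount <= balance, f"overdraw; recent events: {list(_recent)}"
    return balance - amount

print(withdraw(100, 30))  # -> 70
```

    The checks cost almost nothing in the common case, and when the irreproducible bug finally fires, the assertion message carries the trail of events instead of a bare crash.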

  • by Anonymous Coward on Tuesday February 24, 2004 @04:06PM (#8377063)
    Besides being highly apocryphal, that was the first use of the word bug in the context of computing. It is not the first hardware bug by a long shot. You would have known that if you had actually read the page you linked to.
  • by kooso ( 699340 ) on Tuesday February 24, 2004 @04:08PM (#8377091)
    10. Code is _always_ Beta. It's never done until it's no longer in use or support no longer exists.

    What about the opposite? Anyone against versioning? I tried and failed in Google to find an "Against versioning" campaign. I mean, somebody must be out there who only wants version 1.0 for all software.

    I guess the issue is in the meaning we attach to version numbers. What about a program as a well-specified function that, once it is implemented (at least for a fixed platform), needs no "enhancements"?

    (E.g. Don Knuth adds a digit to each version of TeX, implying that he doesn't plan to add anything substantial, or else he'll be running into very long version numbers).
  • Re:Heisenbugs... (Score:4, Interesting)

    by pclminion ( 145572 ) on Tuesday February 24, 2004 @04:11PM (#8377128)
    In my operating system class, my group's program caused an error at one of the delete[] statements, and it disappeared and reappeared depending on whether we ran it in the debug environment or not.

    I'll tell you with 99% certainty that this was caused by a piece of code overrunning the end (or beginning) of a new[]'d buffer, clobbering the memory allocation meta-data. This causes delete[] to crump when it hits a bogus pointer and flies off into never never land.

    By running in the debug environment you changed the memory layout of the allocation in such a way that the problem was masked.

    These kinds of bugs only seem weird the first time you encounter them. They're actually some of the most common types of bugs. With enough experience you'll be finding them in your sleep.
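    The diagnosis above, an overrun clobbering allocator metadata, is exactly what debug allocators catch by planting guard ("canary") bytes around each allocation and verifying them on free. A toy model of that technique in Python (real implementations live in C inside the allocator; all names here are invented):

```python
CANARY = b"\xde\xad\xbe\xef"

def guarded_alloc(size):
    # Buffer layout: [canary][user bytes][canary]
    return bytearray(CANARY) + bytearray(size) + bytearray(CANARY)

def user_view_write(buf, index, value):
    # Deliberately unchecked, like raw pointer arithmetic in C.
    buf[len(CANARY) + index] = value

def guarded_free(buf, size):
    # The "delete" path verifies the canaries instead of trusting them.
    intact = (buf[:len(CANARY)] == CANARY and
              buf[len(CANARY) + size:] == CANARY)
    if not intact:
        raise RuntimeError("heap corruption: canary overwritten")

buf = guarded_alloc(8)
user_view_write(buf, 8, 0x41)   # one-past-the-end write hits the canary
try:
    guarded_free(buf, 8)
except RuntimeError as e:
    print(e)  # -> heap corruption: canary overwritten
```

    The point is that the corruption is reported at free time, at the offending allocation, instead of as a crash "off into never never land" much later.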

  • Re:Heisenbugs... (Score:3, Interesting)

    by morcheeba ( 260908 ) * on Tuesday February 24, 2004 @04:13PM (#8377152) Journal
    that's funny... I just tracked one of these down that existed in our software - the optimized version ran differently than the non-optimized version. It turns out the bounds checker is in the non-optimized version, and a couple of places in the code used x[rand()]=y ... the bounds-checker (implemented as a macro which had side effects) *caused* the heisenbug!
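    Python has no macros, but the same bug class, a checking wrapper that evaluates a side-effecting index expression more than once so the value checked is not the value stored, can be reconstructed directly (a hypothetical sketch of the pattern, not the poster's actual code):

```python
import itertools

def checked_store(arr, index_expr, value):
    # BUG: like a macro argument with side effects, index_expr() is
    # evaluated twice. The bounds check sees one value...
    assert 0 <= index_expr() < len(arr), "index out of range"
    # ...and the store uses the next one.
    arr[index_expr()] = value

indices = itertools.count()          # deterministic stand-in for rand()
next_index = lambda: next(indices)

arr = [0] * 4
checked_store(arr, next_index, 7)    # check consumes index 0, store hits 1
print(arr)  # -> [0, 7, 0, 0]
```

    Since the checker only exists in the non-optimized build, the two builds index different elements, which is precisely how the checker itself causes the heisenbug.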
  • by severoon ( 536737 ) on Tuesday February 24, 2004 @04:13PM (#8377164) Journal

    Well, even though I think most people 'round these parts would agree with me that the book covers the fairly obvious, I will say this: it's absolutely necessary to have an "expert" write these things down because all too often, us developers try to proceed and get blocked by management. At my last job, we had a big problem with WebLogic transaction management, some bizarre confluence of events was causing a HeuristicMixedException to be thrown by the platform--by the way, WebLogic people, thanks a lot for naming this exception this way and taking the time to make sure it gets thrown in no less than six totally unrelated (as far as I can tell) circumstances. I love it when exceptions originate ambiguously, from several sources, and no one part of the platform has authority over the problem.

    This was a big enough problem that we had to set up a separate, isolated environment to figure out what was going on. 4 out of the 5 architects involved on the project (no it wasn't a huge project--you can see HME wasn't the only problem here) had cemented ideas about what was going wrong...none of them agreed of course...and we had no less than 3 managers with theories based on the idea that the Earth sweeps through an aether against which all things can be measured.

    The biggest issue with this testing environment was keeping everyone's mitts off of it, especially those people who didn't have to ask for permissions to the system (the architects, managers...in other words everyone). And the managers didn't agree that it was particularly important to record every step methodically, or limit the number of people making changes to the system to 1 at a time. Instead, they set up a war room and engaged in what I like to call: Fix By Chaotic Typing. (It's chaotic in the sense that, there are definitely patterns to the activity taking place, but you have to be Stephen Wolfram to find and understand them.)

    Needless to say, that didn't work. If I'd had access to this book, an authority willing to put the obvious in print might have bolstered my argument that we needed to take resources OFF this issue, not add more. Alas, it was not to be. The bigwigs decided that, since the current manpower wasn't able to track down this bug, it was time to bring in the high-priced WebLogic consultants. We got some 1st generation WebLogic people, 3 of them eventually, and they came in and immediately set themselves to the task of learning our business, telecommunications. And at a mere $150/hour, why not? (Management decided the bug was non-deterministic at some point and this assembly of people was given the informal team moniker: the Heuristics team. I preferred "the Histrionics team".)

    So I eventually teamed up with the lead architect on the project and we solved the problem by subterfuge. We had to intentionally set these people working in a direction--everyone, employees and WebLogic consultants alike--that was so off-the-track they actually didn't interfere with any part of the system likely containing the error. This gave us a reasonable amount of time and space to track down the bug in 3 days' time. At only the loss of 6 weeks and several thousand dollars in expenses alone for the WL consultants.

    sev

  • Re:Time (Score:3, Interesting)

    by Dukael_Mikakis ( 686324 ) <andrewfoerster AT gmail DOT com> on Tuesday February 24, 2004 @04:17PM (#8377216)
    Yeah, the sad truth seems to be that general and regression testing rank low on the priority list because they don't actually create new product (though testing is of course necessary, we aren't selling our testing, we're selling our new code).

    With marketers and product managers and sales people all pushing our product and making wild promises about delivery dates and patch dates, it becomes a fruitless effort to keep on top of the regression testing. I've found that with the software at my company, it sort of ramps up until it reaches a breaking point where we'll just need to scrap big portions of our system and release a whole new build, likely using "buzzwords" or cryptic acronyms that are supposed to indicate progress.

    ... and it doesn't help that a big chunk of our source code was recently leaked [slashdot.org].
  • by RobinH ( 124750 ) on Tuesday February 24, 2004 @04:30PM (#8377358) Homepage
    I find that when troubleshooting systems with which other people have worked longer, I have had better luck just asking them simple facts and troubleshooting myself rather than listening to their wild-ass guesses and having to shoot them down.

    Yes, but within their guesses are sometimes tidbits of information. Last week we had a complaint from a user that every time they clicked this one button on a form, it set off a certain process that wasn't supposed to happen right then, but we knew that there was no connection between that click event and the process. However, I knew he wasn't imagining it.

    After investigating, I found that when he opened the form that the button was on, it loaded a timer object that started ticking away and, 5 seconds later, initiated the process. It just happens that it takes about 5 seconds from opening the form to clicking the button.

    Of course, if I'd written the software... well, whatever.
  • by wrp103 ( 583277 ) <Bill@BillPringle.com> on Tuesday February 24, 2004 @04:38PM (#8377446) Homepage

    It is nice to see a book that addresses this topic. I get very frustrated with so many textbooks that have at most a small chapter on debugging. Let's face it, beginning programmers spend more time debugging code than they do writing code, so why isn't that activity stressed?

    I particularly liked the rule about "Quit thinking and look". I worked with a guy who used what I call the "Zen method of debugging". He would keep staring at the code, trying to determine what was going on. I, on the other hand, would throw in some print statements so I could see what was going on. In one case, he insisted there was nothing wrong with the code, but what he didn't realize was that an early test failed, which meant the code he was looking at never got executed. I had suggested he print something out at the start of the routine, but he insisted it wasn't necessary because he knew what it was doing.

    He might cover this in the book, but one rule that I stress with my students is: if you make a change and the behavior of the program is the same, back out your changes, because either:

    • You are probably looking in the wrong place (which is why the behavior is the same)
    • You could easily have just inserted several new bugs that you won't see until the path you are looking at gets executed.

    I often have students insist that their changes should have fixed something, but it turns out the program was actually executing an alternative path that they weren't looking at, or that the problem was much earlier, so when it got to where they thought the problem was, the data was different than they assumed.
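    The "print something at the start of the routine" tactic from the comment above is cheap to sketch; the first trace line proves whether the code you are staring at ever runs at all (function and field names below are invented for illustration):

```python
def process(record, trace=print):
    trace("process: entered")        # proves the routine even runs
    if not record.get("valid"):
        trace("process: early return, record marked invalid")
        return None                  # the path everyone stared past
    trace("process: main path")
    return record["value"] * 2

# Capture the trace instead of printing, then inspect it.
events = []
process({"valid": False, "value": 21}, trace=events.append)
print(events)  # -> ['process: entered', 'process: early return, record marked invalid']
```

    One look at the captured trace settles the argument: the "buggy" main path was never executed, because an earlier test failed first.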

  • by dwheeler ( 321049 ) on Tuesday February 24, 2004 @04:39PM (#8377454) Homepage Journal
    Indeed, casting the runes [catb.org] has been a successful debugging technique before.

    Here's the story from the Jargon File (under "casting the runes"): "A correspondent from England tells us that one of ICL's most talented systems designers used to be called out occasionally to service machines which the field circus had given up on. Since he knew the design inside out, he could often find faults simply by listening to a quick outline of the symptoms. He used to play on this by going to some site where the field circus had just spent the last two weeks solid trying to find a fault, and spreading a diagram of the system out on a table top. He'd then shake some chicken bones and cast them over the diagram, peer at the bones intently for a minute, and then tell them that a certain module needed replacing. The system would start working again immediately upon the replacement."

  • Re:Heisenbugs... (Score:2, Interesting)

    by badmammajamma ( 171260 ) on Tuesday February 24, 2004 @04:57PM (#8377629)
    In my experience they are usually caused by inadequate protection of data that needs to be thread safe. In fact, I've never had one from a buffer overflow.

    Of course this brings up the point of don't make assumptions that "this" has to be the problem. The bugs that take the longest time to debug are the ones where we build a false premise in our head of where the problem is or what the state of things are up to the point of the problem.

    Assume nothing.
  • Re:Heisenbugs... (Score:3, Interesting)

    by jimsum ( 587942 ) on Tuesday February 24, 2004 @04:58PM (#8377643)
    Heisenbugs are also caused by floppy disks. We once shipped a program where one bit was wrong on the floppy, which caused a nasty bug. That one was hard to duplicate until we got the customer to ship us their version of the program.
  • Re:Heisenbugs... (Score:5, Interesting)

    by Rufus88 ( 748752 ) on Tuesday February 24, 2004 @05:13PM (#8377808)
    In my experience, Heisenbugs are often the result of race conditions between concurrent threads.
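    A minimal sketch of that kind of race, two threads doing an unsynchronized read-modify-write on a shared counter, and the lock that fixes it (illustrative Python, not tied to any poster's actual code):

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    global counter
    for _ in range(n):
        with lock:            # remove this, and two threads can read the
            counter += 1      # same old value and silently lose an update

threads = [threading.Thread(target=increment, args=(100_000,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # -> 200000 with the lock; without it, often less
```

    Whether the unlocked version actually loses updates depends on scheduling, which is exactly why such bugs vanish under a debugger: the probe changes the interleaving.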

    This reminds me of a famous hardware "bug":
    > This is a weird but true story (with a moral) ...
    > A complaint was received by the Pontiac Division of General Motors:
    >
    > "This is the second time I have written you, and I don't blame you for not
    > answering me, because I kind of sounded crazy, but it is a fact that we
    > have a tradition in our family of ice cream for dessert after dinner each
    > night.
    >
    > But the kind of ice cream varies so, every night, after we've eaten, the
    > whole family votes on which kind of ice cream we should have and I drive
    > down to the store to get it. It's also a fact that I recently purchased a
    > new Pontiac and since then my trips to the store have created a problem.
    >
    > You see, every time I buy vanilla ice cream, when I start back from the
    > store my car won't start. If I get any other kind of ice cream, the car
    > starts just fine. I want you to know I'm serious about this question, no
    > matter how silly it sounds: 'What is there about a Pontiac that makes it
    > not start when I get vanilla ice cream, and easy to start whenever I get any
    > other kind?'"
    >
    > The Pontiac President was understandably skeptical about the letter, but
    > sent an engineer to check it out anyway. The latter was surprised to be
    > greeted by a successful, obviously well educated man in a fine neighborhood.
    >
    > He had arranged to meet the man just after dinner time, so the two hopped
    > into the car and drove to the ice cream store. It was vanilla ice cream
    > that night and, sure enough, after they came back to the car, it wouldn't
    > start.
    >
    > The engineer returned for three more nights. The first night, the man got
    > chocolate. The car started. The second night, he got strawberry. The car
    > started. The third night he ordered vanilla. The car failed to start.
    >
    > Now the engineer, being a logical man, refused to believe that this man's
    > car was allergic to vanilla ice cream. He arranged, therefore, to continue
    > his visits for as long as it took to solve the problem. And toward this end
    > he began to take notes: he jotted down all sorts of data, time of day, type
    > of gas used, time to drive back and forth, etc.
    >
    > In a short time, he had a clue: the man took less time to buy vanilla than
    > any other flavor. Why? The answer was in the layout of the store.
    >
    > Vanilla, being the most popular flavor, was in a separate case at the front
    > of the store for quick pickup. All the other flavors were kept in the back
    > of the store at a different counter where it took considerably longer to
    > find the flavor and get checked out.
    >
    > Now the question for the engineer was why the car wouldn't start when it
    > took less time. Once time became the problem-not the vanilla ice cream-the
    > engineer quickly came up with the answer: vapor lock. It was happening
    > every night, but the extra time taken to get the other flavors allowed the
    > engine to cool down sufficiently to start. When the man got vanilla, the
    > engine was still too hot for the vapor lock to dissipate.
    >
    > Moral of the story: even insane looking problems are sometimes real.
  • by Zangief ( 461457 ) on Tuesday February 24, 2004 @05:16PM (#8377846) Homepage Journal
    Rule 0. If you are programming in C, or similar, start counting from zero.
  • by Speare ( 84249 ) on Tuesday February 24, 2004 @05:22PM (#8377920) Homepage Journal

    No, that's a funny thing. I drew that bear icon over ten years ago when I was on the Win3.1 shell team. I didn't even know it still shipped in any MSFT product.

    The teddy bear is named Bear, and was the cuddly companion of one of the Windows 3.1 / Windows 95 shell team developers. He'd carry it *EVERYWHERE*. There are quite a few internal APIs called BunnyThis() or BearThat(), usually with generic numbers, because giving it a name would entice application writers to try to call it. (They're useless three-line internal helpers, but that didn't stop conspiratorial book-writers from trying to document them anyway.)

    Bear also appears in the Win3.1 credits, where I made portraits of spectacled Bill, bald Steve, and large-schnozzed Brad Silverberg.

    Now I don't have any Microsoft products at my house, anymore, except one outdated off-net machine which runs edutainment CD-ROMs for my daughter.

  • Phase of the Moon (Score:4, Interesting)

    by cpeterso ( 19082 ) on Tuesday February 24, 2004 @05:51PM (#8378298) Homepage

    There really was a bug based on the phase of the moon. See the Jargon Dictionary for more info: phase of the moon [astrian.net]:


    phase of the moon n. Used humorously as a random parameter on which something is said to depend. Sometimes implies unreliability of whatever is dependent, or that reliability seems to be dependent on conditions nobody has been able to determine. "This feature depends on having the channel open in mumble mode, having the foo switch set, and on the phase of the moon." See also heisenbug.

    True story: Once upon a time there was a program bug that really did depend on the phase of the moon. There was a little subroutine that had traditionally been used in various programs at MIT to calculate an approximation to the moon's true phase. GLS incorporated this routine into a LISP program that, when it wrote out a file, would print a timestamp line almost 80 characters long. Very occasionally the first line of the message would be too long and would overflow onto the next line, and when the file was later read back in the program would barf. The length of the first line depended on both the precise date and time and the length of the phase specification when the timestamp was printed, and so the bug literally depended on the phase of the moon!

    The first paper edition of the Jargon File (Steele-1983) included an example of one of the timestamp lines that exhibited this bug, but the typesetter `corrected' it. This has since been described as the phase-of-the-moon-bug bug.

    However, beware of assumptions. A few years ago, engineers of CERN (European Center for Nuclear Research) were baffled by some errors in experiments conducted with the LEP particle accelerator. As the formidable amount of data generated by such devices is heavily processed by computers before being seen by humans, many people suggested the software was somehow sensitive to the phase of the moon. A few desperate engineers discovered the truth; the error turned out to be the result of a tiny change in the geometry of the 27km circumference ring, physically caused by the deformation of the Earth by the passage of the Moon! This story has entered physics folklore as a Newtonian vengeance on particle physics and as an example of the relevance of the simplest and oldest physical laws to the most modern science.
  • by CargoCultCoder ( 228910 ) on Tuesday February 24, 2004 @06:01PM (#8378455) Homepage

    I particularly liked the rule about "Quit thinking and look". I worked with a guy who used what I call the "Zen method of debugging". He would keep staring at the code, trying to determine what was going on...

    Personally, I would consider this to be the anti-Zen method. He was apparently focused so much on what he "knew" to be true, that he failed to consider clues trying to point him in another direction. That is not the Zen way of looking at things.

    Zen and the Art of Motorcycle Maintenance [amazon.com] has a lot to say about this. If you're stuck on a problem, the solution is not to beat on it harder (e.g., stare at the code some more). The solution is to back off and (to paraphrase from memory) allow yourself to become aware of the one little fact that's out there, waving its hand, hoping that you might notice it ... and that'll point you at the real problem.

    Stupidly staring at code is not Zen. Having an open mind for interesting and helpful facts -- whatever their source -- is.

  • by mark99 ( 459508 ) on Tuesday February 24, 2004 @06:32PM (#8378843) Journal
    Lots of points there to answer...

    Tests of the 10,000-test-case sort should be automated; then boredom is not an issue.

    I've written and maintained quite a few largeish (>10000 line) programs in my life, and those with extensive automated test suites (all right, there were only two of those :), were much nicer.

    By the time a program gets large (and successful ones usually do (it is amazing how many unsuccessful ones do too)), the way the "pieces work together" and "optimal architectures" are no longer an issue. I like Fowler's statement: "Architecture is about decisions that are hard to change later".

    Brand new shiny programs based on brand new shiny ideas are always more interesting than the old ones, just like everything else in life. But by the time they have a "complete feature set", they are old and wrinkly.

    Admittedly Regression Testing is often not an option. There usually seem to be tests that are hard to automate, and they just don't get automated. And bugs occur there (maybe mostly there). I haven't seen much in the way of Automated Regression Testing, except in the NUnit project.
  • A computer wipe out (Score:2, Interesting)

    by CactusCritter ( 182409 ) on Tuesday February 24, 2004 @09:11PM (#8380746)
    This is a very ancient experience which occurred on an IBM 704, noticeably before there was any such thing as an operating system.

    I had written an application which, when run, wiped 32K words of RAM clean with the same image in every word. I had to get after-hours computer time and proceeded to insert a print statement followed by an abort in the source code (Fortran) in order to track the problem down.

    Turned out to be an input error. The input utility I was employing used the character in column one of the data cards to indicate the type of data.

    I finally determined that my assistant, who had prepared the input data for my test case, had put a 3 instead of a 4 in the first column of a data card, resulting in what would have been a quite reasonable integer value becoming a floating point number in the machine.

    That resulted in a very long-running loop which stored the same number in every word of the machine's RAM.

    Thereafter, every application that I wrote had an input data check for every integer value to ensure that it did not exceed the source-code dimension of the array into which data were to be written.

    This was Fortran around 1960. The Ur days!
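    The input check adopted above, validating every integer field against the dimension of the array it will index before any write happens, might look like this today (a sketch with invented names, not the original Fortran):

```python
def load_values(rows, dest_size):
    """Parse (index, value) input rows, rejecting any index that would
    write outside a dest_size-element array: the check that would have
    caught the 3-vs-4 column mispunch before it wiped memory."""
    dest = [0.0] * dest_size
    for lineno, (index, value) in enumerate(rows, start=1):
        if not (0 <= index < dest_size):
            raise ValueError(
                f"line {lineno}: index {index} outside 0..{dest_size - 1}")
        dest[index] = value
    return dest

print(load_values([(0, 1.5), (2, 3.0)], 4))  # -> [1.5, 0.0, 3.0, 0.0]
```

    Rejecting the bad record at the input boundary, with the offending line number in the error, is far cheaper than diagnosing the wild stores it would otherwise cause.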
  • Give me a break (Score:2, Interesting)

    by rumblin'rabbit ( 711865 ) on Tuesday February 24, 2004 @09:27PM (#8380876) Journal
    Yeah, right. Perhaps if you are debugging a small application you wrote yourself, it may be that easy. Debugging a major application written by someone else, who perhaps used a less-than-optimum style and methodology? Forget it. The above won't work, because you won't be able to figure out #1, and you won't be able to figure out #2 in less time than it takes to rewrite the application from scratch.

    Don't know what planet you're programming on.
