
Test Coverage Leading You Astray?

An anonymous reader writes "Are your test coverage measurements leading you astray? Test coverage tools bring valuable depth to unit testing, but they're often misused. This article takes a closer look at what the numbers on the coverage report really mean, as well as what they don't. It then suggests three ways you can use your coverage to ensure code quality early and often."
  • Testing? (Score:5, Funny)

    by B4D BE4T ( 879239 ) on Tuesday February 07, 2006 @04:27AM (#14658268)
    Who needs testing? Doesn't everyone's code work perfectly on the first ru
    Segmentation fault
    • Reminds me of a coworker's job on file access in VB6. I think it had something to do with parsing and writing config files...

      When I asked him where his error handling routines were, he replied: "I don't program errors!"

      I know, nothing to do with unit testing; but still worth a mention.
      • Obviously your co-worker was a dork when it came to handling environmental issues (file locking, permissions, etc.), but I can see where his attitude would be helpful to some of the programmers I've met. It is far too common in this day of virtual machine environments and structured exception handling for folks to write in an error handler that doesn't do ANYTHING with the error, including propagate it up if it's not mitigated. In other words, many programmers write exception handling code that simply EATS exceptions.
  • Code flow is just as important as code coverage. If code in section 1 is executed in unit test 1, and code in section 2 is executed in unit test 3, there needs to be a unit test which executes both. All combinations have to be handled if sections of code have side effects on other sections (a minimal sketch of such a combined test appears below).
    • Not just that but if you have code that is called from different places in a subtly different way. You end up saying "Yep, covered that routine" only to have it go bang when the user accesses the code using an obscure method...

      That would be embarrassing ;)

    • But for any reasonably sized program the number of combinations is just so high that you can't sit and write unit tests to cover and execute each and every possibility, either physically or financially.

      What you can do is use tools like PolySpace (www.polyspace.com) to ensure you won't have any array overruns, out-of-range errors, access through dangling pointers, etc. You can then run unit tests on the 'working' code in working scenarios to ensure it does what it should.
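
      A minimal sketch of the combined test idea from the top of this thread (JUnit 4 assumed; the Settings class and its names are hypothetical, not from the article): two sections of code that each pass their own test can still interact badly through shared state, so a third test exercises them together.

          import java.util.HashMap;
          import java.util.Map;
          import org.junit.Test;
          import static org.junit.Assert.*;

          class Settings {
              private final Map<String, String> map = new HashMap<>();
              String put(String key, String value) { map.put(key, value); return value; }  // "section 1"
              String get(String key) { return map.getOrDefault(key, "default"); }          // "section 2"
          }

          public class SettingsInteractionTest {
              @Test public void putAlone() {                 // exercises section 1 on its own
                  assertEquals("42", new Settings().put("answer", "42"));
              }

              @Test public void getAlone() {                 // exercises section 2 on its own
                  assertEquals("default", new Settings().get("answer"));
              }

              @Test public void putThenGet() {               // exercises the interaction between them
                  Settings s = new Settings();
                  s.put("answer", "42");
                  assertEquals("42", s.get("answer"));       // only this test catches a put/get mismatch
              }
          }

      The first two tests already give 100% coverage of Settings; only the third one checks that the two sections actually work together.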
  • The idea that you can input some values and expect useful output from a function is nice in theory. Perhaps in some very limited mathematics oriented programs where the inputs must lead to a nice answer, but real world applications may end up manipulating more than just the input data.

    Can you test that the LCD has refreshed at the inputted rate? Can you verify that the input data was correctly injected into the database just by checking the output of the function?

    Functions lie like dogs. You can test the output of functions until you're blue in the face, but until you take a holistic view of the application and what it does, unit tests are more a salve for management's mind than a boon to developers.
    • by Anonymous Coward
      by BadAnalogyGuy (945258): Functions lie like dogs

      YHBT
    • by Jerf ( 17166 ) on Tuesday February 07, 2006 @12:14PM (#14660206) Journal
      Functions lie like dogs. You can test the output of functions until you're blue in the face, but until you take a holistic view of the application and what it does, unit tests are more a salve for management's mind than a boon to developers.

      And the solution is... "holistic" unit tests.

      While it's true that unit tests have a hard time making that last little yard (mostly in the form of hardware output, like graphics on the screen or your example), you're not writing your unit tests correctly. It's a rare unit test for me that is the equivalent of checking that adding two numbers works correctly, and while those are useful in development, they very, very rarely ever break later. Pure arithmetic functions are the easiest to write, in general, and they correspondingly have the smallest need for continuous automated testing. (Not zero, of course, just the smallest. And when they do break, boy howdy...!)

      In your other example, you ask:

      Can you verify that the input data was correctly injected into the database...

      (and I cut the rest of this question off as it posits an incorrect approach.)

      The answer to this is yes, although you need a good database and a good understanding of how they work. (Not "great", just good.) I have thousands of tests that verify that certain code correctly manipulates the database, and that calling certain web pages correctly manipulates the database. It's only marginally harder than testing a traditional function. The key here is to do everything inside a transaction; perform the task, do your verification, then roll the entire transaction back. Then it doesn't affect your database (which should normally be the "test" database, of course), and as a side effect, under all but the "READ-UNCOMMITTED" transaction level, it allows you to have any number of copies of the same test(s) running against the exact same database. (A minimal sketch of this approach appears at the end of this comment.)

      I can't imagine writing a distributed database-based application without such tests. Well, I can, but it's no fun.

      In a lot of database-based applications, since the database is the application, this goes a long way toward testing the entire app.

      Your unit tests ought to cover everything but the hardware output, which is more the exception than the rule.

      Part of the problem is the number of APIs that exist with no thought for testing, making it seem as if unit testing them is impossible. For example, a lot of GUI toolkits are a major pain in the ass because it's difficult or impossible to fully simulate pressing a key in them and then processing the event loop exactly once, after which you will see what happened. This is a limitation of the toolkit, though, not unit testing, one I fervently hope will someday be eliminated after my whining on Slashdot catches the eye of one of the GTK developers or something.

      In other cases, you have to do a little work, but it can be done. We use Apache::ASP, and it ships with a little Perl script that can run an ASP page outside of the webserver via a command line. Still not terribly useful, but I was able to take that script and turn it into something that accepts multiple requests over a pipe, and wrap another Perl module around it that manages the connection to make it easy to use. Now, in my unit tests, calling a web page looks just like calling a function. Unfortunately, the rollback idea doesn't trivially work here, but I have some other things in place to help with this. The upshot is that my unit tests check whether entire web pages work. This is some damned fine testing, and it's caught plenty of bugs long before they get out to the user.

      Sure, the stuff right on the periphery of some systems is hard to reach, but the vast majority of any system is perfectly manageable.
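
      A minimal sketch of the transaction-and-rollback idea described above, translated to Java/JDBC purely for illustration (the parent's own setup is Perl/Apache::ASP; JUnit 4 is assumed, and the connection URL, credentials and the orders table are hypothetical):

          import java.sql.*;
          import org.junit.*;
          import static org.junit.Assert.*;

          public class OrderInsertTest {
              private Connection conn;

              @Before
              public void openTransaction() throws SQLException {
                  conn = DriverManager.getConnection("jdbc:postgresql://localhost/testdb", "test", "test");
                  conn.setAutoCommit(false);                 // everything happens in one transaction
              }

              @Test
              public void insertIsVisibleInsideTheTransaction() throws SQLException {
                  try (PreparedStatement ins = conn.prepareStatement(
                          "INSERT INTO orders (customer, total) VALUES (?, ?)")) {
                      ins.setString(1, "acme");
                      ins.setInt(2, 42);
                      ins.executeUpdate();
                  }
                  try (PreparedStatement sel = conn.prepareStatement(
                          "SELECT total FROM orders WHERE customer = ?")) {
                      sel.setString(1, "acme");
                      try (ResultSet rs = sel.executeQuery()) {
                          assertTrue(rs.next());
                          assertEquals(42, rs.getInt(1));    // verify before rolling back
                      }
                  }
              }

              @After
              public void rollback() throws SQLException {
                  conn.rollback();                           // the test database is left untouched
                  conn.close();
              }
          }

      Because nothing is ever committed, the test database stays exactly as it was, and many copies of the test can run against it at once.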
      • Just out of interest, have you checked out Ruby on Rails' testing? It comes 'out of the box' with all the bits you need to create a test database and break it in all kinds of interesting ways. It automatically rolls the database back to a sane state ready for your next unit test, so you can test your transactions all you like. It also allows you to call web pages as functions and test them, and there are addons that will automatically validate your pages using the W3C's validators. It does seem to answer a lot of these problems.
        • Even if I could choose my language, and I choose Ruby instead of Python, I'm still not ready to commit to Ruby on Rails for the size of application I'm talking about. I wouldn't have chosen Apache::ASP, either, but at least I've got it harnessed.

          (Honestly, my problem hasn't been the frameworks, my problem has been people proving that you can write tightly-coupled spaghetti code in any environment if you don't watch them like a hawk. Ruby's no more the answer to that than what I've already got in place.)

    • The idea that you can input some values and expect useful output from a function is nice in theory. Perhaps in some very limited mathematics oriented programs where the inputs must lead to a nice answer, but real world applications may end up manipulating more than just the input data.

      You're right. And in such instances, if all you're doing is checking output, you don't really understand unit testing -- a key tenet of which is testing the code unit in isolation.

      Can you test that the LCD has refreshed at the inputted rate?
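
      For illustration, a minimal sketch of what "in isolation" can look like for the LCD example (JUnit 4 assumed; all names are hypothetical): the unit under test talks to an interface, and the test supplies a hand-rolled fake, so the assertion is about what the unit asked the display to do, not about the physical screen.

          import org.junit.Test;
          import static org.junit.Assert.*;

          interface Display {
              void setRefreshRate(int hz);
          }

          class DisplayController {
              private final Display display;
              DisplayController(Display display) { this.display = display; }

              void applyUserSetting(int hz) {
                  // clamp to a sane range before talking to the hardware
                  display.setRefreshRate(Math.max(30, Math.min(hz, 120)));
              }
          }

          public class DisplayControllerTest {
              static class FakeDisplay implements Display {
                  int lastRate = -1;
                  public void setRefreshRate(int hz) { lastRate = hz; }
              }

              @Test
              public void clampsOutOfRangeRates() {
                  FakeDisplay fake = new FakeDisplay();
                  new DisplayController(fake).applyUserSetting(500);
                  assertEquals(120, fake.lastRate);   // asserts on the unit's behaviour, not the hardware's
              }
          }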

  • by iangoldby ( 552781 ) on Tuesday February 07, 2006 @05:06AM (#14658355) Homepage
    It's a pity the submitter didn't provide a short paragraph review of the article rather than just copy-paste the abstract.

    Anyway, having had a quick look, it is all about Java.

    I'd love to hear from anyone who can recommend test coverage tools for C (i.e. non-object-oriented). I think that just about all of the articles I've ever read about testing methodologies have been exclusively about object-oriented patterns, and pretty much only Java or .NET.

    Object-oriented techniques are a good tool, but not the right tool for every job...
  • by meringuoid ( 568297 ) on Tuesday February 07, 2006 @05:16AM (#14658382)
    ... they always cancel the stuff I want to watch to make way for it.

    Bloody cricket.

  • DO-178B (Score:5, Interesting)

    by nonsequitor ( 893813 ) on Tuesday February 07, 2006 @05:24AM (#14658404)
    Three types of code coverage are required for safety critical airline applications:

    1) Line Coverage - Has every line been tested?
    2) Branch Coverage - Has every branch been tested?
    3) Boolean Coverage - Is EVERY possibility on a truth table for each logical operator explicitly defined?
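
    A minimal sketch (hypothetical code, not from DO-178B or the article) of how the three levels differ on a single short-circuiting decision: one call gives full line coverage, but branch and condition coverage need more inputs.

        public class CoverageLevels {
            static String classify(boolean a, boolean b) {
                String result = "minor";
                if (a || b) {            // short-circuit: b is never evaluated when a is true
                    result = "major";
                }
                return result;
            }

            public static void main(String[] args) {
                // Line coverage:      classify(true, false) alone executes every line.
                // Branch coverage:    also needs classify(false, false) for the false branch.
                // Condition/boolean:  also needs classify(false, true) to show that b can
                //                     independently flip the decision.
                System.out.println(classify(true, false));
                System.out.println(classify(false, false));
                System.out.println(classify(false, true));
            }
        }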

    These tests alone don't certify that the code is ready for an airplane or that it is indeed "bug free." My software engineering professor said it best: you can only prove the existence of bugs, you cannot prove the non-existence of bugs. These guidelines, as adopted by the FAA for the certification of safety-critical code, don't prove the non-existence of bugs, but they do go a long way towards proving the existence of many bugs, and they provide a MINIMUM standard to which code must be exercised before being allowed into an airplane.

    Software Engineering is a science; methodology has been pioneered to help us ENGINEER the software we develop to be as defect free as we know how to make it. As in other disciplines of engineering, there will always be things not yet quantified. Take architecture, for example: an architect might design a bridge to withstand an earthquake of a specific magnitude and winds of a specific speed. Does that mean the bridge is safe? What if the materials used weren't rated for the temperature range needed for the locale, etc.?

    As much as we do to ensure quality, there is no silver bullet. The company I interned at, which will remain nameless, made a multi-function navigational display for Air Force One. It rebooted during a touch and go at 40 degrees Fahrenheit. Wasn't it tested, you ask? Of course it was; it was tested at -40 degrees and 140 degrees, but the timing on one of the buses was off at 40, and the hardware watchdog took it into a reboot at a very critical time. It was DO-178B Level A certified and had 100% code coverage, of course, but there will always be bugs. Don't trust tools to tell you otherwise, because you can never prove the non-existence of bugs.

    (For those who don't know, a touch and go is where the plane touches down and immediately takes off again.)
    • Re:DO-178B (Score:3, Interesting)

      There are also Path coverage (extremely complex and not practical to use in most cases, but for critical sub-systems it might come in handy) and Linear Code Sequence and Jump (LCSAJ). There are more, but these two off the top of my head are worthy of inclusion in any discussion on coverage. There are a lot of business-specific standards out there that specify use of coverage. Aerospace has one, vehicle control systems have one, pharmaceutical and nuclear systems yet others. Guess which one of these has the _least
    • Re:DO-178B - MCDC (Score:3, Informative)

      by Anonymous Coward
      Note that DO-178B requires MCDC (Modified Condition Decision Coverage) for level A software (check DO178B page 74).
      MCDC requires that "every point of entry and exit in the program has been invoked at least once, every condition in a decision in the program has taken all possible outcomes at least once, every decision in the program has taken all possible outcomes at least once, and each condition in a decision has been shown to independently affect that decision's outcome. A condition is shown to independently affect a decision's outcome by varying just that condition while holding fixed all other possible conditions."
      • I was simplifying the process to make a point; they did that too. I actually felt sorry for the verification people who had to sign off on a screen capture confirming that every pixel was correct. That's the brute-force testing for every possible combination of inputs, for over a dozen analog and digital inputs, which results in thousands, if not tens of thousands, of screen shots to be hand-verified as correct with those inputs, and that the failure conditions were all displayed accurately, since old data can be much worse than no data at all.
      • She was a fast machine
        She kept her processor clean
        She was the best damn computer I had ever seen
        She had Bugzilla eyes
        Telling me no lies
        Knockin' me out with those APIs
        Taking more than her share
        Had me fighting for air
        She told me to com(pil)e but I was already there

        'Cause the walls start shaking
        The game was Quaking
        My mind was aching
        And we were make-ing it and you -

        Test me all night long
        Yeah you test me all night long
    • Re:DO-178B (Score:4, Insightful)

      by msobkow ( 48369 ) on Tuesday February 07, 2006 @09:31AM (#14659140) Homepage Journal

      The code was designed, exercised, tested, and executed properly from what you're saying. The display failed due to hardware problems.

      In what way is that hardware failure related to code coverage or any other form of software testing or QA metric?

      • Re:DO-178B (Score:2, Interesting)

        by nonsequitor ( 893813 )
        You're right, that was technically a hardware bug, but they fixed it with a software patch to change the bus timings. Not the optimal solution, of course, but a board respin costs much more than a software update. It may not have been the best example, but the point remains: you can't prove the non-existence of bugs. Anyone that's trying to tell you otherwise is either a genius or, more likely, an idiot.

        To answer your question, I was trying to illustrate that the metrics, while a good starting point, are merely a minimum standard.
    • you can only prove the existence of bugs, you cannot prove the non-existence of bugs

      You sure that's what he said? Program proofs can indeed prove the correctness of programs (i.e. the non-existence of bugs). It's just that they're hard to do for any significant amount of code.

      The way I heard the quote, it's about testing: "Program testing can at best show the presence of errors but never their absence." (Edsger W. Dijkstra)

  • Yes (Score:3, Funny)

    by peterpi ( 585134 ) on Tuesday February 07, 2006 @06:35AM (#14658586)
    The test coverage led me astray last summer. My boss had a TV near his desk, and most afternoons we'd find ourselves gathered around it following the action. Thankfully he was as much into the game as I was, so it didn't really matter.

    Bit of a strange subject for slashdot, eh?

  • by dresseduptoday ( 621090 ) on Tuesday February 07, 2006 @06:41AM (#14658602)
    The technique of unit testing is good and catches many errors, and code coverage is a very good companion for finding out what you haven't tested. Unlike what some posters above have indicated, this is generic, and has nothing to do with the programming paradigm used, nor the programming language. There are two major problems, however.

    1. With unit testing, you're only testing that the unit does what you expect it to, given its interfaces (the API, global variables, whatever...). If a bug is a misunderstanding of the specs, you won't catch it, unless the person who wrote the unit test is the one who wrote the specs.

    2. You won't discover errors in situations you haven't tested for, and even poorly written code will give you very good coverage numbers. For example: code that has no error handling whatsoever, and a test suite that doesn't subject it to error situations.

    These problems don't make unit testing and code coverage analysis bad. It's far better than not even trying. But you have to be aware of them and scrutinise the test suite to see what it *doesn't* test, especially if code coverage numbers are really high.
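
    A minimal sketch of point 2 (JUnit 4 assumed; the parser and its names are hypothetical): code with no error handling at all, plus a happy-path-only test, still reports 100% coverage.

        import org.junit.Test;
        import static org.junit.Assert.*;

        class ConfigParser {
            // No error handling whatsoever: a line without '=' blows up at runtime.
            static String valueOf(String line) {
                return line.split("=")[1].trim();
            }
        }

        public class ConfigParserTest {
            @Test
            public void readsAWellFormedLine() {
                assertEquals("8080", ConfigParser.valueOf("port = 8080"));
                // Every line of valueOf() executes, so coverage says 100%,
                // yet valueOf("port") would throw ArrayIndexOutOfBoundsException
                // and no test ever asks that question.
            }
        }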
  • by ribuck ( 943217 ) on Tuesday February 07, 2006 @08:16AM (#14658859)
    Test coverage measurement is a really dilute quality assurance tool. It can show you parts of your code that are untested, but it doesn't say anything about whether the other parts of your code are tested.

    Just executing a line of code or a branch (whilst running a test) does not imply that you are testing that code.
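
    A minimal sketch of that gap (JUnit 4 assumed; names hypothetical): this "test" executes every line of add(), so a coverage tool counts the method as covered, yet it asserts nothing and passes even though add() is plainly wrong.

        import org.junit.Test;

        class Calculator {
            static int add(int a, int b) {
                return a - b;   // an obvious bug the "test" below never notices
            }
        }

        public class CalculatorTest {
            @Test
            public void exercisesAddWithoutCheckingIt() {
                Calculator.add(2, 2);   // covered, but not tested
            }
        }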

    • Test coverage measurement is a really dilute quality assurance tool. It can show you parts of your code that are untested, but it doesn't say anything about whether the other parts of your code are tested.

      Which is fine if you understand that. My fear, if I were to use such tools, is that they would produce semi-meaningful figures (say, a percentage) and Management would learn about it, and start measuring progress and performance based on them. Once that happened, these semi-meaningful figures would control the development process.

  • ..cuz I read "Test Coverage Leading You to Ashtray".

    Test coverage efforts are more likely to drive people to drink, IMO.

  • The testing tool supposedly saw this code:

    package com.vanward.coverage.example01;

    public class PathCoverage {

        public String pathExample(boolean condition) {
            String value = null;
            if (condition) {
                value = " " + condition + " ";
            }
            return value.trim();
        }
    }

    and the code was executed once with condition equal to TRUE. It then reported 100% coverage!

    How is that 100% coverage? If condition were FALSE, then a completely different path through the instructions would have been executed!

    I
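
    For illustration, a sketch of the test that the reported coverage never demanded (JUnit 4 assumed): exercising the FALSE path shows the lurking NullPointerException.

        import com.vanward.coverage.example01.PathCoverage;
        import org.junit.Test;

        public class PathCoverageTest {
            @Test(expected = NullPointerException.class)
            public void falseConditionLeavesValueNull() {
                // value stays null, so value.trim() throws NullPointerException
                new PathCoverage().pathExample(false);
            }
        }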

"Ninety percent of baseball is half mental." -- Yogi Berra

Working...