Monkeys at Keyboards: The Javanomicon © Michael James Heron
Topic: Java Programming Level: 2 Version: delta
17 - Testing and Debugging
Chapter Objectives
By the end of this chapter, the reader will be able to:
Getting a program up and running is only one step in the development process. Actually getting a program to run correctly is a whole new world of effort, and it is often far more costly in terms of time, effort, and will to live. Solving syntax errors is the easy part of fixing a program - although the error messages provided by compilers are often somewhat opaque to someone without much experience with a particular programming language, they have a structure and a context that becomes easier to interpret as time goes by. Debugging, on the other hand, is not something that you simply 'get better at'. Sure, you develop a wider range of experience and can extrapolate from that experience as to what may be wrong with a piece of malfunctioning code - but that's only part of the problem. The Software Engineering Institute estimated in 2002 that experienced programmers will inadvertently inject, on average, one defect for every ten lines of code. The task of testing is extremely complicated with most modern programs - so complicated that 'testing' is not an ad hoc process within most software houses; instead, it is a separate stage of development. Alas, few of us have access to a dedicated group of testers, and we must therefore put our programs through our own testing strategies to make sure our code actually works. In this chapter we will discuss some testing strategies, and how they can be applied to ensure a degree of correctness in our code.
The purpose of testing is not to show that a program works. The purpose of testing is to develop test cases that show that a program doesn't work. A good test is one that uncovers some error in the program. As I mentioned above, most of us don't have access to a group of testers, and so we must write our own tests to ensure the correctness of our programs. This is bad - programmers are the absolute worst people in the world to test their own code. There is the problem of unconscious assumptions - we know what assumptions we have made when developing a particular piece of code, and so our tests often play to these assumptions. There are biases - we often know, even if it is only subconsciously, where our code is weak, and our tests will avoid stressing such areas lest our delicate egos be bruised. We can get around these problems to some degree by adopting a rigorous and effective testing strategy - we apply a structured technique to our testing and so remove the personal aspects from the process. Easier said than done, of course! We don't test a program to ensure it works - we test a program to ensure that it gives the correct answers for particular sets of actions or data. We call these sets test cases. Each test case is designed to expose a flaw in a program. If it succeeds in doing so, it's a success. If it doesn't, then we try again with some other test case. In effect, what we are doing is testing to show that a program is not correct. However, correctness of a program is a difficult thing to define, and there are many standards and sets of criteria that may be adopted to define what constitutes a correct program. For this book, we will define a program as being correct if it meets the following requirements:
This is not a full list of the things we would expect from a correct application, but it will serve our narrow purposes. Readers interested in the more theoretical aspects of testing and debugging are directed towards one of the many books available on the subject.
The problem with testing is that it is all but impossible to be exhaustive with anything more than the simplest of applications. Consider the following trivial method:
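A minimal sketch of what such a method might look like, assuming it is the two-argument addNumbers method referred to later in this chapter (the class name and parameter names are illustrative, not taken from the original listing):

```java
final class AddNumbersSketch {
    // Hypothetical reconstruction: adds two ints and returns the sum.
    static int addNumbers(int first, int second) {
        return first + second;
    }

    public static void main(String[] args) {
        // A single spot check; this is nowhere near an exhaustive test.
        System.out.println("addNumbers(2, 3) = " + addNumbers(2, 3));
    }
}
```

With two int parameters of 32 bits each, there are 2^64 possible pairs of inputs to this one-line method.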
Even though the code for this function is trivial, it would be immensely time consuming to test it exhaustively. We'd need to check every combination of positive and negative numbers to ensure we get the right value out every time. If we then extend the method to cope with three numbers, it becomes even more difficult - there is a combinatorial explosion effect that means additional variables in a program make it exponentially more difficult to exhaustively test it. We can never be sure that a program will definitely work - instead, we must aim for a reasonable level of certainty based on well-defined and targeted test cases. This is the major problem with testing - it's not exhaustive. However, we can make use of certain heuristics to define test cases that are representative of all possible categories of data, and base our tests on multiple instances of these cases... in this way, we develop a degree of certainty about the reliability of a particular piece of code.
There are a variety of formal testing strategies that we can make use of to make our job of testing somewhat easier. The application of formal strategies can ensure a degree of rigour in our testing that an informal approach can never hope to achieve. We will look at four testing strategies in this chapter - these four can be broken into pairs of complementary techniques that can be effectively applied to the testing of a given application or applet. The strategies we apply to individual methods are black box testing and white box testing.
The strategies we apply to complete classes and subsystems are unit testing and integration testing.
All of these techniques come into play in the development of an effective testing routine. We'll look at each of them in turn (and see some examples of their use). In the next case study, we'll look at how we can apply all four techniques during the development of a multi-class application.
Black box testing is perhaps the simplest of the strategies. It requires absolutely no knowledge of a particular method - it doesn't even require any knowledge of programming. It does however require understanding of what a function is supposed to do. Black box testing is concerned solely with inputs and outputs - we don't care what's happening within a method, we just care that we get the right values out with regard to the values we put into it. With black box testing, we define a set of inputs to a function - one for each of the expected parameters. We determine what the expected output is, and then we compare that expected output to the actual output. For our simple addNumbers method, for example, we may define the following test cases:
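One possible set of black box test cases can be written down as a small table of inputs and expected outputs. The sketch below is an assumption about what such a table might contain; the particular values, the class name and the countFailures helper are all illustrative:

```java
final class BlackBoxSketch {
    // Method under test. The test code below never looks inside it.
    static int addNumbers(int first, int second) {
        return first + second;
    }

    // Each row is {first input, second input, expected output}.
    static final int[][] CASES = {
        {2, 3, 5},     // two positive numbers
        {-2, -3, -5},  // two negative numbers
        {-2, 3, 1},    // mixed signs
        {0, 0, 0},     // both zero
        {0, 7, 7},     // zero with a positive number
    };

    // Compares actual output with expected output and counts the mismatches.
    static int countFailures() {
        int failures = 0;
        for (int[] testCase : CASES) {
            int actual = addNumbers(testCase[0], testCase[1]);
            if (actual != testCase[2]) {
                System.out.println("FAIL: " + testCase[0] + " and " + testCase[1]
                        + " gave " + actual + ", expected " + testCase[2]);
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("Failures: " + countFailures());
    }
}
```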
We don't care how we are getting these answers - even if it's just a random number generator churning out the answers, if it's always right then we simply do not care. The function is a black box (hence the name) that we cannot see inside. An analogy to this is Searle's Chinese Room - consider someone sitting inside a room. In this room is a Chinese dictionary and a set of rules for manipulating the characters that make up written Chinese. The room has a letterbox, but no other way for the person inside the room to communicate with anyone outside the room. Someone outside the room passes in little pieces of paper with Chinese characters written on them, and the person inside the room translates them via the dictionary and the predefined rules, and then passes a translation back through the letterbox. The fact that the person outside the room is getting the correct translation out is the key issue - we don't care that the person inside the room does not actually understand a word of Chinese.
White Box Testing works on the opposite principle - we care deeply about what is going on inside the method, but not very much as to what is coming out as an answer. White Box Testing requires a structural understanding of a method in order to design the test cases. Test cases in white box testing are designed to ensure every statement in a function can be executed. We want to ensure that the flow of execution through a program is functioning as was intended - wherever we have a conditional structure such as an if statement, or a while loop, we must ensure that we have a set of input that will execute the code that belongs to that conditional. Consider the following method:
This method has a fairly complex control structure, including nested conditionals. We must write test cases that hit each of these lines of code:
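As an illustration of the idea, here is a hypothetical method with nested conditionals (it is not the chapter's own listing), together with one test input for each path of execution. The method name, its parameters and the returned strings are all assumptions made for this sketch:

```java
final class WhiteBoxSketch {
    // Hypothetical method with four paths of execution through nested ifs.
    static String describeTicket(int age, boolean member) {
        if (age < 18) {
            if (member) {
                return "child member";   // path 1
            } else {
                return "child";          // path 2
            }
        } else {
            if (member) {
                return "adult member";   // path 3
            } else {
                return "adult";          // path 4
            }
        }
    }

    public static void main(String[] args) {
        // One input per path, so every statement is executed at least once.
        System.out.println(describeTicket(10, true));
        System.out.println(describeTicket(10, false));
        System.out.println(describeTicket(40, true));
        System.out.println(describeTicket(40, false));
    }
}
```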
How do we know what test cases are going to hit which paths of execution? For a simple method, we can usually tell at a glance. For a more complex method we need to sit down and work it out with a flow chart. Flow charts are a relatively simple diagrammatic notation for stepping through the processes of a piece of code. We use a rounded rectangle to indicate the beginning of the chart, and a diamond to indicate a decision. Rectangles are used to indicate statements to be performed. Usually they are written in high level pseudo-code, but sometimes using the actual Java statements can make them easier to follow.
Flow charts may seem a little antiquated in this day and age of UML and other complex development aids - however, for simple clarity of expression and 'comprehension at a glance', very few things offer the compactness of flow charts. They are conceptually simple, which means that even non-technical people can be tutored in their use very quickly. It's as close to a universal design language as you're likely to find. The flow chart will show the branching factor of any method, and at each of the branches we can provide test cases that hit or miss the conditional - all of the paths in a program are going to be based on a conditional in one form or other - even a for-loop is based on a simple conditional in the evaluation clause of the loop structure.
The flow chart here is quite simple, but it shows us at a glance all of the paths of execution through the method. A more complicated method will obviously generate a more complicated flow chart. For example, imagine the flow chart of a method that applies the Sieve of Eratosthenes to an array of numbers and then displays them. Even this is a comparatively simple method, but the number of paths of execution is surprisingly large:
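A minimal sketch of such a method is given here, assuming the sieve is applied to the numbers from 2 up to a modest limit. The class name, the method name and the choice to return a list before displaying it are illustrative, not the original listing:

```java
import java.util.ArrayList;
import java.util.List;

final class SieveSketch {
    // Returns every prime from 2 up to and including limit.
    static List<Integer> primesUpTo(int limit) {
        List<Integer> primes = new ArrayList<>();
        if (limit < 2) {                        // conditional: nothing to sieve
            return primes;
        }
        boolean[] crossedOut = new boolean[limit + 1];
        for (int i = 2; (long) i * i <= limit; i++) {       // loop conditional
            if (!crossedOut[i]) {                           // conditional: i is prime
                for (int j = i * i; j <= limit; j += i) {   // inner loop conditional
                    crossedOut[j] = true;                   // cross out multiples of i
                }
            }
        }
        for (int i = 2; i <= limit; i++) {      // loop conditional
            if (!crossedOut[i]) {               // conditional: survived the sieve
                primes.add(i);
            }
        }
        return primes;
    }

    public static void main(String[] args) {
        // Display the numbers that survive the sieve.
        System.out.println(primesUpTo(30));
    }
}
```

Even in this short sketch there are six conditionals, each of which needs at least one input that takes the branch and one that does not.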
This program gives rise to the following flowchart:
The diagram offers us a simple way to test each conditional - we can see where the program branches, and we can see what criteria are used to decide which path of execution to follow. We can work our way through this chart to determine which values must be used to ensure a proper white-box testing strategy.
These techniques are not used independently - they are used as complements to ensure effective testing of a particular method. They use different strategies and are used to test for different things. White box testing tests for logic errors that creep in when coding for special cases - every statement must be tested to ensure proper coverage. The assumptions a programmer makes about an execution path can be incorrect, and white box testing can demonstrate this. The applicability of white box testing is based on the fact that errors are random and just as likely to appear on an obscure path down a twisty maze of conditionals all alike as they are to appear on a mainstream path. Black box testing allows us to determine how closely a method conforms to the required functionality. White box testing shows that logic paths are followed correctly, but it does not show that the logic is actually correct - black box testing demonstrates this by ensuring a tight conformance of expected output to actual output. Both methods suffer from the 'impossible to exhaustively test' problem, so we must design test cases that compensate for this weakness by ensuring that they are representative. Simply choosing a million test cases completely randomly does not ensure program correctness - certain kinds of input are more likely to yield errors than others.
As has been stated a number of times, we can never be exhaustive with our test cases - the best we can hope is that we are representative. In this section we'll discuss ways to ensure this. Three useful heuristics for developing test cases are looking for boundary conditions, testing against maximum and minimum values, and grouping sets of data into equivalence classes. Boundary conditions are areas where errors are very likely to crop up. This is expressed very commonly in the off-by-one error, where the wrong comparison operator has been chosen as the conditional for a structure. For example, if we're checking a percentage value (one that can be between zero and 100 inclusive):
Every now and again, this will fail to produce a message. This is due to the off-by-one error - instead of checking the value against being greater than 0, it should check against the value being greater than or equal to 0 (and the same is the case when checking against the value being less than 100, and the value being less than thirty):
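The sketch below shows one way the faulty and corrected checks might look side by side. The method names, the "low" and "high" messages and the "(no message)" marker are assumptions made for illustration; only the comparisons follow the description above:

```java
final class PercentageCheckSketch {
    static final String NO_MESSAGE = "(no message)";

    // Faulty version: 0, 30 and 100 match neither condition.
    static String describeBuggy(int percent) {
        if (percent > 0 && percent < 30) {
            return "low";
        } else if (percent > 30 && percent < 100) {
            return "high";
        }
        return NO_MESSAGE;
    }

    // Corrected version: the boundaries themselves are now included.
    static String describeFixed(int percent) {
        if (percent >= 0 && percent <= 30) {
            return "low";
        } else if (percent > 30 && percent <= 100) {
            return "high";
        }
        return NO_MESSAGE; // only values outside 0 to 100 reach this line
    }

    public static void main(String[] args) {
        // Test values clustered around each boundary.
        int[] boundaryValues = {-1, 0, 1, 29, 30, 31, 99, 100, 101};
        for (int value : boundaryValues) {
            System.out.println(value + ": buggy=" + describeBuggy(value)
                    + ", fixed=" + describeFixed(value));
        }
    }
}
```

Only test values that sit exactly on a boundary (0, 30 and 100) expose the difference between the two versions, which is why tests are clustered there.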
These kinds of mistakes are spectacularly easy to make, so this is a fertile area to direct test cases. They don't even have to occur at the extremes - they can occur with equal frequency at any boundary between one block of code and another. Test cases for both black and white box testing should be directed at these boundaries. We should also test for the first few values around the boundary condition too... so if the boundary condition is:
Then we should test for i being equal to 30, and for i being equal to 29, and 31, and 28, and 32. We cluster tests around the boundaries to ensure that the transition point is correct. We can group sets of similar inputs into equivalence classes to simplify our job of testing. We group together sets of data within boundaries and say that if it works for X number of these sets of data, it will work for all of them. For example, if we have a simple method:
For this method, there is only one boundary - the one between positive numbers and negative numbers. So rather than test thousands of numbers of both types, we select a number of representatives of input within each of these boundaries, and say 'if it works for all of these, it will work for all of the rest':
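A sketch of the idea, assuming the simple method just reports whether a number is negative. The isNegative name and the chosen representatives are illustrative only:

```java
final class EquivalenceClassSketch {
    // Hypothetical simple method with a single boundary, at zero.
    static boolean isNegative(int number) {
        return number < 0;
    }

    // A handful of representatives from each equivalence class.
    static final int[] NEGATIVE_REPRESENTATIVES = {-1000000, -500, -42, -1};
    static final int[] NON_NEGATIVE_REPRESENTATIVES = {0, 1, 42, 500, 1000000};

    // Returns true if every representative gives the answer its class expects.
    static boolean allRepresentativesPass() {
        for (int value : NEGATIVE_REPRESENTATIVES) {
            if (!isNegative(value)) {
                return false;
            }
        }
        for (int value : NON_NEGATIVE_REPRESENTATIVES) {
            if (isNegative(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("All representatives pass: " + allRepresentativesPass());
    }
}
```

Note that the representatives include -1, 0 and 1, so the single boundary is still tested directly.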
There is very little point in checking all of the numbers, since there are no boundary conditions along the way to complicate what is a very simple check on input. All we are looking for is a degree of confidence in the program's correctness, and choosing a suitable number of examples from each equivalence class can provide this confidence. After all, if you can't exhaustively test it anyway, are you really going to be happier knowing that 101 works as well as 100 and 99 and 98 and...? Finally, we need to check maximum and minimum values. Often these are values that are imposed by a particular programming language, but they can also be programmer-defined limits. If we want to be able to set a percentage value, we need to check to see what happens at the maximum and minimum points to make sure that it works correctly. A good testing strategy is one that is made up of a number of test cases from each boundary and each equivalence class, and that checks the maximum and minimum values - however, there are other things we need to check, particularly in applications or applets where the user is permitted a free rein in data input. We need to make sure that our code can also deal with invalid data types (such as the user entering a string of text when we need a number). As you can see, designing proper test strategies can be very time consuming, but they are necessary to ensure that we are indeed producing software that meets our requirements.
Ideally, methods are tested in isolation - this means we don't care about how they interact with other methods (that comes later), we just care how they react in terms of their inputs and outputs. To test this, we use what is called a test harness, which is really just a stub of code that inserts a set of input into a method and outputs the result. At its simplest, it can look like this:
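A minimal sketch of such a harness, assuming the method under test is the addNumbers method used earlier in the chapter (the class name is illustrative):

```java
final class SimpleHarness {
    // The method under test, copied into the harness so it can be run in isolation.
    static int addNumbers(int first, int second) {
        return first + second;
    }

    public static void main(String[] args) {
        // Insert one set of input and output the result next to the expected value.
        int actual = addNumbers(2, 3);
        System.out.println("addNumbers(2, 3) returned " + actual + ", expected 5");
    }
}
```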
However, it is usually necessary to input a large number of test cases, and so a testing harness is often more complex to allow for automating input and output:
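One way to automate this is sketched below. It assumes each test case is a line of text holding two inputs and the expected output, separated by spaces; that format, the class name and the run method are assumptions made for illustration. The test data is held in a string here so that the sketch is self-contained:

```java
final class AutomatedHarness {
    // The method under test.
    static int addNumbers(int first, int second) {
        return first + second;
    }

    // Runs every test case in testData and returns the number of failures.
    // Each line is expected to look like: first second expected
    static int run(String testData) {
        int failures = 0;
        for (String line : testData.split("\n")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 3) {
                continue; // skip blank or malformed lines
            }
            int first = Integer.parseInt(parts[0]);
            int second = Integer.parseInt(parts[1]);
            int expected = Integer.parseInt(parts[2]);
            int actual = addNumbers(first, second);
            boolean passed = (actual == expected);
            System.out.println((passed ? "PASS" : "FAIL") + ": " + first + " and "
                    + second + " gave " + actual + ", expected " + expected);
            if (!passed) {
                failures++;
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        String testData = "2 3 5\n-2 -3 -5\n-2 3 1\n0 0 0\n";
        System.out.println("Failures: " + run(testData));
    }
}
```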
Depending on the level of sophistication required in a test strategy, the harness may involve multiple classes and complex data structures (ironically requiring a degree of testing in itself to ensure correctness!). Usually test data is read in from a structured file on disk (we haven't discussed how to do this, but we will in a later chapter), and the output is also written to a file for analysis. The actual details of how input and output are done are a matter of preference and of each individual testing group's policy.
So, we've covered how we can test our individual methods to make sure they're working, but that's a small thing - most of our code is made up of something far more complicated - a chain of methods working together to form a class. We need to test each of the methods in turn to ensure they work correctly. This is called unit testing, as we're testing something as a unit in isolation from its context. It's also something we apply to classes and objects. In the context of methods, if a method passes all our carefully designed white box and black box tests, then it is deemed to be correct and suitable for use. However, it's not just internally that we need to test something - we also need to ensure that when methods are used together they pass the right information to each other. Most modern software projects involve many methods being called, and if we enter a value into method A which calls B which calls C which calls D which outputs the answer and it turns out to be wrong - well, where was the error? Unit testing helps us ensure that a particular unit is correct, but we need integration testing to ensure that the units work together and pass the correct information to each other and return the right data when necessary.

Integration Testing Within a Class

First, we choose a method that makes use of no other methods of the class - a baseline method that doesn't require anything else to be working before it can be tested and marked as being correct. We test this with black and white box testing, and then when we're sure it's correct we look for another method within the class that makes use of our baseline method. Once we've found one, we link the two together in the way they should work and test them together in a harness. If they behave correctly for all our test cases, then we look for another method that makes use of our two and link that in and test the three together.
In this way, we build up a relationship between methods and know that if there is an error it can be isolated to the last method we added to the chain. Eventually we link together all of our methods and can see that they all work together properly, and that our set of methods is now a single class - a unit.

Integration Testing Within A Class Hierarchy

The exact same technique is used to link classes together. Once they have been integration tested internally, we link them together and make sure that they interact properly. Here we make use of the encapsulation techniques we discussed before - we need to test every possible interaction between two classes or objects, so restricting access only to the methods we want to be made available allows us to make testing as painless and efficient as possible. This strategy makes use of the modularity of Object Orientation to ensure that we can incrementally test a complex application by slowly building up the relationships between individual objects and isolating errors as being caused by the last object we added into our testing strategy.

Coupling And Integration Testing

Remember when we talked about how good software developers aim for low coupling? Well, this is the reason... the more connections there are between units, the more difficult it is to ensure that the communication is working properly in all cases. Consider two coupling diagrams:
Which would be easier to test? If you answered 'Diagram B', then you'd be 100% correct. The level of coupling between objects (and methods) has a very noticeable impact on the difficulty of testing a program.
In the next chapter we're going to look at a solid example of all of these testing strategies, but let's just consider an abstract example of how the process is followed. Imagine the program represented by the following UML diagram:
How would we go about testing such a program? To begin with, we pick one of the base classes and apply unit testing. What constitutes a base class? Well, it's a class that has no dependence on any other class. In this program, a Library has a dependence on both the Book and Customer classes. The Customer has a dependence on the Book class. The Book class has no dependencies. We begin our testing with this class. Similarly, we begin with a method that has no dependencies on other methods. Luckily all we have in our Book class are accessor methods and one utility method (isDueBack). We apply black box testing to each of these methods... usually an accessor method has only one path of execution, but in unusual cases it may have more. Where appropriate, we also apply white box testing. The isDueBack method returns true if a book is due back and false if it isn't. It has a dependency on the getDate method, and so we must ensure that getDate() works before we test our isDueBack method. Before we can begin testing this method, we need to make sure the communication between the two methods is set up properly... this is integration testing. We test the communication between the getDate method and the isDueBack method to ensure that the right information is being received. Once we are sure of that, we apply both white and black box testing to our isDueBack method. Once all of the methods in the class have been unit tested, and once we've integration tested our getDate and isDueBack methods, we proclaim the class to be unit tested. Everything in the class works. Now that we know our Book class works, we can test our Customer class. Since it has a dependency on the Book class, we make sure the communication between the two classes is set up... our integration testing is now at the class level rather than the method level. Once we are sure the communication works, then we unit test each method in the class.
Where methods have a dependency on each other, we test the communication between them. Once we have both the Book and Customer classes tested, we can begin testing the Library class. Once again, we test each of the communications and then unit test each of the methods. Once we are sure that the classes are unit tested, we test the full integration of the three classes. First we test Book with Customer... we make sure that the two work properly together. Then we test Library with Customer, and then Library with Book. Once we're sure that each of these pairings works individually, we combine them all into the finished application. It's a long, complex procedure but it makes the task of finding malfunctioning code much simpler in the long run.
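The first steps of this process can be sketched for the Book class. Everything below is an assumption made for illustration: the fields, the constructor, and in particular the today parameter passed to isDueBack, which is added here only so that the test gives the same answer on whatever day it is run:

```java
import java.time.LocalDate;

final class BookSketch {
    private final String title;
    private final LocalDate dueDate;

    BookSketch(String title, LocalDate dueDate) {
        this.title = title;
        this.dueDate = dueDate;
    }

    // Accessor with no dependencies.
    String getTitle() {
        return title;
    }

    // Baseline accessor: it depends on no other method, so it is tested first.
    LocalDate getDate() {
        return dueDate;
    }

    // Depends on getDate(), so it is tested only after getDate() has passed.
    boolean isDueBack(LocalDate today) {
        return !today.isBefore(getDate());
    }

    public static void main(String[] args) {
        BookSketch book = new BookSketch("Example Title", LocalDate.of(2006, 1, 10));
        // Step 1: unit test the baseline method.
        System.out.println("getDate: " + book.getDate());
        // Step 2: test isDueBack together with getDate, clustered around the due date.
        System.out.println("day before: " + book.isDueBack(LocalDate.of(2006, 1, 9)));
        System.out.println("due date:   " + book.isDueBack(LocalDate.of(2006, 1, 10)));
        System.out.println("day after:  " + book.isDueBack(LocalDate.of(2006, 1, 11)));
    }
}
```

If isDueBack fails after getDate has already passed its own tests, the fault can be isolated to isDueBack or to the way it uses getDate.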
Okay, so we've discussed how to work out when something is wrong, and even where it goes wrong. However, it's a much more difficult thing to work out why. Debugging is as much an art as a science, but there are certain techniques we can make use of to locate problems and solve them. Some IDEs provide powerful support for debugging, such as step-over debugging, which allows you to execute individual lines of code and see the state of any variables at that point of the execution. While these tools are very powerful, and their value is considerable to experienced developers, at this stage of our learning curve as programmers they only encourage laziness of thought. Debugging requires a thoughtful contemplation of a program and how its various components interact. You're a detective, trying to find the criminal in your code that is causing the breakdown of your carefully ordered society. Arthur Conan Doyle, through his creation Sherlock Holmes, once said 'When you have eliminated the impossible, whatever remains, however improbable, must be the truth'. This is a valuable maxim to adhere to when debugging. Different programming languages raise different issues when finding errors - bugs that appear in one language may not appear in another. It's all to do with the way the language works. For example, in Java, the following code will work perfectly:
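A sketch of the kind of code meant here, using the example and doSomethingNeat method names that the discussion below refers to. The loop bounds, the return values and the class name are assumptions added for illustration:

```java
final class ScopeSketch {
    // Prints whee three times and returns how many times it printed.
    static int doSomethingNeat() {
        int printed = 0;
        for (int i = 0; i < 3; i++) {   // this i exists only inside doSomethingNeat
            System.out.println("whee");
            printed++;
        }
        return printed;
    }

    // Calls doSomethingNeat three times using its own, separate counter i.
    static int example() {
        int total = 0;
        for (int i = 0; i < 3; i++) {   // a different i, untouched by doSomethingNeat
            total += doSomethingNeat();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Printed whee " + example() + " times");
    }
}
```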
Run this application, and our screen will be covered in whee. Disgusting, yes... but still useful. However, in ASP, the equivalent code would cause the statement to be printed out once, and only once... this is because both for loops use a variable called i as a counter. Java enforces variable scope, so the variable i in doSomethingNeat is different from the one in the method example. In ASP, scope is not enforced in this way, and so the counter in doSomethingNeat would overwrite the counter in example. This is an obscure problem that may or may not occur to someone with experience in another language, but illustrates the point that you can't make assumptions - you must test all possibilities and assume anything can be causing your error. So that's a lot of stuff we need to consider if we're going to solve the problem. We need to take a structured approach to actually finding the offending lines of code. There are some handy guidelines we can use to help us:
Step one: Reproduce your errors

Being able to repeatedly reproduce an error is the most vital part of debugging. Unarguably the most frustrating of all errors is the one that you can't find out how to reproduce. If you can't reproduce the error, all of the debugging in the world isn't going to help you fix it because you simply cannot find out what is causing it. You can make 'best guesses' and add some general 'band-aid' solution that stops the error occurring, but at best that is an ugly hack. If your program causes an exception that you cannot reliably reproduce, you can fix it by adding a try and catch around the offending code, but you haven't fixed the error, just the symptom. Spend time working out what causes the error - an effective test strategy is vital for this, as it allows you to work through your previous inputs and see which elements of your test case are required to reproduce the problem.

Step two: Use debugging messages. A lot.

Without a formal debugging tool, we need to make use of debugging messages to see the state of variables as they go through the system. If we're running a GUI application, then we can output information using System.out.println to see the contents of variables to make sure that they have the values we expect. We can use them within conditionals to ensure they're being executed when we think they are (although white box testing will also show this). We can use them after we have applied some mathematical calculation to ensure that we have the right values:
Often, this is enough to solve the problem - you find the piece of the code that isn't behaving correctly, then you sit back and work out why. Consider the fragment above in a full method:
Running the program gives us the output 'y is 4.0' - but that's not right! It should be approximately 4.1667. How can that be? y is a double, so it should be able to hold a decimal number, but the calculation is wrong. A little reading up will show that the calculation we are doing returns an integer because x itself is an integer and so is 24, so we need to make one of them into a double:
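The effect can be sketched as follows, assuming x holds 100 and the calculation divides it by 24 (an assumption that is consistent with the outputs quoted above). The class and method names are illustrative:

```java
final class DivisionSketch {
    // Faulty version: int divided by int is integer division, so the
    // fraction is thrown away before the result is stored in the double.
    static double divideBuggy(int x) {
        double y = x / 24;
        System.out.println("y is " + y);   // debugging message
        return y;
    }

    // Corrected version: writing 24.0 makes one operand a double,
    // so the division is carried out in floating point.
    static double divideFixed(int x) {
        double y = x / 24.0;
        System.out.println("y is " + y);   // debugging message
        return y;
    }

    public static void main(String[] args) {
        divideBuggy(100);
        divideFixed(100);
    }
}
```

Casting would work equally well: (double) x / 24 also forces a floating point division.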
Compile and execute, and we get the right answer.

Step three: Simplify, Simplify

Much of the code in a particular method is window dressing, as far as solving an error goes. It's exception handling, and consistency checking, and other such miscellaneous house-keeping. It's all absolutely vital, but not when we're trying to locate a bug. In fact, all it does is complicate our search. Simplifying our method can make the search much easier - comment out any of the code that doesn't relate to the error so your search is restricted to only relevant code. Then, as you eliminate particular lines as being innocent, you can comment them out and replace them with place-holder values to further reduce complexity. This is particularly useful in the case of methods:
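As a sketch of the technique, suppose a hypothetical lookUpTaxRate method has already been eliminated as a suspect. Its call can be commented out and replaced with a place-holder value while the rest of the calculation is examined. All of the names and numbers below are invented for illustration:

```java
final class SimplifySketch {
    // Hypothetical helper that has already been shown to be innocent.
    static double lookUpTaxRate(String region) {
        return "UK".equals(region) ? 0.2 : 0.1;
    }

    static double priceWithTax(double price, String region) {
        // double rate = lookUpTaxRate(region);  // eliminated from the search, commented out
        double rate = 0.2;                        // place-holder value while debugging
        return price + price * rate;
    }

    public static void main(String[] args) {
        // With the place-holder in place, the region no longer affects the result.
        System.out.println(priceWithTax(100.0, "UK"));
    }
}
```

The commented-out call must be restored once the bug has been found.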
Step Four: Try things

Once you've got your code down to the statements that are important, try changing things - if you have variables being set, set them with different values to see what happens. If you have counter loops, try changing the condition. Make a careful note of what effect each change has, and analyse those changes with reference to the flow of logic in your program to see how each change is affecting the execution.

Step Five: Understand!

This is the important one - once you've found a bug, understand what is causing it. Don't give in to the temptation of simply fixing it or recasting the code so that it doesn't happen without learning from it. Partly this is simply good practice - you learn more from your mistakes than you ever do from your successes, but only if you take the effort to ensure that you understand why your errors occurred. However, it's also very important from a pragmatic point of view. If you have a loop that is being iterated over one more time than it should, and you can't see why, then you can fix it by changing the continuation part of the loop so that it simply iterates one less time:

Before
After
All fine and dandy, and you've fixed your immediate problem - but fixes like this have a tendency of fixing only a symptom and leaving a larger problem unaddressed. The value of someVariable wasn't what you were expecting - if you are making a reference to that variable elsewhere, or basing a calculation on it then you're going to have problems there as well.
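The danger can be sketched as follows. The someVariable name comes from the discussion above; the miscount method, which stands in for whatever code set someVariable incorrectly, and the averaging calculation are assumptions made for illustration:

```java
final class LoopFixSketch {
    // Hypothetical source of the real bug: it reports one item too many.
    static int miscount(int[] items) {
        return items.length + 1;
    }

    // Band-aid: the loop condition is shortened, so the loop now behaves,
    // but someVariable is still wrong when it is used in the division.
    static double averageWithBandAid(int[] items) {
        int someVariable = miscount(items);
        int total = 0;
        for (int i = 0; i < someVariable - 1; i++) {   // was: i < someVariable
            total += items[i];
        }
        return (double) total / someVariable;          // still uses the wrong value
    }

    // Root cause fixed: someVariable holds the right value everywhere it is used.
    static double averageWithRootCauseFixed(int[] items) {
        int someVariable = items.length;
        int total = 0;
        for (int i = 0; i < someVariable; i++) {
            total += items[i];
        }
        return (double) total / someVariable;
    }

    public static void main(String[] args) {
        int[] items = {2, 4, 6};
        System.out.println("band-aid:   " + averageWithBandAid(items));
        System.out.println("root cause: " + averageWithRootCauseFixed(items));
    }
}
```

For the items 2, 4 and 6 the band-aid version no longer runs off the end of the array, but it divides 12 by 4 rather than by 3 and so still reports the wrong average.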
It's not fun, it's not entertaining, and it's not very fulfilling for a programmer - but testing is absolutely vital and simply one of those things that has to be done. There's no getting around it. Applying formal techniques to testing can eliminate much of the bias that is inherent when programmers test their own code, and provide a degree of rigour to the process that ensures a level of confidence in the correctness of a particular application. We can never exhaustively test a program, but often we don't have to - for most applications, a good enough set of test cases gives us confidence that the program functions correctly for a representative sample of input. There are other techniques for building error free applications - these are particularly used in the case of mission critical or safety critical applications. Formal methods like Z work by using mathematical techniques to build application code, and in the process proving that it is free of errors. Such complexity is far beyond the scope of this book, but there are numerous books available for those who wish to delve further into the subject.
Exercise one

Devise an appropriate testing strategy for the following methods, and then mark them as a pass or a fail.
Further Reading

The following table details further reading on the topic in this chapter, and also any external resources that you may find useful.
© 2004-2006 Michael James Heron