![]() | Monkeys at Keyboards: The Javanomicon © Michael James Heron | ||||
| Topic: Java Programming Level: 2 Version: delta | |||||
31 - Efficiency and Optimisation | |||||
| Previous | Table of Contents | Next |
| Forum |
| Chapter Objectives |
By the end of this chapter, the reader will be able to:
|
There is a famous rule in computing - the 90/10 rule. It is applied to a number of different subjects, but in the context of this chapter it can be interpreted as '90% of the processing time for an application is spent in 10% of the code'. The exact numbers of the rule change from person to person. Some quote 80/20, some quote 70/30... but the general gist of the rule is the same regardless of the actual numbers. In terms of the 90/10 rule, the converse is perhaps more enlightening - 10% of the processing time for an application is spent in 90% of the code. This means there is a very small section of your application which is a genuine candidate for significant optimisation - even if you double the efficiency of 90% of your code, you will only achieve an overall improvement of about a third. Fine-tuning the processing-intensive code is the key to significant performance gains. In this chapter we'll look at efficiency in general, and some techniques you can apply to optimise the code you write. However, before you leap into your existing programs and haul them to pieces in the name of efficiency, it is a good idea to recall Michael Jackson's (developer of the Jackson System Diagram method, rather than the singer) golden rules of optimisation:
And Donald Knuth has this to say:
With those dire warnings in mind, let's get on with the meat of the chapter!
To start with, we should really define what we mean by optimisation. It is misleading to think of optimisation purely in terms of how fast your code runs. There are three main categories of optimisation, and achieving significant gains in any one of them involves making sacrifices in the others:
These break down into the following categories:
At best, any given piece of code can optimise two of these at the detriment of the third. For example, an application that is both small and fast is likely to be difficult to maintain. An application that is very fast and easy to maintain is likely to be very big, and an application that is small and maintainable is unlikely to be fast. The best balance of these competing demands will vary from application to application. A 3D engine is likely to require everything it can squeeze from the CPU in terms of speed - size and maintainability are going to be secondary concerns (if indeed they are concerns at all). A banking system must be maintainable to ensure that changes in regulations can easily be implemented. If this is a batch system that runs overnight on a powerful mainframe, then the size and speed are not going to be important considerations. The triangle above shows this relationship - every program will fall somewhere in its area, but the further it moves towards any given point, the further it moves away from the others. Deciding exactly what is important to you in your program is the first step you have to take when optimising your code.
The 90/10 rule is vitally important when optimising for speed - you don't want to spend hours optimising a method that gets called once every ten minutes when you have another method that is being called sixty times a second. Your time is much better spent optimising the latter method than the former. There are a number of tools (called profilers) that can help you work out which parts of your application are doing most of the work - most of the profilers that are genuinely effective are expensive or hard to use (or indeed, both), and so we won't talk about them here. Instead, we'll look at what we can work out by the sweat of our own brow. The JDK for Java provides some simple profiling tools - you'll need to get into the command line and use the following statement to get the profile information:
The profile information provided is somewhat difficult to read as it is written, and so a third party profile viewer tool is useful. We can however work out for ourselves where our code is spending most of its processing cycles with a few System.out.println calls - we can provide ourselves with a full breakdown of what methods are being called and where. Before you start meddling about with code to try and get it to work faster, it is a good idea to first get a feel for just how efficient your code is. Way back in chapter three we looked at a testing harness for computing the efficiency of code. While this was fine as an introduction, it leaves much to be desired as a general tool since the performance wasn't constant. If we want a representative sample of efficiency, we must work with a larger number of performance runs. The process of determining the speed of a piece of code is known as benchmarking. To obtain a useful benchmark, we must run the code we are interested in a number of times and then average out the results. For example, let's look at declaring and initialising a string variable:
We'll do this 1,000,000 times, which should give an indication of how many milliseconds this particular line of code takes:
The output for this is:
Interesting! How about if we change this a little:
This gives us a rough indication of performance, but it's not ideal yet... for one thing, the garbage collection routine can run during the process which will skew the results. We can force the garbage collection routine to run before we begin the timing to minimise the chance of this:
The problem with any technique like this is that the results are indicative for one system only - different compilers, different versions of the virtual machine, different processors and different operating systems will all give different results. However, the results it gives you for a particular piece of code is usually enough for you to have an appreciation as to whether or not your changes are improving the speed of your program. There are certain techniques that are always useful when attempting to improve the efficiency of your code. Remember that using these may have a detrimental effect on the size and maintainability of your programs. Strength ReductionStrength reduction occurs when an operator is replaced with one that performs the same task, but executes faster. We have never talked about the shift operator in this book, but it is used to shift binary digits up or down a number of places. For example, if we have the integer value 22, this can be represented by the binary code 10110:
We can use the shift operator to multiply and divide values by powers of two. For example, to double an integer, we shift the digits one position to the left:
Now the value of i is 32 + 8 + 4 = 44. Let's hook this up into our magic Benchmark-o-meter. We'll check the performance of finding the power of a number (2) to various values... we'll use a value of 100000000 for the number of iterations (since it's such a trivial calculation):
That's not much of an improvement - but it's there! Let's see for some larger powers:
As you can see, the performance for the multiplication operation degrades, whereas the shift performance remains constant... even when shifting 100 places. For a shift of one position, the performance gain isn't stunning - but the more we have to do, and the larger powers we need to compute, the better the gain. You might think to yourself 'Well, I would never use that multiplication code anyway - I'd just use Math.pow'. Obviously there would be nothing to stop you doing this, but the performance of Math.pow is much slower than even the multiplication. Putting the equivalent Math.pow statement into the benchmarking system (computing two to the power of four) takes a stunning 68 seconds (seconds, not milliseconds). Making use of this techniques requires an understanding of the underlying architecture, and the strength reduction techniques that work for one platform may not work for another. Common Sub-Expression EliminationThis is good practise regardless of whether we are attempting to optimise any given piece of code. This technique relates to ensuring that calculations that are reused in several places are calculated only once. For example, if we are computing the speed of an object from its distance and time, and then using that answer in two calculations:
We can eliminate common sub expressions by computing the speed first and storing it in a variable prior to calculation:
Once again, let's plug these two approaches into the benchmarking system to see the effect at 100000000 iterations:
Ensuring a minimum of redundancy is something you should be striving towards for reasons beyond simple optimisation - for one thing, it makes code easier to maintain and expand as well as faster to execute - one of the few occasions where a speed increase directly leads to improved maintainability! Code MotionCode motion is just a fancy of way of saying that we move code that is invariant to a location where it is not executed repeatedly. Instead, we place the code only where it is likely to change. For example:
Rather, we should write this as follows:
Again, this is good practise that you should aim for even if you are not particularly optimising for speed in a particular application. Reuse ObjectsOne of the most costly operations in Java is to create an object - it is usually much quicker to reset the state of an existing object and then reuse it. For example, rather than creating a new ArrayList when you're done with an existing one, use the clear method to empty it and then use this same object for a new purpose rather than create a new instance of the class. Be careful with this however because if you have a reference to the ArrayList somewhere else, the fact that it is a reference data type may cause inconsistency in the rest of your code. Consider for example two ways of setting up a pair of ArrayLists:
Compared To:
Our efficiency calculations are:
Cache Common OperationsAt the cost of a little extra memory, you can make a significant improvement in efficiency by making use of a cache to store the results of common calculations. Any time you go to execute the code that belongs to the calculation, you first check your cache to see the answer to the calculation has already been stored. If it has been, then you return the answer from the cache rather than execute the calculation. For example, our Fibonacci code in our chapter on recursion could greatly benefit from caching. Once we work out the value of a particular Fibonacci number, we then cache it in an appropriate data structure (such as an array) and then rather than compute it recursively we just return the value from the cache when it is queried:
Note that this doesn't make recursion any better a solution for this problem (it will still fall over and die from lack of memory when computing very large terms), but it dramatically improves the performance to the point where it is not longer laughable and is instead actually perfectly reasonable. Storing the results of common, costly operations is a superb way of improving the performance of your applications, but it is largely dependant on the results being deterministic - if the answers are going to change or be based on some other external factor, caching may not be appropriate. Know Your LanguageThere are certain operations that are very costly in one language that may not be nearly as costly in others. For example, in Java, the very common String operations we have been performing all the way through the book are very processor intensive due to the fact that strings are immutable. When we make use of the + operator on a string, what is actually happening is that a new string object is being created that contains that concatenated contents of the two strings we are adding together. This is very costly, and code that involves lots of such appending is going to be very costly to execute. Understanding the underlying language is the key to solving these kinds of problems. For example, Java provides a StringBuffer class that can be used to eliminate much of the costly excesses of the standard String class. Likewise, exceptions in Java are very costly, as is putting a synchronized lock around a method. If speed is of the essence, it is better to avoid exceptions in favour of inline error checking. There isn't much you can do about the synchronized problem except to ensure that your objects make use of it as little as possible by redesigning methods around predefined synchronized interfaces. Certain languages have benefits that make them suitable for particular kinds of optimisation - in some situations, it may simply be that the best solution is to write a piece of code in another language entirely to make use of its unique benefits. Use The Right Tool For The JobThis suggestion requires a considerable understanding about the efficiency of general algorithms and data structures. Each has its advantages and disadvantages, and even if the worst case scenario for a particular algorithm means it has an overall low performance, in the best case it may indeed be faster than a supposedly more efficient algorithm. Make use of mathematics to ensure that costly operations are ignored in favour of more efficient strategies. For example, computing the square root of a number is usually a very costly operation, since they have to be determined through the use of successive approximations:
Multiplying out the square root (rather than calculating it) can save many processor cycles, because computing the square of a number (rather than a root) requires much less processing:
The techniques for optimising for maintainability are similar to general practises of good programming, particularly in a language like Java. The whole field of object orientation is largely based on the ideas of reusability and maintenance, but there are still some pieces of general advice that are useful regardless. Coding StandardsPerhaps the most important tip is to make use of a coding standard and stick to it... it doesn't really matter what the standard is, although obviously some are better than others. Very strict standards such as Hungarian Notation are useful for certain kinds of working environments, but have a very large human-effort overhead in terms of keeping the information up to data. Hungarian notation is based on providing all variable names with a prefix that indicates what kind of data they contain - a hugely complex chain of prefixes can be built up that provides an experienced user with a specification of the data at a glance. Except that it doesn't really, because the overhead in changing a type of data and all of the variable references that go with it mean that as time passes less developers go to the effort of changing the prefixes, with the result that the notation become more and more misleading, and thus actually hinders maintainability. The important thing about any standard is that you are consistent with it - mixing and matching standards is guaranteed to cause problems later on down the line. Write Code For a Human, not a CompilerEssentially, every clever trick you make use of to take advantage of the idiosyncrasies of a particular language will slash the number of people who can actually understand what your code is supposed to do. Your code should make sense to someone who comes along and reads it later, rather than something that is clever but largely obscure to the majority of developers. For example, consider a developer who decided to limit the number of boolean variables in a program by using the individual binary bits of an integer variable as boolean flags... so if the bit is 1 it is true and if it's 0 it's false:
Yikes! While this indeed reduces the memory overhead of having a pile of different boolean variables, it's really not something that most people should be expected to have to understand in order to fix a broken program or add extra functionality. Simplify, SimplifyAlways look for the simplest solution to a particular coding problem, especially avoiding lengthy and complex conditionals. In order to know what you can do, you need to have some understanding of mathematics - it's quite surprising what can be accomplished with simple mathematical techniques that would otherwise require complex code. For example, let's consider a piece of code that keeps track of a user's file permissions, which can be any or all of the following: read, write, execute, delete. One way to deal with this is to use what is known as a Golomb Ruler, which is a series of integers in which all of the possible combinations of unique terms sum up to different numbers. The binary system itself is an example of this... any number you can create with binary numbers can only be created by one particular sum of the terms. If we interpret read access to the value 1, write access to be the value 2, execute to be the value 4 and delete to be the value 8, then we can add together all of a users permissions and then use a simple switch to determine what permissions a user has:
And so on. This is much clearer and easier to modify than making use of complex conditionals:
Document Your AssumptionsOne of the big problems of maintenance comes when you make an assumption about the way a particular program works and never bother to document it. You may assume that a particular method can rely on the state of a class wide variable to be a particular value, but if the code later changes it means that it can be immensely difficult for other developers to track down the offending code. Make sure you document any assumptions you make in the comments for your program. Indicate why the assumption was made, and what will need to be changed it the assumptions later turn out to be incorrect because of shifting requirements.
This usually isn't an issue, but it can be if you are developing applets for internet deployment. As with HTML files, the smaller the class files are, the better. There are some general techniques that can make your code smaller and thus quicker to download. Use Smaller NamesIt sounds ludicrously simple, but it genuinely does reduce the space taken up by a class file. The names of each entity are stored in full in the class file, and so choosing shorter names will have a noticeable effect on size. Of course, this has a hugely detrimental effect on maintainability and readability:
Versus:
The latter will occupy less space in the class file, but at a huge cost in terms of readability. Use Standard ClassesWhere possible, don't write your own classes - instead, extend and specialise existing Java classes. This greatly reduces the amount of original code you have to provide, and correspondingly it reduces the amount of code people need to download. Object orientation as a general strategy comes to the rescue here also - if you have a properly designed class model, then you are already greatly reducing the amount of original code and thus making your code smaller and quicker to download. Use ArchivesWe'll talk about this in the next chapter, but Java provides a compression format that allows for multiple class files to be downloaded in a compressed format. This greatly improves the download speed of an applet, or even a normal Java application. Remove Debugging InformationIt's very easy to think 'Ah, this debugging information is quite useful - I'll just leave it in for future use. Alas, this increases the size of your program without adding anything to the functionality. If size is an issue, you have to remove it before deployment.
As a rule, it is a very bad idea to code with optimisation in mind unless you have a very firm idea of where the emphasis should lie and where any potential trouble areas are likely to appear. It is a much better idea to develop a piece of software according to how you feel it should be developed and then tweak for optimisation when it is completed and working and a genuine need for a specific kind of optimisation has been recognised. Algorithms can always be tightened up, new classes can always be implemented to reduce replication, and variable and method names can always be shortened once a particular requirement has been identified. If you don't notice a particular need for optimisation, then don't do it - maintaining a balance between the three points of the Golden Triangle is a desirable outcome of software development and one that often outweighs any push towards one particular aspect of optimisation for the sake of it. As a rule, it is a very bad idea to code with optimisation in mind unless you have a very firm idea of where the emphasis should lie and where any potential trouble areas are likely to appear. It is a much better idea to develop a piece of software according to how you feel it should be developed and then tweak for optimisation when it is completed and working and a genuine need for a specific kind of optimisation has been recognised. Algorithms can always be tightened up, new classes can always be implemented to reduce replication, and variable and method names can always be shortened once a particular requirement has been identified. If you don't notice a particular need for optimisation, then don't do it - maintaining a balance between the three points of the Golden Triangle is a desirable outcome of software development and one that often outweighs any push towards one particular aspect of optimisation for the sake of it.
It's practically impossible to provide a totally accurate measure of performance. For one thing, processor speeds vary. Memory has an impact on performance, as does file IO speed and a multitude of other factors. The best we can do is provide a measure of relative performance, and even then numerical values don't give a full picture. One of the standard notations for indicating the speed of a particular piece of code is called big O notation, and it works by analysing the number of operations that an algorithm must perform on a given size of a problem set. You'll see it quite a lot when people are discussing the relative merits of, for example, bubble-sorts versus quick-sorts. There are three ways of expressing performance - best case, worse case, and average case. Big O notation will always give the worst case performance of an algorithm. In effect, the notation says 'This is the worst performance you can expect from this particular piece of code'. The O in Big O notation stands for 'Order', which is the time efficiency. We usually express the order as a function of n, where n is the number of operations that must be performed. Big O doesn't offer much in the way of granularity, but it does give a useful way of deciding which algorithm is the best to choose under particular circumstances. Usually Big O is broken down into a number of 'families' of performance. The table below lists them in order of fastest to slowest. They are not accurate measures - merely rough metrics:
Optimisation is a thorny subject, one filled with twisty paths and pitfalls. The standard advice to novice developers is simply 'don't bother with it'. Modern computers are so fast with so much memory that optimisation for anything other than very complex or very processor intensive programs usually doesn't demonstrate enough in the way of 'real world benefits' to make it worth your while. The rule for experts is to wait until a requirement for optimisation presents itself, and then and only then modify the code to meet the reality of the situation. Any significant push towards size, speed or maintainability will of a necessity result in a significant loss of the other aspects of good program design. While some techniques are complementary, in most cases it is impossible to have your cake and eat it too. The exact techniques for optimising any given program are application dependant, and may or may not involve any or all (or none) of the above tips. As in most things in programming, practise will make optimisation a simpler task. Further ReadingThe following table details further reading on the topic in this chapter, and also any external resources that you may find useful.
|
| Previous | Table of Contents | Next |
© 2004-2006 Michael James Heron