![]() | Monkeys at Keyboards: Java-Fu © Michael James Heron | ||||
| Topic: Java Programming Level: 3 Version: beta | |||||
6 - Case Study 2 - A Polymorphic Genetic Algorithm | |||||
| Previous | Table of Contents | Next |
| Forum |
| Chapter Objectives |
By the end of this chapter, the reader will be able to: |
The ability to successfully apply polymorphic principles is one of the most valuable skills an object-oriented programmer can possess. Polymorphism drives adaptability and allows for complex functionality to be shared across even substantially different programming structures. It is also, fittingly, a very difficult skill to develop... it is really only by looking at many different examples of where polymorphism has been applied that we can begin to see the patterns that emerge. Polymorphism is used all the way through the Java libraries - there are no shortages of areas that we could investigate for examples:
But more than this, we must look at otherwise exotic examples and see how polymorphism can be applied to provide a solid structure for expandable development. The best routines are applicable to as many different circumstances as possible... in this chapter we're going to look at a fascinating tool of searching. The genetic algorithm is a powerful search technique often applied to very complex problems... the big benefit of a genetic algorithm (GA) is that we don't actually need to know what we're looking for... we just need to know when we've found it.
With a GA, we pose a problem... what that problem is doesn't matter. One common example is to maximise the results of a mathematical formula. We start off with a pool of solutions. Each of these solutions is a potential answer to the problem. To take a simplified example, if we had a GA that was to maximise the following formula:
Each solution would be a combination of values for A, B, C, and D. Usually we provide some limits to what these values are... for example, 'A can be any number between one and one hundred. B will be between -100 and -200. C is a real number between zero and one inclusive, and D is a number between zero and one thousand). A solution to this problem is a set of these four numbers... for example:
These values then get plugged into the formula:
The results are then computed, and the result of this is classified as the fitness of a solution. In this case, the solution's fitness is 272. Traditionally, a fitness will be a positive whole number... in cases where a particular solution calculates at a negative number, it will be given a fitness of 1... we'll see why later in this chapter. A genetic algorithm begins with an initial set of randomly generated solutions... usually anything between 100 and 1000 solutions make a suitable starting set. Note that these are randomly generated - we have absolutely no way of knowing in advance how good any of these solutions are. The GA is an iterative process - each step of the iteration is called a generation. The driving force behind a GA is to 'evolve' an answer to the problem. During each generation, the GA calculates how good each solution is, and then it picks a number of these to survive into the next generation - all of the others are discarded. It then 'breeds' solutions to try and capitalise on their strengths (and minimise their weaknesses). It may also apply a 'mutation' to a solution during the process... it will randomly change one value of the solution. Each solution to a GA is tested against a fitness function... in the example above, the function is our mathematical formula. During the evaluation phase, the fitness of each solution is calculated and stored. During reproduction, we make use of a roulette wheel system to select a number of solutions for survival into the next generation. Consider if we had five solutions with the following fitnesses:
If we want to reproduce five solutions into the new generation, we sum up the fitnesses of all the current solutions (750) and then generate a random number that tells us which solution is to survive... the fitness of a particular solution relates to the proportional chance it has to reproduce into the next generation.
Once we have a new set of solutions, we then apply the cross-over principle. We pick two solutions at random, and then 'breed' them. Let's imagine our solutions are represented by an array of four numbers... to breed these solutions, we pick a random crossover point, and then we swap their arrays:
As with reproduction, solutions are chosen for crossover according to a proportional weighting of their fitnesses. The mutation phase is the easiest of all - we have some global 'mutation rate' operator - this is the percentage chance that a solution will be mutated. We step over each of the solutions, effectively rolling the dice on whether to change one of their elements. If it turns out we do, we pick a random element in the array, and then generate a replacement for it:
Mutated:
And then, the process begins again. Let's look at a simplified run-through of this process with five solutions to our example problem.
First, we calculate all the fitnesses:
Next, we pick candidates to be born into the new generation according to a roulette wheel... let's say our random wheel throws up the following new generation:
Next, we pick solutions to crossover... we'll just look at one example where we crossover solutions 1 and 5:
We then look for solutions to mutate. Let's say we mutate solution 3, and that it changes element B to -189. What we're left with at the start of generation two is:
Calculate the fitnesses:
Solution one had a good value for A + B. Solution five had a good value for (C * D). Crossing them over means that solution one benefits from both of these (although solution five gains the worst of both worlds) which increases the overall fitness The mutation of solution three reduced its fitness, but that is not a problem - it's possible that even weak solutions have material that will benefit stronger solutions in a later generation... and if they do prove to be useless, the roulette wheel will filter them out eventually.
So, why spend all that time explaining Genetic Algorithms? Well, we're going to write one! And not only that, it's going to be an engine for a genetic algorithm that can handle any kind of properly formed solution. Polymorphism and interfaces are going to be a big help to us here. The engine is going to provide the following mechanisms, which are common to all standard genetic algorithms:
However, our engine won't know how to crossover or mutate a solution - after all, that's very much dependant on the kind of solution. The solution representations themselves will handle that internally. Also, we don't know how to calculate the fitness - that too depends on what kind of problem the GA is aimed at. What we're going to need is to provide a structure we can rely on - so let's write an interface that all GA solutions must implement:
Believe it or not, that's all we need to begin writing our GA Engine... provided we are working with a class that implements Solution, we know the format that any method we are going to require will take.
Our engine needs to do a number of things. It must setup a starting set of solutions... we'll assume that any class that implements Solution will setup the starting state in its constructor, so all we do is find a suitable representation for a set of solutions (an ArrayList) and create a suitable number of instances. We do need to know here what the name of the class we're going to work with is going to be... we don't need to know any of the functionality, just what it will be called. This is the only place where we need to know that... let's just put in a facility for a place-holder called SimpleSolution:
We then need to implement the framework for the genetic algorithm. Before we can implement any of the framework, we need to implement a roulette wheel - and for that, we need to have a method that totals up the fitnesses of all solutions. Our interface ensures that when we call getFitness on a given solution it will return an integer value - we can make use of this knowledge in our roulette wheel framework:
Now all we need to do to get a proportionally weighted solution is to call selectSolution, passing the whole list of the solutions as a parameter. This gives us what we need to implement the reproduction aspect of the engine:
Likewise for crossover:
Note here that we've not actually implemented these parts of the genetic algorithm - we've just provided the methods in the engine that call the interface methods at the appropriate time. We still need to implement mutation and the generational iteration... we do these in the constructor:
We should also provide some general useful output, such as the average fitness of a generation and what the best solution in the whole run is - these are included in the full version of this program that accompanies the book.
With this, we have the framework for a polymorphic genetic algorithm engine... now we need to provide an actual concrete example of a Solution... we'll use the example we talked about above when we discussed genetic algorithms in general. We've got a placeholder called SimpleSolution, so let's call it that - it's only a proof of concept after all:
We have different values to occupy each element of the list of numbers, so we need a mechanism for populating those - this is our getElementValue method. We'll use a simple array of double precision primitives to store the values of our solution:
With these values, we can implement the getFitness method. This is where the implementation of our fitness function goes - in this case, it's pretty easy - we add elements 0 and 1 together, and then add the sum of the product of elements 2 and 3. We then set a lower limit of 1 on the result and return it as an integer:
Next, we need to implement mutation. All we do for this is generate a random number, which is the index we wish to change... then we call the getElementValue utility method:
All that's left to do now is to implement the crossover method. Here we're going to provide a couple of simple utility methods that let us manipulate the value of all the values contained within the solution:
Our crossover method chooses a random crossover point, and then proceeds to swap the elements of the appropriate arrays. Here, we need to have a firm knowledge of what kind of solution we're going to be making use of, because our utility methods are not prescribed in the interface. Luckily, although the object to crossover is passed into the method as a Solution, we can cast it back to a SimpleSolution within the body of the method... after all, if someone tries to cross an instance of one Solution with an instance of another Solution then there's something wrong - in exactly the same way you can't take the square root of an apple, Solutions must work in their proper context.
And that's it... our first GA Solution element has been built through the use of polymorphism. Neato... now when we run this, it will (eventually) generate a solution that is a good (although perhaps not the best) solution to the problem as specified in the getFitness function.
The big benefit here is that our engine has no knowledge at all of what the problem is, or how the mechanisms of mutation or reproduction are implemented... this means that provided we have a class that implements our Solution interface, our GA can deal with absolutely any problem. For example, consider a GA which worked to maximise the number of vowels in a string of twenty letters:
We can change the whole result of our GA by changing the GeneticEngine by one single line:
Really, if we want to make sure that the engine is entirely polymorphic we should remove the initial population creation into a separate container application. In doing so, we change it so that it is triggered via a start method rather than it just happening when it is created. This is a good idea, because if we want to use this as a component, we really shouldn't require anyone to have to change the code inside our engine... we package it up and have it work through an entirely black box system. We simple require the developer making use of our component to provide the engine with an ArrayList of objects that implement the Solution interface... we give them a setSolutions method so that they can do just that. When they've provided their starting solutions, they call the start method to have the GA churn out the results:
Now when someone wants a GA, they create an instance of our engine and then create their own starting set of populations... in this way, the engine itself never has to be changed, only the application making use of it. Our engine has become black box - nobody needs to know how it works internally.
And now, we can package up our GA engine (and the accompanying Solution class) into a single package, and then anyone in the world can make use of the inherent polymorphism to drive their own searches using the tool. Really, we should spend some time working on the link between the engine and any container applications - at the moment, all output is to the console, and really the developer using the tool should be able to decide where it goes. That's left as an exercise for the interested reader.
This has been a pretty complex example, because the idea of a genetic algorithm is complicated and we've had to skirt over some pretty important stuff to use it as a case study. The mechanics behind how a GA works are pretty interesting, and I'd recommend that anyone who is intrigued by the idea should spend some time reading up on the subject - they are truly remarkable algorithms that work on simple principles. This particular implementation is driven entirely by polymorphism - our engine only ever knows any given implementation of the Solution interface as the generic Solution - the interface provides the framework for the methods our engine needs. Without know how a fitness is generated, our engine can just rely on the implementer of the class to work that part out and just concentrate entirely on how all the solutions fit together within a specific process. That's pretty neat! This idea of polymorphism is vital to a solid understanding of object orientation - it underpins almost everything else that OO provides. It's also a very powerful tool for achieving very definite aims... although the idea is quite abstract, it translates into a technique that allows for real generic functionality in even complex programs. Further ReadingThe following table details further reading on the topic in this chapter, and also any external resources that you may find useful.
|
| Previous | Table of Contents | Next |
© 2004-2006 Michael James Heron