Monkeys at Keyboards: Java-Fu
© Michael James Heron
Topic: Java Programming
Level: 3
Version: beta

The Army has carried the American ideal to its logical conclusion. Not only do they prohibit discrimination on the grounds of race, creed and color, but also on ability.
Tom Lehrer

6 - Case Study 2 - A Polymorphic Genetic Algorithm

PreviousTable of ContentsNext
Forum


Chapter Objectives

By the end of this chapter, the reader will be able to:



    6.1

    Introduction

    The ability to successfully apply polymorphic principles is one of the most valuable skills an object-oriented programmer can possess. Polymorphism drives adaptability and allows for complex functionality to be shared across even substantially different programming structures. It is also, fittingly, a very difficult skill to develop... it is really only by looking at many different examples of where polymorphism has been applied that we can begin to see the patterns that emerge.

    Polymorphism is used all the way through the Java libraries - there are no shortages of areas that we could investigate for examples:

    • The way an ArrayList stores its data.
    • The way Swing interfaces work
    • The way collections and arrays can be generically sorted and searched.

    But more than this, we must look at otherwise exotic examples and see how polymorphism can be applied to provide a solid structure for expandable development. The best routines are applicable to as many different circumstances as possible... in this chapter we're going to look at a fascinating tool of searching. The genetic algorithm is a powerful search technique often applied to very complex problems... the big benefit of a genetic algorithm (GA) is that we don't actually need to know what we're looking for... we just need to know when we've found it.

    6.2

    A Brief Introduction to Genetic Algorithms

    With a GA, we pose a problem... what that problem is doesn't matter. One common example is to maximise the results of a mathematical formula. We start off with a pool of solutions. Each of these solutions is a potential answer to the problem. To take a simplified example, if we had a GA that was to maximise the following formula:

    A + B + (C * D)

    Each solution would be a combination of values for A, B, C, and D. Usually we provide some limits to what these values are... for example, 'A can be any number between one and one hundred. B will be between -100 and -200. C is a real number between zero and one inclusive, and D is a number between zero and one thousand).

    A solution to this problem is a set of these four numbers... for example:

    A = 20, B = -122, C = 0.50, D = 340

    These values then get plugged into the formula:

    20 + -122 + (0.50 * 340)

    The results are then computed, and the result of this is classified as the fitness of a solution. In this case, the solution's fitness is 272. Traditionally, a fitness will be a positive whole number... in cases where a particular solution calculates at a negative number, it will be given a fitness of 1... we'll see why later in this chapter.

    A genetic algorithm begins with an initial set of randomly generated solutions... usually anything between 100 and 1000 solutions make a suitable starting set. Note that these are randomly generated - we have absolutely no way of knowing in advance how good any of these solutions are.

    The GA is an iterative process - each step of the iteration is called a generation. The driving force behind a GA is to 'evolve' an answer to the problem. During each generation, the GA calculates how good each solution is, and then it picks a number of these to survive into the next generation - all of the others are discarded. It then 'breeds' solutions to try and capitalise on their strengths (and minimise their weaknesses). It may also apply a 'mutation' to a solution during the process... it will randomly change one value of the solution.

    Each solution to a GA is tested against a fitness function... in the example above, the function is our mathematical formula. During the evaluation phase, the fitness of each solution is calculated and stored.

    During reproduction, we make use of a roulette wheel system to select a number of solutions for survival into the next generation. Consider if we had five solutions with the following fitnesses:

    Starting Population
    Fig 6.1: Starting Population

    If we want to reproduce five solutions into the new generation, we sum up the fitnesses of all the current solutions (750) and then generate a random number that tells us which solution is to survive... the fitness of a particular solution relates to the proportional chance it has to reproduce into the next generation.

    Fitness Graph
    Fig 6.2: Fitness Graph

    Once we have a new set of solutions, we then apply the cross-over principle. We pick two solutions at random, and then 'breed' them. Let's imagine our solutions are represented by an array of four numbers... to breed these solutions, we pick a random crossover point, and then we swap their arrays:

    Step 1
    Fig 6.3: Step 1
    Step 2
    Fig 6.4: Step 2
    Step 3
    Fig 6.5: Step 3

    As with reproduction, solutions are chosen for crossover according to a proportional weighting of their fitnesses.

    The mutation phase is the easiest of all - we have some global 'mutation rate' operator - this is the percentage chance that a solution will be mutated. We step over each of the solutions, effectively rolling the dice on whether to change one of their elements. If it turns out we do, we pick a random element in the array, and then generate a replacement for it:

    Original Solution
    Fig 6.6: Original Solution

    Mutated:

    Mutated Solution
    Fig 6.7: Mutated Solution

    And then, the process begins again.

    Let's look at a simplified run-through of this process with five solutions to our example problem.

    Five Solutions
    Fig 6.8: Five Solutions

    First, we calculate all the fitnesses:

    Solution Fitness
    Fig 6.9: Solution Fitness

    Next, we pick candidates to be born into the new generation according to a roulette wheel... let's say our random wheel throws up the following new generation:

    New Generation
    Fig 6.10: New Generation

    Next, we pick solutions to crossover... we'll just look at one example where we crossover solutions 1 and 5:

    Before
    Fig 6.11: Before
    After
    Fig 6.12: After

    We then look for solutions to mutate. Let's say we mutate solution 3, and that it changes element B to -189. What we're left with at the start of generation two is:

    Next Generation
    Fig 6.13: Next Generation

    Calculate the fitnesses:

    New Generation Fitness
    Fig 6.14: New Generation Fitness

    Solution one had a good value for A + B. Solution five had a good value for (C * D). Crossing them over means that solution one benefits from both of these (although solution five gains the worst of both worlds) which increases the overall fitness The mutation of solution three reduced its fitness, but that is not a problem - it's possible that even weak solutions have material that will benefit stronger solutions in a later generation... and if they do prove to be useless, the roulette wheel will filter them out eventually.

    6.3

    The Relevance To Us

    So, why spend all that time explaining Genetic Algorithms? Well, we're going to write one! And not only that, it's going to be an engine for a genetic algorithm that can handle any kind of properly formed solution. Polymorphism and interfaces are going to be a big help to us here.

    The engine is going to provide the following mechanisms, which are common to all standard genetic algorithms:

    • Selection of solutions to mutate
    • Selection of solutions to crossover
    • Selection of solutions to reproduce
    • Evaluation
    • Roulette wheel selection

    However, our engine won't know how to crossover or mutate a solution - after all, that's very much dependant on the kind of solution. The solution representations themselves will handle that internally. Also, we don't know how to calculate the fitness - that too depends on what kind of problem the GA is aimed at.

    What we're going to need is to provide a structure we can rely on - so let's write an interface that all GA solutions must implement:


    public interface Solution {
    // Returns an integer representing a solution's fitness.
    int getFitness();
    // Implements the crossover of two solutions.
    Solution crossover (Solution tmp);
    // Implements the mutation of a solution
    void mutate();
    }

    Believe it or not, that's all we need to begin writing our GA Engine... provided we are working with a class that implements Solution, we know the format that any method we are going to require will take.

    6.4

    The Genetic Engine

    Our engine needs to do a number of things. It must setup a starting set of solutions... we'll assume that any class that implements Solution will setup the starting state in its constructor, so all we do is find a suitable representation for a set of solutions (an ArrayList) and create a suitable number of instances. We do need to know here what the name of the class we're going to work with is going to be... we don't need to know any of the functionality, just what it will be called. This is the only place where we need to know that... let's just put in a facility for a place-holder called SimpleSolution:

    import java.util.*; 

    public class GeneticEngine {
    public static final int POPULATION_SIZE = 250;
    public static ArrayList<Solution> solutions = new ArrayList<Solution>();

    public static void main (String args[]) {
    GeneticEngine main = new GeneticEngine();
    }


    public GeneticEngine() {
    Solution tmp;
    for (int i = 0; i < POPULATION_SIZE; i++) {
    tmp = new SimpleSolution();
    solutions.add (tmp);
    }

    }

    We then need to implement the framework for the genetic algorithm. Before we can implement any of the framework, we need to implement a roulette wheel - and for that, we need to have a method that totals up the fitnesses of all solutions. Our interface ensures that when we call getFitness on a given solution it will return an integer value - we can make use of this knowledge in our roulette wheel framework:

    int getGenerationTotal (ArrayList gen) { 
    int total = 0;
    Solution tmp;
    for (int i = 0; i < gen.size(); i++) {
    tmp = gen.get (i);
    total += tmp.getFitness();
    }

    return total;
    }

    Solution selectSolution (ArrayList<Solution> gen) { 
    int generationTotal = getGenerationTotal (gen);
    int roulette = (int) (Math.random() * generationTotal);
    int current = 0;
    int element = 0;
    Solution tmp = null;
    while (current <= roulette) {
    tmp = gen.get (element);
    current += tmp.getFitness();
    element += 1;
    }

    return tmp;
    }

    Now all we need to do to get a proportionally weighted solution is to call selectSolution, passing the whole list of the solutions as a parameter. This gives us what we need to implement the reproduction aspect of the engine:

    ArrayList reproduction (ArrayList<Solution> gen) { 
    ArrayList<Solution> newGeneration = new ArrayList<Solution>();
    for (int i = 0; i < solutions.size(); i++) {
    newGeneration.add (selectSolution (gen));
    }

    return newGeneration;
    }

    Likewise for crossover:

    void crossover (ArrayList gen) { 
    Solution i = selectSolution (gen);
    Solution j;
    j = selectSolution (gen);
    j = i.crossover (j);
    }

    Note here that we've not actually implemented these parts of the genetic algorithm - we've just provided the methods in the engine that call the interface methods at the appropriate time. We still need to implement mutation and the generational iteration... we do these in the constructor:


    public class GeneticEngine {
    public static final int POPULATION_SIZE = 250;
    public static ArrayList<Solution> solutions = new ArrayList<SOlution>();
    public static int generations = 0;
    public static final double CROSSOVER_RATE = 70;
    public static final double MUTATION_RATE = 0.05;
    public static final int END_GENERATION = 1000;

    public GeneticEngine() {
    Solution tmp;
    for (int i = 0; i < POPULATION_SIZE; i++) {
    tmp = new SimpleSolution();
    solutions.add (tmp);
    }

    for (int i = 0; i < END_GENERATION; i++) {
    System.out.println ("Generation: " + i);
    for (int j = 0; j < solutions.size(); j++) {
    solutions = reproduction (solutions);
    if (Math.random() < CROSSOVER_RATE) {
    crossover (solutions);
    }

    if (Math.random() < MUTATION_RATE) {
    tmp = solutions.get (j);
    tmp.mutate();
    }

    }

    }

    }

    We should also provide some general useful output, such as the average fitness of a generation and what the best solution in the whole run is - these are included in the full version of this program that accompanies the book.

    6.5

    The SimpleSolution Proof of Concept

    With this, we have the framework for a polymorphic genetic algorithm engine... now we need to provide an actual concrete example of a Solution... we'll use the example we talked about above when we discussed genetic algorithms in general. We've got a placeholder called SimpleSolution, so let's call it that - it's only a proof of concept after all:


    public class SimpleSolution implements Solution {
    }


    public int getRandomNumber (int max) {
    return (int) (Math.random() * max);
    }


    public double getElementValue (int element) {
    switch (element) {
    case 0:
    return (double) getRandomNumber (100);
    case 1:
    return (double) (-1 * (100 + getRandomNumber (100)));
    case 2:
    return Math.random();
    case 3:
    return (double) (100 + getRandomNumber (900));
    }

    return 0.0;
    }

    We have different values to occupy each element of the list of numbers, so we need a mechanism for populating those - this is our getElementValue method. We'll use a simple array of double precision primitives to store the values of our solution:


    public class SimpleSolution implements Solution {
    private double numbers[];

    public SimpleSolution() {
    numbers = new double[4];
    for (int i = 0; i < numbers.length; i++) {
    numbers[i] = getElementValue (i);
    }

    }

    }

    With these values, we can implement the getFitness method. This is where the implementation of our fitness function goes - in this case, it's pretty easy - we add elements 0 and 1 together, and then add the sum of the product of elements 2 and 3. We then set a lower limit of 1 on the result and return it as an integer:


    public int getFitness() {
    double total = 0;
    int itotal;
    total = numbers[0] + numbers[1] + (numbers[2] * numbers[3]);
    itotal = (int) total;
    if (itotal < 1) {
    return 1;
    }

    else {
    return itotal;
    }

    }

    Next, we need to implement mutation. All we do for this is generate a random number, which is the index we wish to change... then we call the getElementValue utility method:


    public void mutate() {
    int num = getRandomNumber (numbers.length);
    numbers[num] = getElementValue (num);
    }

    All that's left to do now is to implement the crossover method. Here we're going to provide a couple of simple utility methods that let us manipulate the value of all the values contained within the solution:


    public double getElement (int index) {
    return numbers[index];
    }


    public void setElement (int index, double val) {
    numbers[index] = val;
    }

    Our crossover method chooses a random crossover point, and then proceeds to swap the elements of the appropriate arrays. Here, we need to have a firm knowledge of what kind of solution we're going to be making use of, because our utility methods are not prescribed in the interface. Luckily, although the object to crossover is passed into the method as a Solution, we can cast it back to a SimpleSolution within the body of the method... after all, if someone tries to cross an instance of one Solution with an instance of another Solution then there's something wrong - in exactly the same way you can't take the square root of an apple, Solutions must work in their proper context.


    public Solution crossover (Solution tmps) {
    int crossover = getRandomNumber (numbers.length);
    double tmp;
    SimpleSolution tmpsol = (SimpleSolution) tmps;
    for (int i = crossover; i < numbers.length; i++) {
    tmp = numbers[i];
    numbers[i] = tmpsol.getElement (i);
    tmpsol.setElement (i, tmp);
    }

    return tmpsol;
    }

    And that's it... our first GA Solution element has been built through the use of polymorphism. Neato... now when we run this, it will (eventually) generate a solution that is a good (although perhaps not the best) solution to the problem as specified in the getFitness function.

    6.6

    The StringSolution

    The big benefit here is that our engine has no knowledge at all of what the problem is, or how the mechanisms of mutation or reproduction are implemented... this means that provided we have a class that implements our Solution interface, our GA can deal with absolutely any problem. For example, consider a GA which worked to maximise the number of vowels in a string of twenty letters:


    public class StringSolution implements Solution {
    String str;

    public int getRandomNumber (int max) {
    return (int) (Math.random() * max);
    }


    public StringSolution() {
    char[] letters = new char[20];
    for (int i = 0; i < letters.length; i++) {
    letters[i] = generateRandomLetter();
    }

    str = new String (letters);
    System.out.println (str);
    }


    public boolean checkVowel (char c) {
    if (c == 'a' || c == 'e' || c == 'i' || c == 'o'|| c == 'u') {
    return true;
    }

    return false;
    }


    public void mutate() {
    char[] letters = str.toCharArray();
    int which = getRandomNumber (letters.length);
    letters[which] = generateRandomLetter();
    str = new String (letters);
    }

    char generateRandomLetter() {
    int a = 'a';
    int max = 26;
    return (char) (a + getRandomNumber (max));
    }


    public int getFitness() {
    int total = 1;
    for (int i = 0; i < str.length(); i++) {
    if (checkVowel (str.charAt (i))) {
    total += 1;
    }

    }

    return total;
    }


    public String getString() {
    return str;
    }


    public void setString (String value) {
    str = value;
    }


    public Solution crossover (Solution sol) {
    int crossover = str.length();
    StringSolution tmpsol = (StringSolution) sol;
    char tmp;
    char[] sol1, sol2;
    sol1 = str.toCharArray();
    sol2 = tmpsol.getString() .toCharArray();
    for (int i = crossover; i < str.length(); i++) {
    tmp = sol1[i];
    sol1[i] = sol2[i];
    sol2[i] = tmp;
    }

    tmpsol.setString (new String (sol2));
    str = new String (sol1);
    return tmpsol;
    }


    public String toString() {
    return str;
    }

    We can change the whole result of our GA by changing the GeneticEngine by one single line:

    for (int i = 0; i < POPULATION_SIZE; i++) { 
    tmp = new StringSolution();
    solutions.add (tmp);
    }

    Really, if we want to make sure that the engine is entirely polymorphic we should remove the initial population creation into a separate container application. In doing so, we change it so that it is triggered via a start method rather than it just happening when it is created.

    This is a good idea, because if we want to use this as a component, we really shouldn't require anyone to have to change the code inside our engine... we package it up and have it work through an entirely black box system. We simple require the developer making use of our component to provide the engine with an ArrayList of objects that implement the Solution interface... we give them a setSolutions method so that they can do just that.

    When they've provided their starting solutions, they call the start method to have the GA churn out the results:

    public void setSolutions (ArrayList value) { 
    solutions = value;
    }


    public void start() {
    Solution tmp;
    theBest = (Solution) solutions.get (0);
    for (int i = 0; i < END_GENERATION; i++) {
    System.out.println ("Generation: " + i);
    for (int j = 0; j < solutions.size(); j++) {
    solutions = reproduction (solutions);
    if (Math.random() < CROSSOVER_RATE) {
    crossover (solutions);
    }

    if (Math.random() < MUTATION_RATE) {
    tmp = (Solution) solutions.get (j);
    tmp.mutate();
    }

    }

    System.out.println ("" + theBest);
    }

    }

    Now when someone wants a GA, they create an instance of our engine and then create their own starting set of populations... in this way, the engine itself never has to be changed, only the application making use of it. Our engine has become black box - nobody needs to know how it works internally.

    import java.util.*; 

    public class GeneticAlgorithmContainer {
    public static final int POPULATION_SIZE = 250;
    ArrayList<Solution> solutions;

    public GeneticAlgorithmContainer() {
    GeneticEngine engine = new GeneticEngine();
    Solution tmp;
    solutions = new ArrayList<Solution>();
    for (int i = 0; i < POPULATION_SIZE; i++) {
    tmp = new StringSolution();
    solutions.add (tmp);
    }

    engine.setSolutions (solutions);
    engine.start();
    }


    public static void main (String args[]) {
    GeneticAlgorithmContainer main = newGeneticAlgorithmContainer();
    }

    }

    And now, we can package up our GA engine (and the accompanying Solution class) into a single package, and then anyone in the world can make use of the inherent polymorphism to drive their own searches using the tool. Really, we should spend some time working on the link between the engine and any container applications - at the moment, all output is to the console, and really the developer using the tool should be able to decide where it goes. That's left as an exercise for the interested reader.

    6.7

    Conclusion

    This has been a pretty complex example, because the idea of a genetic algorithm is complicated and we've had to skirt over some pretty important stuff to use it as a case study. The mechanics behind how a GA works are pretty interesting, and I'd recommend that anyone who is intrigued by the idea should spend some time reading up on the subject - they are truly remarkable algorithms that work on simple principles.

    This particular implementation is driven entirely by polymorphism - our engine only ever knows any given implementation of the Solution interface as the generic Solution - the interface provides the framework for the methods our engine needs. Without know how a fitness is generated, our engine can just rely on the implementer of the class to work that part out and just concentrate entirely on how all the solutions fit together within a specific process. That's pretty neat!

    This idea of polymorphism is vital to a solid understanding of object orientation - it underpins almost everything else that OO provides. It's also a very powerful tool for achieving very definite aims... although the idea is quite abstract, it translates into a technique that allows for real generic functionality in even complex programs.

    Further Reading

    The following table details further reading on the topic in this chapter, and also any external resources that you may find useful.

    ResourceDescription
    NothingThere's no links as yet, so go find your own you shiftless wasters.

    PreviousTable of ContentsNext

    © 2004-2006 Michael James Heron