![]() | Monkeys at Keyboards: The Javanomicon © Michael James Heron | ||||
| Topic: Java Programming Level: 2 Version: delta | |||||
7 - Hooray for Arrays | |||||
| Previous | Table of Contents | Next |
| Forum |
| Chapter Objectives |
By the end of this chapter, the reader will be able to:
|
In previous chapters we've been looking at the skills we require in order to develop simple Java applets. We've looked at the fundamental building blocks of the language and how to set up graphical user interfaces using the Swing GUI architecture. In this chapter, we're going to take a look at one of the more complex structures provided by Java for representing data. Primarily we will be looking at the idea of arrays and some of the specialisations of this concept provided by the Java object library - mainly the ArrayList class. I know that many of you will already be familiar with the idea of arrays, but is is a subject of sufficient complexity to warrant a further discussion within the text of this book
The variables we have used so far are individual units of data. They live a lonely, solitary life by themselves - unconnected and unrelated to even conceptually similar information. In real life however we must often deal with 'clumps' of related data. It is very useful for us to be able to refer to whole sets of data in a single statement... we can do this in Java through the medium of arrays. Arrays are conceptually-linked lists of data - instead of having multiple pieces of data we have a single piece of data that contains a number of different compartments into which information can be inserted. Arrays are homogenous - they have multiple compartments, but all the data that is placed in them must be of the same type. The benefit of arrays comes when we're dealing with many instances of the same kind of data. Consider a program that lets us store the ages of five people. We can implement the interface to this as a text field where we enter ages and a button that lets us add an age to our current list. To store five ages, we need to have five integer variables. We also need to keep track of how many ages we've entered so that we know into which variable we are going to put a particular age:
This is a workable solution - for five ages. What if we wanted to store ten, or twenty, or a hundred, or a thousand? Obviously the code for this is not very extensible, but this is only the beginning. The problem comes when we want to manipulate this data in order to get some meaningful information out of it. Consider if we want the average:
Again, this isn't a problem - for five ages. It's more of a problem for one hundred. Consider now finding the highest age in the set of data:
Imagine having to write that for a hundred ages - not only is it a nightmare to code with huge amounts of repetition, it's also crying out for a transcription error when you lose track of where you are and assign the wrong value to the wrong variable.
Instead of writing such unwieldy code using single variables, we make use of arrays. In many ways, they work identically to other data types except that they have a slightly different syntactical notation. Arrays are assigned a single name - this is the name of the variable. Arrays are broken into a number of compartments, which are called elements. Each of these elements has an index, which is a numerical indication of in which compartment they are:
The Java programming language provides two ways of referencing to an array in code. One is to refer to the whole array, which is done by referring to the variable name as with other data types - when we do this, we are making reference to the entire set of data rather than any of the actual data elements it contains. The other way we have of accessing an array is by specifying a particular index - this lets us access the data contained within an element. We do this by using a square bracket notation system - we give the name of the array, and then the index contained within square brackets. With our example above, if we wanted the element stored in position 2 of the array, we would use the following notation:
You probably noticed that the first element in the array has an index of 0 - this is a feature of Java and other C-type languages. For arrays we start counting from zero. This is sometimes quite hard to get used to, but eventually it becomes second nature. If we wanted to change the data container within an element, we use the same square bracket notation:
After this assignment statement, our array is changed so that the element at position 2 is overwritten with the new value:
So, why do we want to use arrays? Mainly because they are flexible - there is no difference in terms of code complexity in using an array that has ten, or a hundred, or a thousand elements. This is a marked improvement over using separate variables to hold each instance of data. They are coherent - conceptually-linking related pieces of information allows for us to perform operations on the whole set of information without having to worry about making sure we're referring to everything we should be and without the risk of missing out variables that should be included in the operation. They are simple to manipulate - their structure is perfectly suited to manipulation through the use of standard programming constructs such as for loops. We'll see some examples of this later in the chapter.
We've seen the notation for getting and setting elements of data within an array, but we haven't yet seen how we create them in code. To create an instance of an array, we need three pieces of information:
There are two stages to setting up an array (actually, these are the same two steps in setting up any variable, but understanding this is of particular importance when using arrays):
The first step we have seen before when we have created the class wide variables to hold GUI components:
What this line of code says to Java is 'Hey, we're going to need some memory to store a button - could you set enough memory aside for this for me? Later on, we'll refer to the chunk of memory you give us with the identifier myButton'. At this point, there is nothing in the memory - there is no actual button there, there's just a reference to a chunk of memory large enough to store a button. If you try to call a method on this variable before something is put into the memory, you get what is called a null pointer exception. Before you can actually do anything with this variable, you need to put something into it... we do this with the new keyword:
Once this line of code is executed, Java creates the button itself and stores it in the space in memory that was reserved. These two steps are often combined into one line of code:
But even though this is now one single statement, there are still two discreet steps involved in setting up the variable. So, when declaring an array we need to keep this in mind. For the first part of the declaration we need to indicate to Java the name, the type, and the number of dimensions of the array (more on the dimensions part later). To create the array we looked at above, first we need to declare some space in memory:
Notice that we don't give the number of elements here - we do that when actually putting an array into the space of memory we've declared:
And that's it, we've created our array. At this point, the array looks like this:
We can then set individual elements of the array in the way we saw above:
There is a second way to perform the second step of the array declaration, and that is to give Java the list of elements that should be contained within - from this, Java will work out how large the array needs to be itself:
This creates the following array:
We can use this exact same notation to declare arrays of integers, floating point numbers or booleans. We use exactly the same notation to declare arrays of more complex data types also, like Strings, or JButtons:
There is a slight difference in how Java populates an initial array. With primitive data types, all the elements of the array are set to 0. With complex data types, all of the elements of the array as set to null. We need to put something into an element of the array before we can do anything:
If we don't put something into the element with the new keyword, a null pointer exception will be encountered any time we try to call a method:
This is because although we have declared that we want an array of complex data types (like buttons), we've only created the container for the set of these objects - we haven't actually created the objects themselves. We must do this before we can perform any operations on the elements on the array.
Like most things in Java, arrays are actually objects. We've yet to look into this idea in any depth, but the key point is that like JButtons, JComboBoxes and JScrollBars - arrays in Java have methods and properties that go with them. Much of the functionality for dealing with the day to day operations of an array is handled by the provided methods. For example, it was once a common coding task to write a sorting algorithm... a piece of code that would go through an unordered list and turn it into an ordered list. For example, consider an unordered list of integers:
Through the application of code, this would be turned into an ordered list of integers:
It was once necessary to write the handling code to do this, but with the advent of object oriented programming most of this functionality is provided in the object libraries of languages like Java. To sort a list of unordered elements, we simply call the sort method of the Arrays class and pass our array as an argument:
The Arrays class is found in java.util, so the line import java.util.*; must be placed at the top of your code before you can make use of it. The case is the same for searching, which has previously been another vehicle useful for explaining array manipulation. Searching has an associated method in the Arrays class. We can call the method binarySearch, passing our array and the element we wish to search for. This method will return the index of where this element may be found in the array:
However, the binarySearch method will only work on a fully sorted list. If you want to make use of it, you must ensure your array has had the sort method called on it. There is still a requirement for developers to write searching code to locate data elements that are not stored in an ordered list. We'll talk a little more about searching later in the chapter. The array itself has a property called length that tells us how many elements it contains - obviously we will already know this since we have to pass the number of elements as part of the declaration of an array. It comes in useful however when we pass arrays to and from methods, as we shall see in a few moments.
If myIntArray contains nine elements, then numberOfElements will then have the value 9.
Now we've looked at the idea of single dimensional arrays - we can look a little at the idea of multi-dimensional arrays. Java makes it simple to extend arrays in any number of dimensions without any difficulty (beyond the conceptual). If one dimensional arrays are like tables of data, then two dimensional arrays are like boards:
We can declare two dimensional arrays in a very similar way to how we declare one dimensional arrays. We simply need to add an extra set of square brackets to indicate dimensionality:
These two statements will create a two dimensional board of integers that extends five elements in the X dimension and five elements in the Y dimension. We can then 'pinpoint' elements in this array by passing both an x and y index:
We can set elements in the array in the same way, by pinpointing which element we wish to access:
We can increase the number of dimensions by adding an extra set of square brackets for each dimension. For example, a three dimensional array:
To pinpoint elements, we simply pass an index for each axis of dimensionality:
The only difficult in extending arrays in this way is conceptual - it is very difficult for humans to envisage a data structure that contains more than the standard four dimensions. If you find it necessary to implement arrays of this level of complexity it is often an indication that you should consider a more concise representation of your data.
We can pass arrays to and from methods in the same way we do with primitive data types - all we have to do is indicate to Java that the variables we are dealing with are arrays in the method signature. The syntax for passing an array into a method is unchanged:
We can indicate that an array is expected as a parameter to a method by using the square bracket notation - we don't need to know in advance how many elements, just the number of axis of dimensionality. For a method that accepts a one dimensional array as a parameter:
If we wish to pass in an array of two dimensions, we must also indicate this:
To show an example of this in practise, we will look at an applet that creates a one dimensional array of integers, fills it full of random numbers, and then displays the largest number in the array in a message dialog when a button is pressed.:
As we have seen, arrays are very powerful programming constructs. Unfortunately, they have a disadvantage in that we must know in advance how many elements are going to be stored within. This is not always something that can be anticipated, and it is not possible for arrays to be easily resized in Java. It is possible to just give an insanely large number of elements and simply keep track of where we are in the array with a counter, but that's not an ideal solution. To deal with this problem, Java provides an alternate array structure called an ArrayList. It is one of a family of structures called collections... we will have cause to look at some more of these throughout the book. At its most basic, an ArrayList is simply an expanding array - it will increase in size as you add elements to it. It provides the functionality for adding and removing elements, as well as allowing elements to be easily inserted. This flexibility gives us a structure that can be used to deal with the uncertainty that comes with modern software development. As with arrays, elements within an ArrayList are indexed by an integer value. However, we do not use the square bracket notation with an ArrayList to access its elements. ArrayLists are classes, and as such they are manipulated with method calls. The ArrayList class is found in the java.util package, so your code must import this:
Once you have this import statement, we can create an ArrayList in the same way we can create our GUI components - by creating a variable to hold it and then putting something in to the variable with the new keyword. However, here is where it starts to get a little bit tricky. Newer versions of Java (namely Java 1.5) have introduced a more complex syntax for dealing with collection classes. In addition to declaring that we are going to be using an ArrayList, we must indicate what kind of objects are going to go onto it. We do this by using an angle bracket notation. Let's look for example at an ArrayList that stores String objects - when we create the object with the new keyword, we indicate that it is an ArrayList that will hold String objects:
Once we've done this, we have an empty instance of the ArrayList class- notice that we don't have to declare a number of elements. Our strange looking angle brackets indicate that this is an ArrayList designed for String objects.
We can add elements onto an ArrayList by using the add method, but we must be sure that we are adding elements of the appropriate type. To our myList ArrayList, we can add Strings without any problems:
At the end of these three invocations of the add method, we have an ArrayList that contains three strings, stored in the order in which they were added:
We will however encounter a problem if we try to add numbers to the ArrayList we have declared. The simple attempt to add a number will trigger a compile time error:
If we wish to make an ArrayList that allows for the use of integers, we must create one using the angle bracket notation. Now, here's where things get a little complicated, so hold on tight. It doesn't matter particularly if you follow this explanation - the ramifications are somewhat subtle and most likely won't be an issue for some time down the line. Java has two different kinds of variables. One of these variable types is known as primitive, and the other is known as reference. Java treats these variables differently. Primitive data-types are the simplest of these, and they consist of int, float, char, double, and various other single units of data. Reference data types are the more complex data types, and are always represented by objects in Java. Strings and Arrays are examples of reference data types. Now, to further complicate the issue, each of the primitive data-types has a corresponding reference data type called a Wrapper Class, and these are used when primitive data-types must be used in a complex context. We'll see why this is important when we discuss polymorphism in a later chapter. The result of all of this is that it isn't immediately obvious how to create an ArrayList that can hold a simple int value. Intuitively we would expect something like:
This would seem to make perfect sense, but Java will throw a compile-time error because of the distinction between primitive and reference data types. ArrayLists can only work with objects, and an int is not an object. In older versions of Java, it was necessary to explicitly wrap a primitive before placing it onto an ArrayList:
As of Java 1.5, this is no longer necessary. ArrayLists now support a system called auto-boxing that handles all of this for you. All you need to do is create an ArrayList that will support the relevant wrapper class:
Then we add int values (or Integer objects) as needed:
Every primitive data-type in Java has a corresponding wrapper. The table below shows what these are:
So if we wanted an ArrayList that would hold a collection of double values:
In terms of changing the data within the ArrayList, We can also make use of the set method to change the contents at a particular index - so if we called this method on the String ArrayList defined above:
This overwrites the contents of the element at the specified index and replaces it with what we pass as a second parameter:
Of course, putting things into a data structure is only one part of the functionality - we also need to be able to get things out of it. We do this using the get method of the ArrayList. To use this we pass the numerical index of the element we wish to retrieve:
The introduction of generic types to Java also complicates how we write methods - if we must also specify a type to go along with a collection, then how do we write methods that can accept the right kind of collections as parameters? We can use the standard method, but that gives us the same problems that generics were introduced to solve - that of heterogeneous arrays and mandatory type-casting:
We need to make use of a new syntax to deal with this. We can declare a generic collection as a parameter, but that limits us only to those collections that polymorphically match the typing (more on polymorphism later in the book). Often we want a method to be able to deal with a range of suitable objects. The solution to this is to develop generic methods. We can make use of wildcards within the generic type passed to the method... we indicate this by a question mark:
We can further specify wildcards to ensure that we can still maintain some kind of typing structure:
Unfortunately, passing a generically typed ArrayList as a parameter doesn't have the type-checking compiler protection you might assume. For example, the following is legal Java from a compiler perspective but it will throw a ClassCastException during the running of the program:
The moral of the story is - don't rely on the type-checking. It may not work the way you would expect. The extra level of support provided at the compiler level is not enforced in all generically typed interactions, and so you should still be careful when polymorphically casting or passing generic types to methods.
Collections such as ArrayLists and Arrays provide a concise representation for lists of data - and as such, we often need to step over each element in such a collection with an eye to performing some operation on everything contained within. Java provides us with an excellent structure called the foreach loop to allow for that.Collections such as ArrayLists and Arrays provide a concise representation for lists of data - and as such, we often need to step over each element in such a collection with an eye to performing some operation on everything contained within. Java provides us with an excellent structure called the foreach loop to allow for that. The foreach loop is syntactically similar to a standard for loop - it even uses the same keyword. However, we use a slightly modified syntax for indicating how we want the loop to behave:
The first part of the for loop indicates that we want to refer to each element in turn with the variable name s, and that each of these elements will be a String (we must declare this in the body of the for loop - we cannot use an existing variable). The second part is the collection we wish to step over (in this case, myListOfStrings). The first time around the loop, s refers to the first element of the ArrayList. The second time around, it refers to the second element, and so on until it has stepped over every element in the ArrayList. We can also use this structure with standard arrays in exactly the same way:
The foreach structure is syntactically compact and allows for a simple way for developers to provide a mechanism for performing an operation on every element of data within a collection - none too shabby!
As with standard arrays, Java gives us a class that allows us to perform common manipulation tasks on an ArrayList. This class is called Collections, and like the Arrays class is found in the java.util package. Both the sort and binarySearch methods are available in this class, and both work on the ArrayList class:
There are a large number of other methods supported by this class, but we will only cover these within this chapter. The interested student is invited to examine the Java documentation which will provide further information on what is possible.
A fuller discussion of sorting gains us nothing but Masochism Points at this juncture - the mechanisms provided by Java are more than ample for most of our needs. However, we still need a mechanism for searching unordered lists. We should also look a little more at how a binary search works so that we can understand the need for the list to be sorted. We can of course use the methods indicated above without knowing any of these things, but this is not a book about Java - it's a book about programming and so we will often take detours through material that is useful for gaining a broader appreciation of the subject. Searching techniques will work equally well for arrays and ArrayLists - they are very similar data structures. The only thing that differs is in the syntax. The easiest kind of search to implement is called a linear search. We simply step over every element in the structure until we find the one we want. What we do after that depends on the exact requirements of our search - if we have a method devoted to the search, we may want to return the matching element or we may wish to return an integer indicating where it is to be found in the structure. But wait! If we know what we're looking for, why do we need to search for it? Obviously such a technique is of limited usefulness if we are, for example, searching for a String. We know what the String is - we can just recreate it. However, for more complex objects we may only know a portion of the full set of information - we may only know enough to distinguish between different kinds of objects. Consider for example if Java provided us with a class that let us store all the information about a student. There is no such class... in a few chapters, we'll learn how to write it ourself. Let's say this class provides ways of setting an ID number, a name, a list of grades, an address, a phone number, and all sorts of other useful information. And then let's say we have an ArrayList that contains two thousand students. If all we have is the ID number, then we need a way of finding out the rest of the information - this is where a good search will come in. For example, consider the following linear search on an ArrayList
This is the simplest kind of search - alas, it is also the least efficient. We must assume that, on average, that we must search at least half of the list to find the element we want. This means that the larger the number of elements, the more code has to be executed. In technical terms, it does not scale well... what may work well for ten students will not work nearly as well for ten thousand. Java provides us with a better alternative - a binary search. This works according to a simple heuristic - if we know that a list is sorted, then we know in what direction we must search to find the thing we're looking for. Consider if we have the following list:
If we're looking for a number bigger than our current number, then we look towards the end of the list. If we're looking for a number smaller than our current number, then we look towards the start of the list. A binary search begins searching at the middle point of a list. Let's say we're looking to find where in the list we could find the number 10. The list is nine elements long, and so we begin at element five:
The number ten is greater than the number we currently have... we know therefore that we have to look towards the end of the list rather than towards the start. We can therefore discount everything that occurs before element five... in effect, we only search the following list:
So with this list, we then again take the mid-point:
Now we know that the number 10 is less than the number we currently have - we only need the left hand side of the array now, so we discount the right-hand side.
And then again we take the mid-point:
And there we have our element, and it was found after only three comparisons. With a linear search, it would require six comparisons. For longer lists, the difference is even more marked... a binary search discounts half of the list left to search each time. Consider for example a list of one thousand and twenty-four elements. A linear search will require five-hundred and twelve comparisons on average. A binary search on the other hand
For 2048 elements, we require only 11 comparisons, and for 4096 we require only twelve. This is an algorithm that scales well. However, in the best case situation (the element we are looking for is at the very start of the list), a linear search will find the search element in one comparison compared to the 10 required by the binary search. On the other hand, for the worst case scenario (the element we're looking for is at the very end of the list), the linear search will require 1024 comparisons. A binary search is in general many orders of magnitude more efficient than a linear search... but it comes at a cost in that the list must be sorted. When we come to write our own objects, we'll also find another problem - the standard Java implementation of the binarySearch doesn't work properly with basic classes. That's because Java has no way of being able to tell how you want things to be sorted or compared against... how do we want our Student class to be sorted? Should it be sorted on ID? Or on name? Or on year of registration? There are mechanisms for solving this problem, and we'll discuss them in due course. For now, it is enough to know how to use linear searching when we must, and how a binary search can find those elements we need in a much more efficient manner.
The effective implementation of data structures is one of the hallmarks of a successful application. Arrays and ArrayLists provide a powerful way of modelling conceptually linked sets of data in an efficient way that is easy to manipulate with code. There are a wide variety of other structures provided by Java - we will discuss some of these in the course of the book.
Exercise one
Exercise two
Further ReadingThe following table details further reading on the topic in this chapter, and also any external resources that you may find useful.
|
| Previous | Table of Contents | Next |
© 2004-2006 Michael James Heron