Monkeys at Keyboards: The Javanomicon
© Michael James Heron
Topic: Java Programming
Level: 2
Version: delta

The tube is civilisation!
Penny Arcade

9 - Case Study 2 - Substituion Ciphers

PreviousTable of ContentsNext
Forum


Chapter Objectives

By the end of this chapter, the reader will be able to:

  • Explain and outlinethe processes of simple encryption.
  • Apply string parsing methods to solve a specified problem.
  • Be able to break a string up into regular substrings through use of string tokenization.


9.1

Introduction

In this case study we will be looking at how we can develop a piece of software for encrypting a piece of plaintext, and for printing up a frequency analysis table for a piece of ciphertext. The act of deciphering ciphertext from a frequency analysis table is rather complex (because there is not a simple one to one mapping between ciphertext frequency and corresponding plaintext letter) and considerably beyond the scope of this short digression, but interested students can make an attempt to do so on the basis of nothing more than the tools we have covered so far in the book

9.2

Storyboard

Our case study aside was concerned mainly with the layout of the applet - the functionality was trivial and secondary to our concerns. In this aside, the opposite is true. We are going to have very little in the way of layout and the code will be correspondingly more complex.

Nonetheless, our first step in developing any application is to develop a storyboard - this lets us think through our design and consider exactly how we want it to look.

For the applet we are discussing, we will need three components only - one is a text component, and the other two are buttons. We need the JTextArea component for entering the text we wish encrypted/analysed. We need the buttons for encrypting and analysing respectively:

Substituion Cipher Storyboard
Fig 9.1: Substituion Cipher Storyboard

As you can see, the storyboard for this application is much, much simpler than that for our calculator.

9.3

Setting up the Interface

Similarly, the code for us to set up our interface is also simpler than that for our calculator. We have only three components to set up within our init method:


public void init() {
setLayout (null);
}

We need the text component first. Notice that we don't register an action listener for this - we are only interested in events triggered by the provided buttons.

 JTextArea myText; 
public void init() {
setLayout (null);
myText = new JTextArea (10, 10);
}

We then add a JScrollPane to properly allow for large sections of text:

JTextArea myText; 
JScrollPane myPane;
public void init() {
setLayout (null);
myText = new JTextArea (10, 10);
myPane = new JScrollPane (myText);
myPane.setBounds (10, 10, 480, 200);
add (myPane);
}

For our buttons, we need to implement the encrypt button:

JTextArea myText; 
JButton encrypt;
public void init() {
setLayout (null);
myText = new JTextArea (10, 10);
myText.setBounds (10, 10, 480, 200);
add (myText);
encrypt = new JButton ("Encrypt");
encrypt.setBounds (10, 220, 100, 30);
add (encrypt);
encrypt.addActionListener (this);
}

And then finally we need to implement the decrypt button:

JTextArea myText; 
JButton encrypt;
JButton decrypt;
public void init() {
setLayout (null);
myText = new JTextArea (10, 10);
myText.setBounds (10, 10, 480, 200);
add (myText);
encrypt = new JButton ("Encrypt");
encrypt.setBounds (10, 220, 100, 30);
add (encrypt);
encrypt.addActionListener (this);
decrypt = new JButton ("Encrypt");
decrypt.setBounds (390, 220, 100, 30);
add (decrypt);
decrypt.addActionListener (this);
}

Since we're declaring the ActionListener object for each button as the applet as usual, we need to reflect that in the framework code by adding implements ActionListener to the class definition and setting up a stub for our actionPerformed method:

import java.awt.*; 
import java.applet.*;
import javax.swing.*;
import java.awt.event.*;
import java.util.*;

public class FrequencyAnalysis extends JApplet implements ActionListener
{
JTextArea myText;
JScrollPane myPane;
JButton encrypt;
JButton decrypt;

public void init() {
setLayout (null);
myText = new JTextArea (10, 10);
myPane = new JScrollPane (myText);
myPane.setBounds (10, 10, 480, 200);
add (myPane);
encrypt = new JButton ("Encrypt");
encrypt.setBounds (10, 220, 100, 30);
add (encrypt);
encrypt.addActionListener (this);
decrypt = new JButton ("Encrypt");
decrypt.setBounds (390, 220, 100, 30);
add (decrypt);
decrypt.addActionListener (this);
}

public void actionPerformed (ActionEvent e) {
}

}

Once we've done this, we can compile and execute our applet to see what it looks like in the Real World:

Substituion Cipher Interface
Fig 9.2: Substituion Cipher Interface

At this point during the last case-study, we had done most of the hard-work. In this applet, we've only just begun. The last aside concentrated largely on the user interface design because it's absolutely vital to understanding the rest of the textbook and understanding the content of each of the remaining chapters.

The idea behind GUI components through applets is used all the way through the first part of the textbook and so we needed to spend a lot of time looking at how we can draw interfaces. From this point on (with a single exception) the textbook is more interested in the underlying functionality.

9.4

The Functionality

Okay, here comes the hard part. There are two largely independent blocks of functionality we need to implement for this applet - one for turning plaintext into ciphertext and another for printing out the frequency analysis table of a piece of text. They are both related in that string and character manipulation is a significant part of both, but they are done in entirely different ways.

We shall look at the encryption portion first. Since we have two largely separate pieces of functionality, we are going to break up the code into separate methods that we call from our actionPerformed method. The first of these will be a method called encryptText.

We are going to base the encryption purely on what the user has entered into the text component, so all we need for this method is a single string parameter containing the text that was typed. Once we're done with executing the code that belongs to the method, we're going to return the text in an encrypted format. At the end, we return a string from the method. Our code stub will look like this:


public string encryptText (String plaintext) {
return plainText;
}

Obviously this currently does nothing except return exactly the same text that was passed to it, but we'll use this to provide a context for our actionPerformed method. Writing the stub in this way allows us to visualise clearly what information we are going to need within the method, and what information we are going provide as a consequence of our method's execution.

There are certain things we need to do before we can pass the user's input to the method. For one thing, we need to ensure it's in a consistent format. The first thing we'll do is incorporate an if statement to make sure our code is only executed when the encrypt button is pressed:


public void actionPerformed (ActionEvent e) {
if (e.getSource() == encrypyt) {
}

}

Then we'll add two String variables - One that holds the user's input and one that holds the encrypted results. We'll call these textToEncrypt and encryptedText.

We need to make sure the text is in a consistent format, and we'll do that by setting the whole thing to lower case. The best place to do this is outside of the if statement because we're going to need to do that for both chunks of functionality and it is wasteful (and repetitious) to have it in both conditionals:

Notice for this code here that we're putting the return value of the toLowerCase method back into the variable we called it on. Let's say the user has typed "Hello World" into the text component, and we put the return value of getText into the variable textToEncrypt:

String setup
Fig 9.3: String setup

We then call the toLowerCase method on textToEncrypt - this creates another string in the computer's memory that is a lower case version of the original string:

Second string creation
Fig 9.4: Second string creation

And then finally the textToEncrypt variable is updated to point instead at this new string:

Reassignment of variable
Fig 9.5: Reassignment of variable

The reference count of the first "Hello World" string is now zero. This means that later the "Hello World" string will be recovered and disposed of during garbage collection.

Now that we have a string that contains a lower case version of what the user typed, we can put in a call to our encryptText method within actionPerformed:


public void actionPerformed (ActionEvent e) {
String textToEncrypt;
String encryptedText;
textToEncrypt = myText.getText();
textToEncrypt = textToEncrypt.toLowerCase();
if (e.getSource() == encrypyt) {
}

}

Now if the user types something into the text component and presses encrypt, they'll end up with a lower case version of what they entered. The framework for dealing with encrypting text is now complete; we just need to write the code to actually do it.

9.5

The encryptText Method

The technique for encryption we are going to use is called monoalphabetic substitution, and this works by substituting each letter in the unencrypted text with another letter. For example:

Substitution cipher
Fig 9.6: Substitution cipher

We would encrypt the word dead by substituting each plaintext letter with its ciphertext counterpart: maxm.

This is a basic form of encryption and quite easy to break, but we're not actually trying to protect any data here. Some interesting resources about codes and ciphers are included at the end of the chapter.

The first thing we have to do to implement this encryption is to develop a ciphertext translation for each letter. To do this, we need to create an array of all the letters we are going to use for the plaintext. For the purposes of this chapter, we will look only at alphabetic substitution - numbers will not be affected.

In the chapter on Strings we looked at how char data types are represented as numeric values that are mapped onto actual letters - the system used is called Unicode. We can use this information to build a quick method for creating an array full of the letters from a to z.

We are going to use an ArrayList to do this - the reasons for this are not obvious at the moment since we already know how many letters we have in the alphabet and we will need to use wrapper classes to put characters onto an ArrayList. Later in this section we'll see why this was necessary for this particular implementation.

Remember our code stub for encryptText:


public string encryptText (String plaintext) {
return plainText;
}

We are going to build on this so that instead of just returning the parameter it is passed, it returns a version that has been encrypted using a random monoalphabetic substitution cipher.

The first thing we need to do this is a temporary character that represents the Unicode value of each letter. We are also going to need two ArrayLists. One of these will hold the letters of the alphabet in their traditional order, the other will hold the letters of the alphabet in random order - this will create our substitution code. These are going to hold single letters, and so we must use the generic syntax to indicate this:


public string encryptText (String plaintext) {
ArrayList < Character > myLetters = new ArrayList < Character >();
ArrayList < Character > myCiphers = new ArrayList < Character >();
char Unicode;
return plainText;
}

We're going to start by placing the Unicode value for a into the variable Unicode:


public string encryptText (String plaintext) {
ArrayList < Character > myLetters = new ArrayList < Character >();
ArrayList < Character > myCiphers = new ArrayList < Character >();
char Unicode;
Unicode = 'a';
return plainText;
}

Once we have done this, we can use a variation of the technique we discussed in chapter eight to get each of the letters from a to z through a for loop.

Now we have all the variables we need to build our for loop:

for (i = 0; i < 26; i++) { 
myCiphers.add (Unicode);
myLetters.add (Unicode);
Unicode += 1;
}

The first thing we do around this loop is add the letter represented by the value of Unicode to both ArrayLists. First time around the loop, this is 'a'. We then increment the numerical value of the Unicode variable by one - this moves it onto the next letter. So the second time we go around the loop, it creates a wrapper around the letter b, and so on.

By the end of the for loop, we have two ArrayLists filled with each letter of thea alphabet.

Now, in order to make one of these lists suitable for use as a substitution we need to randomise it - we use a method called shuffle because that's exactly what we're doing - we take the contents of an ArrayList and shuffle them as if they were a deck of cards. There is a method in the Collections class called shuffle that does this - this is why we had to use ArrayLists. The Arrays class has no corresponding method, and while it is possible (and quite easy) to write a method to do this for ourselves, doing it in this way saves us from re-inventing the wheel and also demonstrates a very valid use of the wrapper classes as discussed in chapter seven.

We add a call to the shuffle method of the Collections class to turn one of our arrays into a random list of letters - we'll do this on the myCiphers ArrayList:


public string encryptText (String plaintext) {
ArrayList < Character > myLetters = new ArrayList < Character >();
ArrayList < Character > myCiphers = new ArrayList < Character >();
char Unicode;
Unicode = 'a';
for (i = 0; i < 26; i++) {
myCiphers.add (Unicode);
myLetters.add (Unicode);
Unicode += 1;
}

Collections.shuffle (myCiphers);
return plainText;
}

At the end of this, we have one ArrayList (myLetters) that contains all of the letters in alphabetical order, and one that contains the letters in random order (myCiphers). This puts us into a position similar to that above in that we have a substition system setup:

Substitution cipher
Fig 9.7: Substitution cipher

Now we have to replace every instance of each letter with its ciphertext equivalent. This is done using the replace method discussed in chapter six. However, it is not quite as simple as that - we must make sure that we don't end up overwriting ciphertext with plaintext in the process. Consider the following substitution cipher:

Example in practise
Fig 9.8: Example in practise

If we work through each of the letters alphabetically:

Every instance of a becomes an instance of m:

michmel

Every instance of c becomes an instance of r:

mirhmel

Every instance of e becomes an instance of l:

mirhmll

Every instance of h becomes an instance of x:

mirxmll

Every instance of i becomes an instance of o:

morxmll

Okay so far... but what about when we substitute each instance of L with B?

morxmbb

Hrm - that's not quite right... there's only one L in Michael and yet we've replaced two characters with a B. This is because earlier we replaced an instance of E with an instance of L, and so one letter gets ciphered twice. This also happens with the replacement of m with a:

aorxabb

We can avoid this by making sure that we replace each instance of a lower case letter with an uppercase letter - the replace method is case sensitive and so we will not over-write previous substitutions:

Every instance of a becomes an instance of M:

michMel

Every instance of c becomes an instance of R:

miRhMel

Every instance of e becomes an instance of L:

miRhMLl

Every instance of h becomes an instance of X:

miRXMLl

Every instance of I becomes an instance of O:

mORXMLl

Right, here's where we had the problem this time... what about when we replace every instance of l with a B? Since this is now case sensitive there is no problem:

mORXMLB

And finally, we replace m with A:

AORXMLB

We're going to use a for loop to handle our replacement, iterating over the myLetters ArrayList to do an alphabetic substitution:

for (i = 0; i < myLetters.size(); i++) { 
}

At each step of the loop, we are going to get the element object off of the myLetters ArrayList and store it in a temporary char variable called replaceThis.

We will also get the element off of the myCiphers ArrayList in the same way, but we will use the toUpperCase method of the Character class to get the upper-case version of the letter and store it in the replaceWith variable.

Once we've done this, we'll call the replace method on the String parameter passed to the method and replace every instance of replaceThis with replaceWith. Once we've done this, we'll return the lower case version of our manipulated string:


public String encryptText (String plainText) {
ArrayList < Character > myCiphers = new ArrayList < Character >();
ArrayList < Character > myLetters = new ArrayList < Character >();
String key = "";
char Unicode;
char replaceWith;
char replaceThis;
int i;
Unicode = 'a';
for (i = 0; i < 26; i++) {
myCiphers.add (Unicode);
myLetters.add (Unicode);
Unicode += 1;
}

Collections.shuffle (myCiphers);
for (i = 0; i < myLetters.size(); i++) {
replaceThis = myLetters.get (i);
replaceWith = myCiphers.get (i);
replaceWith = Character.toUpperCase (replaceWith);
plainText = plainText.replace (replaceThis, replaceWith);
}

return plainText.toLowerCase();
}

Voila! One encryption method, served to perfection!

Now we can actually encrypt the text that gets passed into our text component, since we already put the framework into the actionPerformed method:

Plaintext
Fig 9.9: Plaintext

Press the encrypt button, and this becomes:

Ciphertext
Fig 9.10: Ciphertext

And yet, we're only half done since there's another chunk of functionality we need to add - the functionality for displaying a frequency analysis table for the text.

9.6

The analyseText method

The analyseText method is going to have the same parameter type and return type as our encryptText method - the difference is that when the method returns a value it is going to be displayed in a showMessageDialog. Our code stub looks very similar to that of encryptText:


public string analyseText (String plainText) {
}

We are going to use an array of 26 integers to hold the frequency table for the letters - the element as position 0 represents the letter a, element 1 represents b, and so on.

We are going to build up a string of text that contains the details of the frequency of each letter, so we need a String variable for this. We will also need two integers (one for the counter, one for the index of the array) and a char temporary variable:


public string analyseText (String plainText) {
int[] frequencyTable = new int[26];
String text = "";
char letter;
int index;
int i;
}

The charAt method of a string returns the character at a specific index, and the length method returns how many characters are in a string. I sense a for loop coming on!


public string analyseText (String plainText) {
int[] frequencyTable = new int[26];
String text = "";
char letter;
int index;
int i;
for (i = 0; i < plainText.length(); i++) {
letter = plainText.charAt (i);
}

}

This for loop will step over every letter in the string and extract it using the charAt method, storing it in the variable letter. The Unicode representation of the letter a is 97, as discussed in chapter six - but our array only has 26 places. However, we can subtract the integer value of a (97) from any letter that comes off of the string to give a range from 0-25 - exactly what we need for our frequencyTable array:

Letters and Unicode values
Fig 9.11: Letters and Unicode values

This is a concise way of dealing with this problem - we will use this to avoid having to write a lengthy switch statement to deal with turning letters into an appropriate index on the array:

index = (letter - 'a');

For example, if we have the letter 'b', we are really doing the following calculation:

index = 98 - 97;

Incorporating this line of code into our method gives us the following:


public string analyseText (String plainText) {
int[] frequencyTable = new int[26];
String text = "";
char letter;
int index;
int i;
for (i = 0; i < plainText.length(); i++) {
letter = plainText.charAt (i);
index = (letter - 'a');
}

}

However, there is a slight catch - we are not interested in spaces, only in letters. Spaces don't fit particularly cleanly into this numbering convention - they have a Unicode value of 31. If we subtract 97 from 32, we get -65: An invalid index in all possible ways.

But we don't care about spaces, so we can put a piece of handling code in our loop to make sure we just ignore them:


public string analyseText (String plainText) {
int[] frequencyTable = new int[26];
String text = "";
char letter;
int index;
int i;
for (i = 0; i < plainText.length(); i++) {
letter = plainText.charAt (i);
if (letter == ' ') {
continue;
}

index = (letter - 'a');
}

}

We just check to see if the letter has the Unicode value of a space (note, that is an apostrophe followed by a space followed by an apostrophe), and if it does we terminate the loop for that iteration and continue on with the next.

Okay, now we have an index that maps to our array - we just add one to the number held in the array at that point - to begin with, every element has the value of 0, and we add one to that value every time we encounter the corresponding letter:


public string analyseText (String plainText) {
int[] frequencyTable = new int[26];
String text = "";
char letter;
int index;
int i;
for (i = 0; i < plainText.length(); i++) {
letter = plainText.charAt (i);
if (letter == ' ') {
continue;
}

index = (letter - 'a');
frequencyTable[index] = frequencyTable[index] + 1;
}

}

This builds up our frequency table. Consider for example the string abcb.

This string has a length of 4, and each letter has a specific index:

Letter indices in a String
Fig 9.12: Letter indices in a String

We start counting from zero, and get the char at that position:

a

We check to see if the Unicode value of this (97) is the same as the Unicode value of a space (67) - it isn't.

We then subtract the Unicode value of a (97) from the Unicode value of this letter (97) to give the index of the letter in our array - in this case, it is 0.

Our array (or at least, a subset of it) looks like this:

Frequency Analysis Step One
Fig 9.13: Frequency Analysis Step One

We then add one to the element indicated by our index, which is 0:

Frequency Analysis Step Two
Fig 9.14: Frequency Analysis Step Two

Now we repeat the process with the next letter, b (index is 98 - 97 = 1):

Frequency Analysis Step Three
Fig 9.15: Frequency Analysis Step Three

And C (99 - 97):

Frequency Analysis Step Four
Fig 9.16: Frequency Analysis Step Four

And then B again (98 - 97):

Frequency Analysis Step Five
Fig 9.17: Frequency Analysis Step Five

Our array now has the frequency of each letter, which we then need to build a string representation of.

We are going to go over every element in the frequencyTable array and turn the index back into the appropriate letter - we do this by again making use of the Unicode value of 'a':

for (i = 0; i < frequencyTable.length; i++) { 
letter = (char) (i + 'a');
text = text + (letter + " appears " + frequencyTable[i] + " times.\n";
}

And finally, we return the variable text from the method:


public string analyseText (String plainText) {
int[] frequencyTable = new int[26];
String text = "";
char letter;
int index;
int i;
for (i = 0; i < plainText.length(); i++) {
letter = plainText.charAt (i);
if (letter == ' ') {
continue;
}

index = (letter - 'a');
frequencyTable[index] =frequencyTable[index] + 1;
}

for (i = 0; i < frequencyTable.length; i++) {
letter = (char) (i + 'a');
text += letter + " appears " + frequencyTable[i]+ " times.\n";
}

return text;
}

All now all we have left to do is link this up to our actionPerformed method:


public void actionPerformed (ActionEvent e) {
String textToEncrypt;
String encryptedText;
String analysedText;
textToEncrypt = myText.getText();
textToEncrypt = textToEncrypt.toLowerCase();
if (e.getSource() == encrypt) {
encryptedText = encryptText (textToEncrypt);
myText.setText (encryptedText);
}

else if (e.getSource() == decrypt) {
analysedText = analyseText (textToEncrypt);
JOptionPane.showMessageDialog (null, analysedText);
}

}

And that's us finished!

Run this applet

Click here to see the applet SimpleFrequencyAnalysis.class working.


9.7

Conclusion

Now we can encrypt a piece of text according to a random cipher key, and get the frequency analysis of that cipher text in a format suitable for decrypting. The process of decrypting the cipher text is beyond the scope of this module, but the interested student is directed towards any of the fine introductory texts available on the subject of cryptography.

9.8

Reader exercises

Exercise One

Is the output for the frequency analysis actually sensible, in that it gives something that is generalisable to all text? Consider how the frequency analysis could be represented instead as a percentage rather than a single numerical value.

Exercise Two

In the analyseText method there is a call to the special keyword continue which terminates a loop iteration and continues with the next. Is this perhaps a less than ideal approach, and if so why?

Exercise Three

It is possible using nothing more complex than we have discussed in this aside to develop another section of the applet that will make best guesses as to what ciphertext letters represent. Consider how this could be done.

Exercise Four

A random encryption key is all well and good, but what if you want to actually send a coded message to someone? How do they decode it?

Have a read of the Simon Singh website listed below (or even better, read his excellent Code Book, also linked below) and implement a codeword system in your applet. An example of how this could work is shown below.

Run this applet

Click here to see the applet FrequencyAnalysis.class working.


9.9

Reader Projects

Weather Data Analysis

A local scientific outfit has found a need for a piece of software to help interpret scientific data that is being received from a weather satellite orbiting over the earth. The interface to the satellite has already been dealt with, but there is a need for a piece of software to store the daily figures and calculate the averages.

Each day's data is stored as a string containing a series of comma delimited numbers. For example:

10, 15, 11, 9, 8, 11

The application should allow the user to enter a series of these strings. When the user has entered as many of these strings as required, they should be able to press a button that calculates the following information:

  1. The smallest number in each string
  2. The largest number in each string
  3. The average of all numbers stored.
  4. Which day has the largest sum of numbers.

For example, if the user enters three strings of data:

5, 10, 10, 3, 20

10, 12, 30, 10

5, 11

The application should provide the following information:

  1. The smallest number is 3 in string 1, 10 in string 2, and 5 in string 3
  2. The largest number is 20 in string 1, 30 in string 2, and 11 in string 3
  3. The average of the numbers is 11.45
  4. Day 2 has the largest sum of numbers, with a total of 62.

Further Reading

The following table details further reading on the topic in this chapter, and also any external resources that you may find useful.

ResourceDescription
Example Programs from this chapterThis is a zip file of all the programs shown in this chapter.
Simon Singh's WebsiteLearn more about encryption in all its forms.
The Code BookAll of Simon Singh's books are great, and this one is no exception - it's a history of codes and code-breaking, peppered with detailed examples and reader exercises. It's great all over, so if you are interested in the subject you couldn't do better than pick up a copy of this.
Unicode WebsiteWant to know how Unicode works? Check this out.
Frequency AnalysisWikipedia's entry on frequency analysis, for those interested in exploring the topic a little further.

PreviousTable of ContentsNext

© 2004-2006 Michael James Heron