My attempt at Kata Eight

21 Jun 2003

Dave Thomas has issued his eighth kata. These are my notes from doing this kata. If you are interested in doing this kata yourself, it's best if you do it first.

Comments, and criticisms, welcome.

The readable version

The problem seemed pretty straight forward. I look at each word of length 6, and breaking it into as many "halfs" has possible (I've called each of these the leading- and trailing-substring). I then check if each substring is actually a word.

To make it readable, I made sure I used descriptive method names, and threw in some comments where I thought it might look tricky.

In my first attempt, the two nested for loops in scanForConcatenatedWords() used loop variables called i and j. I stuck with i because I think it's a readable name for an iterator. But j made no sense, so I changed it to leadingSize (the size of the leading substring). I also factored out the start and end conditions to their own variable. Giving them a name made the loop clearer.

The 'fast' version

So, I had a look at the first version to analyse it for performance. Let's say that there are n words in the word list, and that the longest word has length l (which is just '6' in our case). Lets assume that time taken to insert-into/retrieve-from a HashSet is a constant.

The "readable" readwords() method just loops over the words once and inserts each into the set, so it is O(n). The scanForConcatenatedWords() method loops over each word, with a nested loop for the length of the word, so it is O(nl). That is our worst-case complexity for the "readable" version. But since l is a low-valued constant anyway, it really is just O(n).

Can we improve that? What if we had a different set for each word length. Then the scanForConcatenatedWords() method wouldn't need to loop over every word, just words of length 6. And the inner loop could just look in the sets for words of size leadingWord.length() and trailingWord.length(), instead of looking in the set of all words.

This version stores words of equal length in their own set. I hoped this would be faster for iterating over words of length 6; and faster for checking to see if leadingWord and trailingWord were real words.

The performance of the 'fast' version

I was curious if this version was going to be any faster, because it is still basically O(n), like the readable version.

Well, they both performed exactly the same. The speed improvement by using many smaller word sets, instead of one large word set, is negligible. HashSet obviously does a good job of storing Strings in Java.

(Both completed in under 80ms on Dave's 45,000 word list.)

The extendible version

Hmm... what is meant by extendible? I tackled this in two ways. One, I made the various parameters configurable. Two, I tried to cater for situations where the user of the code may want to customise the algorithm's execution.

KataEightExtendible has the 'main' method; it sets up the parameters and executes the algorithm.

BaseAlg contains all the properties that can be used to configure the algorithm. I've extended the problem by allowing words of length other than 6 to be searched for, and you can search for multiple lengths at once.

BaseAlg also embodies the AbstractAlgorithm pattern. It needs to be subclassed; each subclass can have its own policy regarding certain activities.

There are two subclasses, ReadableAlg and FastAlg, that are re-writes of the original versions in terms of BaseAlg.

  • Home
  • Blog