My attempt at Kata Eight
# 2003-06-21 17:42:49 -0400 | Java | No CommentsDave Thomas has issued his eighth kata. These are my notes from doing this kata. If you are interested in doing this kata yourself, it’s best if you do it first.
Comments, and criticisms, welcome.
The readable version
The problem seemed pretty straight forward. I look at each word of length 6, and breaking it into as many “halfs” has possible (I’ve called each of these the leading- and trailing-substring). I then check if each substring is actually a word.
To make it readable, I made sure I used descriptive method names, and threw in some comments where I thought it might look tricky.
In my first attempt, the two nested for loops in
scanForConcatenatedWords() used loop variables called
i and j. I stuck with i because
I think it’s a readable name for an iterator. But j made
no sense, so I changed it to leadingSize (the size of the
leading substring). I also factored out the start and end conditions
to their own variable. Giving them a name made the loop clearer.
The ‘fast’ version
So, I had a look at the first version to analyse it for
performance. Let’s say that there are n words in the word list,
and that the longest word has length l (which is just ‘6′ in
our case).
Lets assume that time taken to insert-into/retrieve-from a
HashSet is a constant.
The “readable” readwords() method just loops over the words
once and inserts each into the set, so it is O(n). The
scanForConcatenatedWords() method loops over each word,
with a nested loop for the length of the word, so it is
O(nl). That is our worst-case complexity for the “readable”
version. But since l is a low-valued constant anyway, it
really is just O(n).
Can we improve that? What if we had a different set for each word
length. Then the scanForConcatenatedWords() method
wouldn’t need to loop over every word, just words of length 6. And the
inner loop could just look in the sets for words of size
leadingWord.length() and
trailingWord.length(), instead of looking in the set of
all words.
This version stores words of equal length in their own set. I hoped
this would be faster for iterating over words of length 6; and faster
for checking to see if leadingWord and
trailingWord were real words.
The performance of the ‘fast’ version
I was curious if this version was going to be any faster, because it is still basically O(n), like the readable version.
Well, they both performed exactly the same. The speed improvement
by using many smaller word sets, instead of one large word set, is
negligible. HashSet obviously does a good job of storing
Strings in Java.
(Both completed in under 80ms on Dave’s 45,000 word list.)
The extendible version
Hmm… what is meant by extendible? I tackled this in two ways. One, I made the various parameters configurable. Two, I tried to cater for situations where the user of the code may want to customise the algorithm’s execution.
KataEightExtendible.java
BaseAlg.java
ReadableAlg.java
FastAlg.java
KataEightExtendible has the ‘main’ method; it sets up the
parameters and executes the algorithm.
BaseAlg
contains all the properties that can be used to configure the
algorithm. I’ve extended the problem by allowing words of length other
than 6 to be searched for, and you can search for multiple lengths at once.
BaseAlg also embodies the AbstractAlgorithm
pattern. It needs to be subclassed; each subclass can have its own
policy regarding certain activities.
There are two subclasses,
ReadableAlg and FastAlg,
that are re-writes of the original versions in terms of
BaseAlg.