I’ve been doing a fair whack of regex processing in Java recently.
I must admit that having a regex library in the language is very useful,
and Java’s java.util.regex.Pattern
implements a very complete regex
library (all the “Perl5” stuff, non-greedy quantifiers, Unicode character classes, etc).
So I was completely surprised when it started throwing
StackOverflowError
. The following code highlights
the problem (reproduced on Sun’s JDK1.5.0-b2 and JDK1.4.2_01).
Any pattern like (x|y)*
(an alternative wrapped in a star)
is implemented recursively by Pattern. Each matched alternative uses
about 5 method calls on the stack.
Ai Carumba!
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class RegexpFun {
public static void main(String[] args) {
CharSequence largeString = makeLargeString();
Pattern p = Pattern.compile("(x|y)*");
Matcher m = p.matcher(largeString);
System.out.println("matches? " + m.matches()); // throws StackOverflowError
}
private static CharSequence makeLargeString() {
final int size = 1000;
StringBuffer largeString = new StringBuffer(size);
for (int i = 0; i < size/2; i++) {
largeString.append("xy");
}
return largeString;
}
}
Update: I know I could use [xy]*
instead of (x|y)*
, but my
x
and y
are complicated regular expressions themselves.