java.util.regex.Pattern StackOverflowError’s on (x|y)*
# 2004-08-27 08:57:53 -0400 | Java | 4 Comments
I’ve been doing a fair whack of regex processing in Java recently.
I must admit that having a regex library in the language is very useful,
and Java’s java.util.regex.Pattern implements a very complete regex
library (all the “Perl5″ stuff, non-greedy quantifiers, Unicode character classes, etc).
So I was completely surprised when it started throwing
StackOverflowError. The following code highlights
the problem (reproduced on Sun’s JDK1.5.0-b2 and JDK1.4.2_01).
Any pattern like (x|y)* (an alternative wrapped in a star)
is implemented recursively by Pattern. Each matched alternative uses
about 5 method calls on the stack.
Ai Carumba!
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class RegexpFun {
public static void main(String[] args) {
CharSequence largeString = makeLargeString();
Pattern p = Pattern.compile(”(x|y)*”);
Matcher m = p.matcher(largeString);
System.out.println(”matches? ” + m.matches()); // throws StackOverflowError
}
private static CharSequence makeLargeString() {
final int size = 1000;
StringBuffer largeString = new StringBuffer(size);
for (int i = 0; i < size/2; i++) {
largeString.append(”xy”);
}
return largeString;
}
}
Update: I know I could use [xy]* instead of (x|y)*, but my
x and y are complicated regular expressions themselves.
It looks like Sun is not willing to fix this, see here: http://bugs.sun.com/bugdatabase/viewbug.do?bugid=5050507
I had the exact same problem today. For my purposes, switching to "(x|y)*+" has the right semantics and doesn’t suffer from the dreaded StackOverflowError.
I’ve been noticing this indirectly from the schema validator today. Sigh.
Using a gun is always risky …
So we must go easy with the trigger (i.e. recursivity)
Bang! ;-)