java.util.regex.Pattern StackOverflowError’s on (x|y)*

I’ve been doing a fair whack of regex processing in Java recently. I must admit that having a regex library in the language is very useful, and Java’s java.util.regex.Pattern implements a very complete regex library (all the “Perl5″ stuff, non-greedy quantifiers, Unicode character classes, etc).

So I was completely surprised when it started throwing StackOverflowError. The following code highlights the problem (reproduced on Sun’s JDK1.5.0-b2 and JDK1.4.2_01). Any pattern like (x|y)* (an alternative wrapped in a star) is implemented recursively by Pattern. Each matched alternative uses about 5 method calls on the stack.

Ai Carumba!

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexpFun {
    public static void main(String[] args) {

        CharSequence largeString = makeLargeString();
        Pattern p = Pattern.compile("(x|y)*");
        Matcher m = p.matcher(largeString);
        System.out.println("matches? " + m.matches()); // throws StackOverflowError
    }

    private static CharSequence makeLargeString() {
        final int size = 1000;
        StringBuffer largeString = new StringBuffer(size);
        for (int i = 0; i < size/2; i++) {
            largeString.append("xy");
        }
        return largeString;
    }
}

Update: I know I could use [xy]* instead of (x|y)*, but my x and y are complicated regular expressions themselves.

  • http://www.blizzy.de Maik

    It looks like Sun is not willing to fix this, see here: http://bugs.sun.com/bugdatabase/viewbug.do?bugid=5050507

  • http://www.jadetower.org/muses/ Will

    I had the exact same problem today. For my purposes, switching to "(x|y)*+" has the right semantics and doesn’t suffer from the dreaded StackOverflowError.

  • Rich

    I’ve been noticing this indirectly from the schema validator today. Sigh.

  • Olimatis

    Using a gun is always risky …
    So we must go easy with the trigger (i.e. recursivity)

    Bang! ;-)

  • abhishek ranjan

    import java.util.regex.Pattern;
    import java.util.regex.Matcher;

    public class RegexpFun {
    public static void main(String[] args) {

        CharSequence largeString = makeLargeString();
        Pattern p = Pattern.compile("(x|y)*");
        Matcher m = p.matcher(largeString);
        System.out.println("matches? " + m.matches()); // throws StackOverflowError
    }
    
    private static CharSequence makeLargeString() {
        final int size = 1000;
        StringBuffer largeString = new StringBuffer(size);
        for (int i = 0; i < size/2; i++) {
            largeString.append("xy");
        }
        return largeString;
    }
    

    }