java.util.regex.Pattern StackOverflowError’s on (x|y)*

I’ve been doing a fair whack of regex processing in Java recently. I must admit that having a regex library in the language is very useful, and Java’s java.util.regex.Pattern implements a very complete regex library (all the “Perl5″ stuff, non-greedy quantifiers, Unicode character classes, etc).

So I was completely surprised when it started throwing StackOverflowError. The following code highlights the problem (reproduced on Sun’s JDK1.5.0-b2 and JDK1.4.2_01). Any pattern like (x|y)* (an alternative wrapped in a star) is implemented recursively by Pattern. Each matched alternative uses about 5 method calls on the stack.

Ai Carumba!

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexpFun {
    public static void main(String[] args) {

        CharSequence largeString = makeLargeString();
        Pattern p = Pattern.compile("(x|y)*");
        Matcher m = p.matcher(largeString);
        System.out.println("matches? " + m.matches()); // throws StackOverflowError
    }

    private static CharSequence makeLargeString() {
        final int size = 1000;
        StringBuffer largeString = new StringBuffer(size);
        for (int i = 0; i < size/2; i++) {
            largeString.append("xy");
        }
        return largeString;
    }
}

Update: I know I could use [xy]* instead of (x|y)*, but my x and y are complicated regular expressions themselves.

4 Comments

  1. Posted August 27, 2004 at 8:53 am | Permalink

    It looks like Sun is not willing to fix this, see here: http://bugs.sun.com/bugdatabase/viewbug.do?bugid=5050507

  2. Posted August 31, 2004 at 1:03 am | Permalink

    I had the exact same problem today. For my purposes, switching to "(x|y)*+" has the right semantics and doesn’t suffer from the dreaded StackOverflowError.

  3. Rich
    Posted October 19, 2004 at 3:55 pm | Permalink

    I’ve been noticing this indirectly from the schema validator today. Sigh.

  4. Olimatis
    Posted November 19, 2004 at 3:11 am | Permalink

    Using a gun is always risky …
    So we must go easy with the trigger (i.e. recursivity)

    Bang! ;-)

Post a Comment

Your email is never shared. Required fields are marked *

*
*