lucene-java-user mailing list archives

From Erik Hatcher <>
Subject Re: SpanRegexQuery
Date Fri, 01 Aug 2008 10:24:05 GMT

On Jul 31, 2008, at 10:06 PM, Christopher M Collins wrote:
> I'm trying to use SpanRegexQuery as one of the clauses in my  
> SpanQuery.
> When I give it a regex like: "L[a-z]+ing" and do a rewrite on the  
> final
> query I get terms like "Labinger" and "Lackonsingh" along with the  
> expected
> terms "Labeling", "Lacing", etc.  It's as if the regex is treated as a
> "find()" and not a "match()" in Java.  Is there a way to make it  
> behave
> like a full match, and not a prefix regex?

There are two implementations of the regex engine built into  
SpanRegexQuery, one using Java's java.util.regex, the other using  
Jakarta Regexp.  The default implementation is java.util.regex, which  
matches like this:


And Jakarta Regexp matches like this:


I'm not sure of the differences between these two without doing some
tests, but certainly they should, ahem, match in at least the
expectation of whether there is an implied ^string$ or not.  But at a
quick glance at the respective javadocs, it does seem like the
java.util.regex implementation should be using
pattern.matcher(string).matches() instead.  lookingAt() always starts
at the beginning, so there is an implied ^string effect, but not so
with the Jakarta Regexp implementation.
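
To make the distinction concrete, here is a minimal sketch using only
java.util.regex. It compares find(), lookingAt() and matches() on the
regex and terms from the question, plus one made-up term ("unLacing").
The class and helper names are hypothetical and are not part of Lucene;
this only illustrates the Matcher methods, not how SpanRegexQuery is
implemented.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; none of these names come from Lucene.
class RegexMatchModes {

    // True if the regex matches anywhere inside the term.
    static boolean find(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    // True if the regex matches a prefix of the term (implied ^ only).
    static boolean lookingAt(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // True only if the regex matches the entire term (implied ^ and $).
    static boolean matches(String regex, String term) {
        return Pattern.compile(regex).matcher(term).matches();
    }

    public static void main(String[] args) {
        String regex = "L[a-z]+ing";
        String[] terms = {"Labeling", "Labinger", "Lackonsingh", "unLacing"};
        for (String term : terms) {
            System.out.println(term
                    + " find=" + find(regex, term)
                    + " lookingAt=" + lookingAt(regex, term)
                    + " matches=" + matches(regex, term));
        }
    }
}
```

With lookingAt(), "Labinger" and "Lackonsingh" are accepted because the
prefixes "Labing" and "Lackonsing" satisfy the pattern; matches()
rejects both.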

As Daniel mentioned, putting a $ at the end should do the trick, and
it seems to me that it really should be necessary... but so should ^ in
front if you want it to start at the beginning and not match anywhere
in the string.
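
The effect of the explicit anchors can be sketched with
java.util.regex alone. Here lookingAt() stands in for an engine with an
implied ^, and find() stands in for an engine that matches anywhere in
the string. That stand-in is an assumption for illustration; I have not
checked it against Jakarta Regexp itself. The class name and the term
"unLacing" are made up.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of Lucene or Jakarta Regexp.
class AnchoredRegexDemo {

    // Prefix-style matching: an implied ^ but no implied $.
    static boolean prefixMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).lookingAt();
    }

    // Unanchored matching: the regex may match anywhere in the term.
    static boolean anywhereMatch(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    public static void main(String[] args) {
        // A trailing $ is enough when the start is already anchored.
        System.out.println(prefixMatch("L[a-z]+ing$", "Labeling"));
        System.out.println(prefixMatch("L[a-z]+ing$", "Labinger"));

        // With unanchored matching, $ alone still lets "unLacing" through;
        // both ^ and $ are needed to force a whole-term match.
        System.out.println(anywhereMatch("L[a-z]+ing$", "unLacing"));
        System.out.println(anywhereMatch("^L[a-z]+ing$", "unLacing"));
        System.out.println(anywhereMatch("^L[a-z]+ing$", "Lacing"));
    }
}
```

Writing the pattern as "^L[a-z]+ing$" behaves the same under both
styles here, so it is the safer form if the regex implementation might
be swapped.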

Changing JavaUtilRegexCapabilities to use matches() seems like the  
right thing to do, but that'd break backwards compatibility.  *ugh*

