lucene-dev mailing list archives

From Erik Hatcher <>
Subject Re: Test code for regex queries
Date Fri, 25 Nov 2005 01:26:18 GMT

On 24 Nov 2005, at 11:57, Paul Elschot wrote:
> Capturing groups and special contexts need normal brackets ().

Maybe we have a terminology mismatch.  I call these (parentheses) and  
these [brackets].

> Capturing groups are used for replacements, and I don't see a use
> for that in a query language.


> Special constructs with () brackets are used for non-capturing groups,
> match flags, and lookahead/lookbehind.
> Would you know a use for these in a query language?

I meant a construct like [Tt]his, which the JDK Pattern documentation  
calls a "character class".
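A small java.util.regex sketch of the distinction being discussed. [Tt] is a character class that matches exactly one character, 'T' or 't'. For a plain yes/no match on a term, a capturing group (...) and a non-capturing group (?:...) accept the same strings; captured text only matters when it is reused, for example as $1 or $2 in a replacement. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class RegexConstructsDemo {
    // [Tt] is a character class: exactly one character, either 'T' or 't'.
    static boolean matchesThis(String term) {
        return Pattern.matches("[Tt]his", term);
    }

    // For a yes/no match, a capturing group (...) and a non-capturing
    // group (?:...) accept exactly the same strings.
    static boolean capturingMatches(String term) {
        return Pattern.matches("(ab)+c", term);
    }

    static boolean nonCapturingMatches(String term) {
        return Pattern.matches("(?:ab)+c", term);
    }

    // Captured text only matters when it is reused, e.g. in a replacement.
    static String swapNames(String name) {
        return name.replaceAll("(\\w+), (\\w+)", "$2 $1");
    }

    public static void main(String[] args) {
        System.out.println(matchesThis("This"));          // true
        System.out.println(matchesThis("this"));          // true
        System.out.println(matchesThis("THIS"));          // false
        System.out.println(capturingMatches("ababc"));    // true
        System.out.println(nonCapturingMatches("ababc")); // true
        System.out.println(swapNames("Hatcher, Erik"));   // Erik Hatcher
    }
}
```

This supports the point about query languages: only the character class changes which terms match, while the group variants differ only in what is remembered for later reuse.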

> I also missed things like \u2014, which only add to the problem.

Yeah, I gave up looking for more problems, as there are many.

> There are some older regex implementations in Java, but I
> have no idea about the licences and the availability.
> Doesn't Apache have one somewhere?

Two, actually!  ORO and Regexp.  Here's ORO - <http://> (Regexp is linked from there)

I'll dig into those soon and see what useful goodies lurk within.

> Btw. $ also has a special meaning in regexes.

Quite true.

For my particular query language, I'm not supporting full regex, just  
*, ?, and [...] syntax.  I convert the expression into a regex before  
handing it to RegexQuery (* -> .*, ? -> .?).
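A minimal sketch of that kind of conversion in plain Java. It assumes that every character other than *, ? and a [...] group should be matched literally, which is why it quotes literal runs with Pattern.quote (the thread above shows why this matters: $, parentheses and similar characters all mean something to the regex engine). It uses the * -> .* and ? -> .? mapping from the post and passes bracket expressions through unchanged. The class and method names are hypothetical, and the sketch stops at producing a pattern string rather than constructing a RegexQuery.

```java
import java.util.regex.Pattern;

public class WildcardToRegex {
    // Converts an expression that uses only *, ? and [...] into a
    // java.util.regex pattern string. All other characters are quoted,
    // so metacharacters such as $, (, ) or . are matched literally.
    static String toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        int i = 0;
        while (i < wildcard.length()) {
            char c = wildcard.charAt(i);
            if (c == '*') {
                regex.append(".*");
                i++;
            } else if (c == '?') {
                regex.append(".?"); // the mapping given in the post
                i++;
            } else if (c == '[') {
                int close = wildcard.indexOf(']', i + 1);
                if (close <= i + 1) {
                    // Unterminated or empty class: treat '[' as a literal.
                    regex.append(Pattern.quote("["));
                    i++;
                } else {
                    // Pass the bracket expression to the regex engine unchanged.
                    regex.append(wildcard, i, close + 1);
                    i = close + 1;
                }
            } else {
                // Quote a whole run of ordinary characters at once.
                int start = i;
                while (i < wildcard.length() && "*?[".indexOf(wildcard.charAt(i)) < 0) {
                    i++;
                }
                regex.append(Pattern.quote(wildcard.substring(start, i)));
            }
        }
        return regex.toString();
    }

    static boolean wildcardMatches(String wildcard, String term) {
        return Pattern.matches(toRegex(wildcard), term);
    }

    public static void main(String[] args) {
        System.out.println(wildcardMatches("te?t*", "tests"));    // true
        System.out.println(wildcardMatches("[Tt]his$", "This$")); // true: $ is literal
        System.out.println(wildcardMatches("[Tt]his$", "This"));  // false
    }
}
```

Two design notes. The contents of a [...] group are not checked, so characters such as \ or & inside the brackets keep their regex meaning. And if ? is meant to match exactly one character, as it does in Lucene's WildcardQuery, it would map to "." rather than ".?".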


