lucene-dev mailing list archives

From DM Smith <>
Subject Re: wildcard search with variable length
Date Wed, 22 Feb 2006 15:01:39 GMT
Andrzej Bialecki wrote:
> Tiago Silveira wrote:
>> IMHO, using "cat cat?" or even "cat cat? cat??" is so simple that it
>> doesn't justify keeping the old, undocumented, arguably incorrect
>> behavior.
> I have a different view on this issue - IMHO treating "?" as "exactly 
> one character" is counterintuitive for people familiar with the use of 
> wildcards: in all popular regular expression languages, and also in 
> DTD/XML world, a single "?" metacharacter means "zero or one", which 
> is probably why the original behavior was introduced (or at least it 
> was more compatible with the use of "?" in other contexts).

There are two distinctly different traditions for ?, *, and +. One is 
globbing (standard in UNIX shells) and the other is regular expressions. 
In globbing, ? has always stood for exactly one character, * stands for 
zero or more characters, and + is not defined. In regular expressions, 
these are quantifiers that modify the preceding expression to mean 0 or 
1; 0 or more; and 1 or more, respectively.
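
To make the difference concrete, here is a minimal sketch in Python. It 
uses the standard fnmatch and re modules as stand-ins for the two 
traditions; it does not use Lucene, and the word list is made up for 
illustration. The same pattern strings are read once as globs and once 
as regular expressions:

```python
import fnmatch
import re

# Hypothetical sample terms, chosen only to show the contrast.
words = ["ca", "cat", "cats", "catch"]

# Glob reading: "?" is exactly one character, "*" is any run of
# characters (possibly empty).
glob_q = [w for w in words if fnmatch.fnmatchcase(w, "cat?")]
glob_star = [w for w in words if fnmatch.fnmatchcase(w, "cat*")]

# Regex reading: "?" and "*" modify the preceding item ("t" here),
# meaning "0 or 1 of it" and "0 or more of it".
regex_q = [w for w in words if re.fullmatch("cat?", w)]
regex_star = [w for w in words if re.fullmatch("cat*", w)]

print(glob_q)      # ['cats']
print(glob_star)   # ['cat', 'cats', 'catch']
print(regex_q)     # ['ca', 'cat']
print(regex_star)  # ['ca', 'cat']
```

Under the glob reading "cat?" does not match "cat" itself, which is why 
a query such as "cat cat?" is needed to cover both the bare term and the 
term plus one character.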

Lucene seems to support globbing (trailing) and not regex. To me this is 
clear in the documentation.

That said, a search seems to be a kind of regex, and blending these two 
traditions leads to confusion. The first time I tried Lucene to do a 
search, I used these metacharacters as if they were regex modifiers, 
not globbing characters. (Natural behavior of a Perl programmer!) It did 
not work as expected. This led me to read the docs, and then I understood 
the error of my ways.

Personally, I don't want an either/or. I want a both/and. Modern Unix 
shells provide both, albeit with different syntax.

I see this more as a feature request than an argument as to the 
usefulness or properness of either. Both are useful. Both are proper. 
Both are intuitive. Both are counterintuitive. It all depends on your 
