lucene-dev mailing list archives

From Tatu Saloranta
Subject Re: Escaping bug \( and ? or *
Date Sat, 08 Feb 2003 16:38:18 GMT
On Saturday 08 February 2003 02:14, Lukas Zapletal wrote:
> > Tatu Saloranta wrote:
> >> I think the problem is that the analyzer you used for indexer strips
> >> out parentheses. So, text actually indexed would look something like:
> >> "test 1 test 2" (assuming 'and' is a stop word removed). Thus there's
> >> no token matching term "(1)" or "(2)".
> >> Same goes for most other punctuation characters, they are routinely
> >> stripped by analyser, as they usually are not very useful for searching.
> >>
> >> To make it work the way you want, you need to modify the analyzer to
> >> include parentheses, perhaps so that they are included only if
> >> they contain just a single alphanumeric token (otherwise
> >> "(1 and 2)" would be tokenized to "(1" and "2)", which is probably
> >> not what you want).
> Well, this doesn't work. Check the bugzilla for the example: ESCAPING BUG
> \(abc\) and \(a*c\) in v1.2
> Can anyone help me with it?

I hope I'm not wrong this time, but isn't it the case that prefix and wildcard
query terms currently do not go through an analyzer? If so, searching for
\(abc\) would still search for "abc": the query parser first parses the main
query structure, getting the term "(abc)", and the analyzer then strips the
parentheses. Searching for \(a*c\), however, would search for the literal
pattern (a*c), and the indexer most likely did not keep parentheses in the
indexed content, so nothing matches.
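
To make the mismatch concrete, here is a toy model of what I think happens.
It is plain Java, not Lucene code: the class and method names are made up,
and the "analyzer" is just an assumed stand-in that lower-cases text and
drops punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model only: these names are hypothetical and are not Lucene classes.
class EscapedTermSketch {

    // Stand-in for an analyzer that lower-cases and drops punctuation,
    // so the text "(abc)" ends up indexed as the token "abc".
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Stand-in for the query side, taking a term whose escapes have already
    // been resolved. Plain terms are analyzed; wildcard terms are passed
    // through untouched (the behavior assumed in this message).
    static String queryTerm(String unescapedTerm) {
        boolean wildcard = unescapedTerm.indexOf('*') >= 0
                || unescapedTerm.indexOf('?') >= 0;
        if (wildcard) {
            return unescapedTerm;
        }
        List<String> tokens = analyze(unescapedTerm);
        return tokens.isEmpty() ? "" : tokens.get(0);
    }

    public static void main(String[] args) {
        System.out.println(analyze("test (abc) test")); // tokens in the index
        System.out.println(queryTerm("(abc)"));         // analyzed query term
        System.out.println(queryTerm("(a*c)"));         // wildcard term, not analyzed
    }
}
```

In this model the plain term loses its parentheses and lines up with the
indexed token, while the wildcard term keeps them and so has nothing in the
index to match against.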

If there were a way to give QueryParser an analyzer to use for prefix and
wildcard queries, this could be solved. That analyzer would need to be
specialized, however: it must leave the * and ? characters alone, even though
removing such punctuation is normally the right thing to do.
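
As a rough sketch of what such a specialized step might do (again plain
Java with a made-up name, not an existing Lucene or QueryParser hook), it
could apply the same punctuation stripping and lower-casing assumed for the
indexing side while letting the two wildcard characters through:

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: normalizes a wildcard term the way
// a punctuation-stripping, lower-casing analyzer would, but keeps '*' and '?'.
class WildcardTermNormalizer {

    static String normalize(String term) {
        StringBuilder out = new StringBuilder();
        for (char c : term.toLowerCase(Locale.ROOT).toCharArray()) {
            // Keep letters, digits, and the two wildcard characters only.
            if (Character.isLetterOrDigit(c) || c == '*' || c == '?') {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("(a*c)"));
        System.out.println(normalize("(Te?t)"));
    }
}
```

A real version would have to mirror whatever the indexing analyzer actually
does. Steps like stemming or stop-word removal probably cannot be applied
sensibly to a partial term, so this only covers the simple character-level
case.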

-+ Tatu +-
