lucene-dev mailing list archives

From Paul Elschot <>
Subject Re: contrib/surround
Date Sat, 04 Jun 2005 20:25:56 GMT
On Monday 30 May 2005 02:44, Erik Hatcher wrote:
> I concur with Daniel on this.  For the moment, my preference is to  
> bring in Paul's parser into contrib/surround and let it gain some  
> additional exposure there.  I don't believe it's possible or even  
> preferable to attempt to build one query parser to rule them all.   
> While a decent general purpose one is handy, I'm finding that my  
> projects really demand more custom parsing capabilities than the  
> built-in QueryParser can handle and that the quirks of the current  
> parser cause some frustrations sometimes.
> Perhaps over time, the built-in QueryParser can adopt some additional  
> capabilities such as supporting the SpanQuery family but let's take  
> that sort of thing slowly.

How about extending the surround parser to allow the use of all
queries currently in Lucene? The goal would be to allow as many
queries as possible.

The queries not available in the current surround parser are:
- FuzzyQuery, WildcardQuery, PrefixQuery
- SpanFirstQuery
- SpanNotQuery
- MultiPhraseQuery (or the various phrase scorers)
- optional terms/clauses

FuzzyQuery and SpanFirstQuery could be done with a prefix operator
including a number (like the nn in the nnN near operator) followed by a
single query, with appropriate restrictions.
A prefix operator followed by a single query is currently not present, but
would be relatively easy to add.
SpanNotQuery always has two subqueries, so it would need an infix operator.
MultiPhraseQuery would need an infix operator and a prefix operator, just
like the N and W operators, and a restriction to terms, truncations and OR
as subqueries.
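
A rough sketch of how an operator token carrying a number, like the nn in
nnN, might be split into its count and its operator letter. The class name,
the two-digit limit and the default of 1 when no number is given are
assumptions made for illustration; this is not code from the Surround parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or the Surround parser.
final class NumberedOperatorSketch {
    // Up to two optional digits followed by a single operator letter, e.g. "3N" or "W".
    private static final Pattern TOKEN = Pattern.compile("(\\d{0,2})([A-Za-z])");

    final int number;   // the nn part; assumed to default to 1 when omitted
    final char name;    // the operator letter, upper-cased

    private NumberedOperatorSketch(int number, char name) {
        this.number = number;
        this.name = name;
    }

    static NumberedOperatorSketch parse(String token) {
        Matcher m = TOKEN.matcher(token);
        if (!m.matches()) {
            throw new IllegalArgumentException("not an operator token: " + token);
        }
        int n = m.group(1).isEmpty() ? 1 : Integer.parseInt(m.group(1));
        char op = Character.toUpperCase(m.group(2).charAt(0));
        return new NumberedOperatorSketch(n, op);
    }
}
```

The same splitting would serve a numbered prefix operator; only the grammar
rule that consumes the token (one following query instead of two surrounding
ones) would differ.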

Left truncation could also be allowed;
truncations currently have to start with a normal character.
Truncation might also be left to WildcardQuery and
PrefixQuery instead of the current "equivalent" in Surround
that uses regular expressions to find the matching terms.
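
A minimal sketch of the regular-expression approach to truncation, assuming
that * stands for any run of characters and ? for exactly one character. The
class and method names are hypothetical, and the terms are taken from a plain
in-memory list rather than enumerated from an index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene or the Surround parser.
final class TruncationSketch {
    // Translate a truncation into a regular expression:
    // '*' becomes ".*", '?' becomes ".", anything else is matched literally.
    static Pattern toPattern(String truncation) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < truncation.length(); i++) {
            char c = truncation.charAt(i);
            if (c == '*') {
                re.append(".*");
            } else if (c == '?') {
                re.append('.');
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }

    // Return the terms that match the whole truncation pattern, in input order.
    static List<String> matchingTerms(String truncation, List<String> terms) {
        Pattern p = toPattern(truncation);
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (p.matcher(t).matches()) {
                out.add(t);
            }
        }
        return out;
    }
}
```

Nothing in this translation depends on the first character being a normal
one, so a left truncation such as *ene goes through the same code. The cost
is that every candidate term has to be tested, which is why the choice
between this and PrefixQuery or WildcardQuery matters for performance.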

That leaves the optional terms/clauses, and I can't think of an easy way to
handle these. Any ideas? OR does not work for this because it requires
at least one. The normal QueryParser syntax for this is +aa bb cc,
where bb and cc are the optional parts.

Some control over performance is outside the language.
A basic query factory must be provided to create a Lucene query
from a Surround query, and this throws an exception when
rewriting causes too many terms to be used,
much like the TooManyClauses for BooleanQuery.
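
The idea of a factory that caps the number of terms can be sketched as
follows. All names here are hypothetical and the factory returns plain
strings instead of Lucene queries, so this only illustrates the counting and
the exception, not the actual Surround classes. An unchecked exception is
used for brevity; the real one may well be checked.

```java
// Hypothetical names throughout; this is not the Surround API.
final class TooManyTermsException extends RuntimeException {
    TooManyTermsException(String message) {
        super(message);
    }
}

final class LimitedQueryFactorySketch {
    private final int maxTerms;
    private int termsUsed = 0;

    LimitedQueryFactorySketch(int maxTerms) {
        this.maxTerms = maxTerms;
    }

    // Called once for every term produced while expanding a query,
    // for example once per index term matched by a truncation.
    String newTermQuery(String field, String term) {
        if (termsUsed >= maxTerms) {
            throw new TooManyTermsException(
                "more than " + maxTerms + " terms needed for this query");
        }
        termsUsed++;
        return field + ":" + term; // stand-in for a real term query object
    }

    int termsUsed() {
        return termsUsed;
    }
}
```

Because the limit lives in the factory and not in the query language, the
caller decides how expensive a query is allowed to become.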

Paul Elschot
