lucene-dev mailing list archives

From Jan Høydahl (JIRA)
Subject [jira] [Commented] (SOLR-1980) Implement boundary match support
Date Fri, 03 Jun 2011 15:27:47 GMT


Jan Høydahl commented on SOLR-1980:

I'm sure I can get it working the way I started, using a CharFilter. However, perhaps it's possible
to implement this with a more generic and Lucene-like query syntax utilizing position info from the

 title:"quick fox"@N:M
This would mean that the phrase must be anchored between the N'th and M'th token positions in the
field. Negative values for N/M would be relative to the end of the field. Thus "^quick fox$" could be
 title:"quick fox"@0:-0
Or if you require the phrase to be within first 10 words OR last 10 words:
 title:("quick fox"@0:10 OR "quick fox"@-10:-0)
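To make the N:M bounds in the examples above concrete, here is a minimal Python sketch of how they might be resolved to absolute token positions. It is only an illustration under stated assumptions: the names resolve_bound and resolve_window are hypothetical, a leading "-" is assumed to count back from the last token (so "-0" is the last position), and out-of-range bounds are assumed to be clamped to the field. How the resolved window then constrains the phrase match is left open here.

```python
from typing import Tuple


def resolve_bound(bound: str, field_len: int) -> int:
    """Turn one bound of an @N:M suffix into an absolute token position.

    Assumption: a leading '-' counts back from the last token, so '-0' is
    the last position and '-10' is ten positions before it.
    """
    bound = bound.strip()
    if bound.startswith("-"):
        # Read the sign from the text: int("-0") == 0 would lose it.
        return (field_len - 1) - int(bound[1:])
    return int(bound)


def resolve_window(spec: str, field_len: int) -> Tuple[int, int]:
    """Resolve 'N:M' into (start, end) positions, clamped to the field.

    Assumes field_len >= 1; clamping is a choice made for this sketch.
    """
    n, m = spec.split(":")
    last = field_len - 1
    start = max(0, min(resolve_bound(n, field_len), last))
    end = max(0, min(resolve_bound(m, field_len), last))
    return start, end
```

For a 20-token field, resolve_window("-10:-0", 20) gives (9, 19), and resolve_window("0:-0", 4) gives (0, 3) for a 4-token field. The bounds are kept as strings until resolution because "-0" and "0" are the same integer.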
Requiring a term to be exactly @ position 3 would be:

If this syntax is feasible, we could use the same syntax in eDisMax's pf param to tell
it to add a position constraint when forming the pf part of the query:
This would only generate a phrase match on title if the phrase is an exact match of the whole

Potential issues with multi-valued fields? Is the boundary between values clearly marked, or is it
only a position increment gap?

Would it be easy to parse such a syntax and generate a Lucene query with the position constraints?
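As a rough feel for the parsing half of that question, a single clause of the proposed form can be picked apart with a regular expression. This is a sketch only, not a Lucene or Solr query parser: AnchoredClause and parse_clause are hypothetical names, only the simple field:"phrase"@N:M form is handled (not the grouped form with OR), and the bounds are kept as text so that "-0" keeps its sign.

```python
import re
from typing import NamedTuple, Optional


class AnchoredClause(NamedTuple):
    field: str
    phrase: str
    start: Optional[str]  # kept as text so "-0" keeps its sign
    end: Optional[str]


_CLAUSE = re.compile(
    r'(?P<field>\w+):"(?P<phrase>[^"]*)"'     # field:"some phrase"
    r'(?:@(?P<start>-?\d+):(?P<end>-?\d+))?'  # optional @N:M suffix
)


def parse_clause(text: str) -> AnchoredClause:
    """Parse one field:"phrase" clause with an optional @N:M suffix."""
    m = _CLAUSE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f'not a field:"phrase"@N:M clause: {text!r}')
    return AnchoredClause(m["field"], m["phrase"], m["start"], m["end"])
```

A clause without the suffix parses with start and end set to None, so existing phrase queries would be unaffected. Turning the parsed bounds into actual position constraints on a Lucene query is the harder part and is not attempted here.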

> Implement boundary match support
> --------------------------------
>                 Key: SOLR-1980
>                 URL:
>             Project: Solr
>          Issue Type: New Feature
>          Components: Schema and Analysis
>            Reporter: Jan Høydahl
> Sometimes you need to specify that a query should match only at the start or end of a field, or be an exact match.
> Example content:
> 1) a quick fox is brown
> 2) quick fox is brown
> Example queries:
> "^quick fox" -> should only match 2)
> "brown$" -> should match 1) and 2)
> "^quick fox is brown$" -> should only match 2)
> The proposed implementation is a new BoundaryMatchTokenFilter, which behaves like this:
> On the index side it inserts special unique tokens at beginning and end of field. These could be some weird unicode sequence.
> On the query side, it looks for the first character matching "^" or the last character matching "$" and replaces them with the special tokens.
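
The boundary-token idea in the issue description can be simulated in a few lines of Python. This is a toy model, not the proposed BoundaryMatchTokenFilter itself: tokenization is plain whitespace splitting, the marker tokens are arbitrary private-use code points chosen for illustration, and the function names are hypothetical.

```python
from typing import List

# Hypothetical marker tokens; the issue only says "some weird unicode sequence".
START = "\ue000"
END = "\ue001"


def index_tokens(text: str) -> List[str]:
    """Index side: whitespace-tokenize and wrap the field in boundary markers."""
    return [START, *text.split(), END]


def query_tokens(query: str) -> List[str]:
    """Query side: swap a leading '^' and a trailing '$' for the markers."""
    tokens: List[str] = []
    if query.startswith("^"):
        tokens.append(START)
        query = query[1:]
    anchored_end = query.endswith("$")
    if anchored_end:
        query = query[:-1]
    tokens.extend(query.split())
    if anchored_end:
        tokens.append(END)
    return tokens


def phrase_matches(query: str, text: str) -> bool:
    """Exact phrase match: query tokens occur as a contiguous run in the field."""
    doc, q = index_tokens(text), query_tokens(query)
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))
```

With the example content from the issue, phrase_matches("^quick fox", "a quick fox is brown") is False and phrase_matches("^quick fox", "quick fox is brown") is True, while "brown$" matches both documents. Because the markers are ordinary tokens, an ordinary phrase match is enough to enforce the anchoring.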
