lucene-dev mailing list archives

From "Ahmet Arslan (JIRA)" <>
Subject [jira] [Commented] (LUCENE-1486) Wildcards, ORs etc inside Phrase queries
Date Sun, 16 Mar 2014 17:31:43 GMT


Ahmet Arslan commented on LUCENE-1486:

bq. What about the stopwords bit? yet another JIRA?
There is no patch or solution for that in ComplexPhraseQueryParser. Tim says this about the topic:

bq. The root of this problem is that SpanNearQuery has no good way to handle stopwords in
a way analogous to PhraseQuery.

I suggested that [~nikhil500] use a modified StopwordFilter (I sent the filter to him off-list)
that does not remove the given stop words but instead maps each of them to an impossible token:
"the" => "ImpossibleToken"
"a" => "ImpossibleToken"
"for" => "ImpossibleToken"

I think we don't need a JIRA issue for this functionality, but we can document it as a
limitation, together with this workaround.
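
The stop word substitution described above can be sketched in plain Java. This is only an illustration of the idea: it is not the filter that was sent off-list, and it does not use Lucene's TokenFilter API. The class name, the method name, and the three-word stop set are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Plain-Java illustration only; hypothetical names, no Lucene classes involved. */
final class StopwordMasker {
    // Placeholder from the comment above; any term that never occurs in real text would do.
    static final String IMPOSSIBLE_TOKEN = "ImpossibleToken";

    // Assumed stop set: just the three words listed in the mapping above.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("the", "a", "for"));

    /** Replaces each stop word with the placeholder instead of dropping it. */
    static List<String> mask(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            boolean isStop = STOP_WORDS.contains(token.toLowerCase(Locale.ROOT));
            out.add(isStop ? IMPOSSIBLE_TOKEN : token);
        }
        return out;
    }
}
```

With this sketch, the tokens "waiting", "for", "the", "bus" become "waiting", "ImpossibleToken", "ImpossibleToken", "bus". The output has as many tokens as the input, so there is a concrete term at every position.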

> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>                 Key: LUCENE-1486
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/queryparser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Assignee: Erick Erickson
>            Priority: Minor
>             Fix For: 4.8
>         Attachments:, LUCENE-1486.patch, LUCENE-1486.patch,
LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch,
Lucene-1486 non default field.patch,, junit_complex_phrase_qp_07_21_2009.patch,
> An extension to the default QueryParser that overrides the parsing of PhraseQueries to
allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in QueryParser
itself. This works as a proof of concept for much of the query parser syntax. Examples from
the JUnit test include:
> 		checkMatches("\"j*   smyth~\"", "1,2"); //wildcards and fuzzies are OK in phrases
> 		checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
> 		checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
> 		checkBadQuery("\"jo*  id:1 smith\""); //mixing fields in a phrase is bad
> 		checkBadQuery("\"jo* \"smith\" \""); //phrases inside phrases is bad
> 		checkBadQuery("\"jo* [sma TO smZ]\" \""); //range queries inside phrases not supported
> Code plus Junit test to follow...
