lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6255) PhraseQuery inconsistencies
Date Thu, 19 Feb 2015 01:51:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326876#comment-14326876 ]

Robert Muir commented on LUCENE-6255:
-------------------------------------

Can we avoid throwing an exception to the user?

I don't think it's their fault if they type "the query" and the search engine has a stopword
filter in the chain. It will confuse them, since they don't get an error with "query the".
I mean, it's still possible to throw it from the query side if we really want to, but it just
makes query parsers more complicated, because any sane parser will want to avoid this case explicitly.
I really don't think it's the right response, and I think it's rare enough that people will
see that response as a bug.
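To make the stopword case concrete, here is a minimal, self-contained Java sketch. It is a toy model and does not use Lucene's analysis API. It assumes, for illustration, that the stopword filter leaves a gap in positions when it drops a word. Under that assumption "the query" yields "query" at position 1, which is the case that would trigger an exception, while "query the" yields it at position 0. The class and the `analyze` and `normalize` helpers are hypothetical names. `normalize` shows one way a parser could shift positions so the first term sits at 0 instead of failing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: these names are hypothetical and not part of Lucene.
public class StopwordPositions {
  // A token produced by the toy analyzer: term text plus absolute position.
  record Token(String term, int position) {}

  // Splits on whitespace and drops stopwords, but still advances the position
  // counter for every dropped word, so a removed stopword leaves a gap.
  static List<Token> analyze(String text, Set<String> stopwords) {
    List<Token> tokens = new ArrayList<>();
    int position = 0;
    for (String word : text.trim().toLowerCase().split("\\s+")) {
      if (!stopwords.contains(word)) {
        tokens.add(new Token(word, position));
      }
      position++;
    }
    return tokens;
  }

  // Shifts every position down by the first token's position, so the phrase
  // starts at 0 while the gaps between tokens are preserved.
  static List<Token> normalize(List<Token> tokens) {
    List<Token> shifted = new ArrayList<>();
    if (tokens.isEmpty()) {
      return shifted;
    }
    int base = tokens.get(0).position();
    for (Token t : tokens) {
      shifted.add(new Token(t.term(), t.position() - base));
    }
    return shifted;
  }

  public static void main(String[] args) {
    Set<String> stop = Set.of("the");
    System.out.println(analyze("the query", stop).get(0).position()); // 1
    System.out.println(analyze("query the", stop).get(0).position()); // 0
    System.out.println(normalize(analyze("the query", stop)).get(0).position()); // 0
  }
}
```

Shifting positions relative to the first kept token keeps the gaps a stopword leaves between later terms. Only the leading offset, which the user never asked for, is dropped.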


> PhraseQuery inconsistencies
> ---------------------------
>
>                 Key: LUCENE-6255
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6255
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>
> PhraseQuery behaves quite inconsistently when the position of the first term is greater than 0. Here is an example:
> {noformat}
>     Directory dir = newDirectory();
>     RandomIndexWriter iw = new RandomIndexWriter(random(), dir);
>     FieldType customType = new FieldType(TextField.TYPE_NOT_STORED);
>     customType.setOmitNorms(true);
>     Field f = new Field("body", "", customType);
>     Document doc = new Document();
>     doc.add(f);
>     f.setStringValue("one quick fox");
>     iw.addDocument(doc);
>     IndexReader ir = iw.getReader();
>     iw.close();
>     IndexSearcher is = newSearcher(ir);
>     
>     // 1st query: exact phrase with terms at positions 0 and 1
>     PhraseQuery pq = new PhraseQuery();
>     pq.add(new Term("body", "quick"), 0);
>     pq.add(new Term("body", "fox"), 1);
>     System.out.println(is.search(pq, 1).totalHits); // 1
>     // 2nd query: same phrase, but the first term is added at position 10
>     pq = new PhraseQuery();
>     pq.add(new Term("body", "quick"), 10);
>     pq.add(new Term("body", "fox"), 11);
>     System.out.println(is.search(pq, 1).totalHits); // 0
>     
>     // 3rd query: a single term at position 10 (rewritten to a term query)
>     pq = new PhraseQuery();
>     pq.add(new Term("body", "quick"), 10);
>     System.out.println(is.search(pq, 1).totalHits); // 1
>     
>     // 4th query: same as the 2nd, but with slop 1 (uses SloppyPhraseScorer)
>     pq = new PhraseQuery();
>     pq.add(new Term("body", "quick"), 10);
>     pq.add(new Term("body", "fox"), 11);
>     pq.setSlop(1);
>     System.out.println(is.search(pq, 1).totalHits); // 1
>     
>     ir.close();
>     dir.close();
> {noformat}
> The reason is that when you add a term with position P to a PhraseQuery, ExactPhraseScorer ignores all positions of this term that are less than P.
> But this is inconsistent:
>  - if you have a single term, this behaviour no longer applies, since we rewrite to a term query regardless of the position of the term (3rd query)
>  - if you increase the slop, we use SloppyPhraseScorer, which does not have this behaviour (4th query).
> So I think we have two options:
>  - either remove this behaviour and make the positions that are provided to PhraseQuery only relative (i.e. fix ExactPhraseScorer)
>  - or make it work this way across the board (which means not rewriting to a term query when the position is not 0, and fixing SloppyPhraseScorer).
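The two options differ in whether a phrase may "start" before position 0. The self-contained Java sketch below is a toy model and not Lucene's scorer code. It represents one document as a map from each term to its positions and checks exact phrase matches under both readings. When `absoluteLowerBound` is set, a clause added at position P ignores occurrences of its term below P, which is the behaviour described for ExactPhraseScorer. When it is not set, clause positions are purely relative. The class, record and method names are hypothetical, and slop is not modelled.

```java
import java.util.List;
import java.util.Map;

// Toy model of exact phrase matching; not Lucene's ExactPhraseScorer.
public class PhrasePositionSemantics {
  // One phrase clause: a term and the position it was added with.
  record Clause(String term, int position) {}

  // Returns true if some start offset s places every clause's term at
  // s + clause.position() in the document. If absoluteLowerBound is true,
  // s must be >= 0, i.e. occurrences of a term below its clause position
  // are ignored; otherwise clause positions are only relative.
  static boolean matches(Map<String, List<Integer>> doc, List<Clause> query,
                         boolean absoluteLowerBound) {
    if (query.isEmpty()) {
      return false;
    }
    Clause first = query.get(0);
    for (int pos : doc.getOrDefault(first.term(), List.of())) {
      int start = pos - first.position();
      if (absoluteLowerBound && start < 0) {
        continue; // occurrence lies before the clause position: ignored
      }
      boolean allPresent = true;
      for (Clause c : query) {
        List<Integer> positions = doc.getOrDefault(c.term(), List.of());
        if (!positions.contains(start + c.position())) {
          allPresent = false;
          break;
        }
      }
      if (allPresent) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // "one quick fox": one=0, quick=1, fox=2
    Map<String, List<Integer>> doc =
        Map.of("one", List.of(0), "quick", List.of(1), "fox", List.of(2));
    List<Clause> q1 = List.of(new Clause("quick", 0), new Clause("fox", 1));
    List<Clause> q2 = List.of(new Clause("quick", 10), new Clause("fox", 11));
    List<Clause> q3 = List.of(new Clause("quick", 10));
    for (boolean absolute : new boolean[] {true, false}) {
      System.out.println((absolute ? "absolute: " : "relative: ")
          + matches(doc, q1, absolute) + " "
          + matches(doc, q2, absolute) + " "
          + matches(doc, q3, absolute));
    }
    // prints:
    // absolute: true false false
    // relative: true true true
  }
}
```

The relative reading matches the 2nd and 3rd queries from the example, in line with the term-query rewrite. The absolute reading rejects both, so choosing the second option would also require the single-term and sloppy paths to reject them.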



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


