lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-6255) PhraseQuery inconsistencies
Date Thu, 19 Feb 2015 08:56:11 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-6255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-6255:
---------------------------------
    Attachment: LUCENE-6255.patch

Here is a middle ground proposal:
 - enforce that terms are added in order of positions
 - enforce that positions are all positive
 - PhraseQuery still accepts that the first position is greater than 0 but PhraseWeight does not
 - PhraseQuery.rewrite takes care of rebasing positions if the first one is not 0

This way, PhraseQuery would still be friendly to query parsers that create phrase queries from a token stream.
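
The proposal above can be sketched without any Lucene dependency. This is only a minimal illustration, not the attached patch: the class PhrasePositionsSketch and its methods are hypothetical names, "positive" is read as "not negative" (since position 0 is valid), and allowing two terms at the same position is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a phrase query; not Lucene's PhraseQuery. */
final class PhrasePositionsSketch {
    private final List<String> terms = new ArrayList<>();
    private final List<Integer> positions = new ArrayList<>();

    /** Adds a term, enforcing non-negative positions added in order. */
    void add(String term, int position) {
        if (position < 0) {
            throw new IllegalArgumentException("Position must not be negative: " + position);
        }
        if (!positions.isEmpty() && position < positions.get(positions.size() - 1)) {
            // Assumption: equal positions are allowed, only going backwards is rejected.
            throw new IllegalArgumentException("Positions must be added in order: " + position);
        }
        terms.add(term);
        positions.add(position);
    }

    /** Models the rewrite step: shifts positions so that the first one is 0. */
    int[] rebasedPositions() {
        int[] rebased = new int[positions.size()];
        if (rebased.length == 0) {
            return rebased;
        }
        int first = positions.get(0);
        for (int i = 0; i < rebased.length; i++) {
            rebased[i] = positions.get(i) - first;
        }
        return rebased;
    }

    public static void main(String[] args) {
        PhrasePositionsSketch pq = new PhrasePositionsSketch();
        pq.add("quick", 10);
        pq.add("fox", 11);
        System.out.println(Arrays.toString(pq.rebasedPositions())); // [0, 1]
    }
}
```

With rebasing done once at rewrite time, the matching code would only ever see a first position of 0, while a query parser could still pass the raw token positions through unchanged.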

> PhraseQuery inconsistencies
> ---------------------------
>
>                 Key: LUCENE-6255
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6255
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>         Attachments: LUCENE-6255.patch
>
>
> PhraseQuery behaves quite inconsistently when the position of the first term is greater
> than 0. Here is an example:
> {noformat}
>     Directory dir = newDirectory();
>     RandomIndexWriter iw = new RandomIndexWriter(random(), dir);
>     FieldType customType = new FieldType(TextField.TYPE_NOT_STORED);
>     customType.setOmitNorms(true);
>     Field f = new Field("body", "", customType);
>     Document doc = new Document();
>     doc.add(f);
>     f.setStringValue("one quick fox");
>     iw.addDocument(doc);
>     IndexReader ir = iw.getReader();
>     iw.close();
>     IndexSearcher is = newSearcher(ir);
>     
>     PhraseQuery pq = new PhraseQuery();
>     pq.add(new Term("body", "quick"), 0);
>     pq.add(new Term("body", "fox"), 1);
>     System.out.println(is.search(pq, 1).totalHits); // 1
>     pq = new PhraseQuery();
>     pq.add(new Term("body", "quick"), 10);
>     pq.add(new Term("body", "fox"), 11);
>     System.out.println(is.search(pq, 1).totalHits); // 0
>     
>     pq = new PhraseQuery();
>     pq.add(new Term("body", "quick"), 10);
>     System.out.println(is.search(pq, 1).totalHits); // 1
>     
>     pq = new PhraseQuery();
>     pq.add(new Term("body", "quick"), 10);
>     pq.add(new Term("body", "fox"), 11);
>     pq.setSlop(1);
>     System.out.println(is.search(pq, 1).totalHits); // 1
>     
>     ir.close();
>     dir.close();
> {noformat}
> The reason is that when you add a term with position P on a PhraseQuery, ExactPhraseScorer
> ignores all positions for this term which are less than P.
> But this is inconsistent:
>  - if you have a single term, it does not work anymore since we rewrite to a term query
>    regardless of the position of the term (3rd query)
>  - if you increase the slop, we will use SloppyPhraseScorer which does not have this
>    behaviour. (4th query)
> So I think we have two options:
>  - either remove this behaviour and make the positions that are provided to PhraseQuery
>    only relative (ie. fix ExactPhraseScorer)
>  - or make it work this way across the board (which means not rewriting to a term query
>    when the position is not 0 and fixing SloppyPhraseScorer).
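
The difference between the two options can be illustrated with a simplified model of exact phrase matching. This is a sketch of the behaviour described above, not ExactPhraseScorer's actual code: the class ExactMatchSketch, the method matches and the flag dropBelowQueryPosition are hypothetical, and it assumes that "one quick fox" is indexed with "quick" at position 1 and "fox" at position 2.

```java
import java.util.HashSet;
import java.util.Set;

/** Simplified model of exact phrase matching; not Lucene's ExactPhraseScorer. */
final class ExactMatchSketch {

    /**
     * docPositions[i] holds the document positions of the i-th query term,
     * queryPositions[i] is the position that was given to the query for that term.
     * When dropBelowQueryPosition is true, document positions smaller than the
     * query position are ignored, as the description above says happens today.
     */
    static boolean matches(int[][] docPositions, int[] queryPositions,
                           boolean dropBelowQueryPosition) {
        Set<Integer> candidates = null;
        for (int i = 0; i < queryPositions.length; i++) {
            Set<Integer> starts = new HashSet<>();
            for (int p : docPositions[i]) {
                if (dropBelowQueryPosition && p < queryPositions[i]) {
                    continue; // position is less than P, so it is skipped
                }
                starts.add(p - queryPositions[i]); // implied start of the phrase
            }
            if (candidates == null) {
                candidates = starts;
            } else {
                candidates.retainAll(starts); // all terms must agree on a start
            }
        }
        return candidates != null && !candidates.isEmpty();
    }

    public static void main(String[] args) {
        int[][] doc = { {1}, {2} }; // "quick" at 1, "fox" at 2 (assumed)
        System.out.println(matches(doc, new int[] {0, 1}, true));    // true
        System.out.println(matches(doc, new int[] {10, 11}, true));  // false
        System.out.println(matches(doc, new int[] {10, 11}, false)); // true
    }
}
```

In this model, the first two calls mirror the 1st and 2nd queries of the example, and the third call shows what treating positions as purely relative (the first option) would give for the 2nd query.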




