lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-6919) Change the Scorer API to expose an iterator instead of extending DocIdSetIterator
Date Fri, 04 Dec 2015 18:13:10 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adrien Grand updated LUCENE-6919:
---------------------------------
    Attachment: LUCENE-6919.patch

Here is the (hacky) patch that I used for the benchmark.

This would be a fairly large change, so I'd like to get feedback before trying to actually
do it. If you don't like this new API, please let me know.

> Change the Scorer API to expose an iterator instead of extending DocIdSetIterator
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-6919
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6919
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-6919.patch
>
>
> I was working on trying to address the performance regression on LUCENE-6815 but this
> is hard to do without introducing specialization of DisjunctionScorer which I'd like to avoid
> at all costs.
> I think the performance regression would be easy to address without specialization if
> Scorers were changed to return an iterator instead of extending DocIdSetIterator. So conceptually
> the API would move from
> {code}
> abstract class Scorer extends DocIdSetIterator {
> }
> {code}
> to
> {code}
> abstract class Scorer {
>   abstract DocIdSetIterator iterator();
> }
> {code}
> This would help me because then if none of the sub clauses support two-phase iteration,
> DisjunctionScorer could directly return the approximation as an iterator instead of having
> to check if twoPhase == null at every iteration.
> Such an approach could also help remove some method calls. For instance TermScorer.nextDoc
> calls PostingsEnum.nextDoc but with this change TermScorer.iterator() could return the PostingsEnum
> and TermScorer would not even appear in stack traces when scoring. I hacked a patch to see
> how much that would help and luceneutil seems to like the change:
> {noformat}
>                 Task    QPS baseline      StdDev   QPS patch      StdDev                Pct diff
>                   Fuzzy1       88.54     (15.7%)       86.73     (16.6%)   -2.0% ( -29% -   35%)
>               AndHighLow      698.98      (4.1%)      691.11      (5.1%)   -1.1% (  -9% -    8%)
>                   Fuzzy2       26.47     (11.2%)       26.28     (10.3%)   -0.7% ( -19% -   23%)
>              MedSpanNear      141.03      (3.3%)      140.51      (3.2%)   -0.4% (  -6% -    6%)
>               HighPhrase       60.66      (2.6%)       60.48      (3.3%)   -0.3% (  -5% -    5%)
>              LowSpanNear       29.25      (2.4%)       29.21      (2.1%)   -0.1% (  -4% -    4%)
>                MedPhrase       28.32      (1.9%)       28.28      (2.0%)   -0.1% (  -3% -    3%)
>                LowPhrase       17.31      (2.1%)       17.29      (2.6%)   -0.1% (  -4% -    4%)
>         HighSloppyPhrase       10.93      (6.0%)       10.92      (6.0%)   -0.1% ( -11% -   12%)
>          MedSloppyPhrase       72.21      (2.2%)       72.27      (1.8%)    0.1% (  -3% -    4%)
>                  Respell       57.35      (3.2%)       57.41      (3.4%)    0.1% (  -6% -    6%)
>             HighSpanNear       26.71      (3.0%)       26.75      (2.5%)    0.1% (  -5% -    5%)
>             OrNotHighLow      803.46      (3.4%)      807.03      (4.2%)    0.4% (  -6% -    8%)
>          LowSloppyPhrase       88.02      (3.4%)       88.77      (2.5%)    0.8% (  -4% -    7%)
>             OrNotHighMed      200.45      (2.7%)      203.83      (2.5%)    1.7% (  -3% -    7%)
>               OrHighHigh       38.98      (7.9%)       40.30      (6.6%)    3.4% ( -10% -   19%)
>                 HighTerm       92.53      (5.3%)       95.94      (5.8%)    3.7% (  -7% -   15%)
>                OrHighMed       53.80      (7.7%)       55.79      (6.6%)    3.7% (  -9% -   19%)
>               AndHighMed      266.69      (1.7%)      277.15      (2.5%)    3.9% (   0% -    8%)
>                  Prefix3       44.68      (5.4%)       46.60      (7.0%)    4.3% (  -7% -   17%)
>                  MedTerm      261.52      (4.9%)      273.52      (5.4%)    4.6% (  -5% -   15%)
>                 Wildcard       42.39      (6.1%)       44.35      (7.8%)    4.6% (  -8% -   19%)
>                   IntNRQ       10.46      (7.0%)       10.99      (9.5%)    5.0% ( -10% -   23%)
>            OrNotHighHigh       67.15      (4.6%)       70.65      (4.5%)    5.2% (  -3% -   15%)
>            OrHighNotHigh       43.07      (5.1%)       45.36      (5.4%)    5.3% (  -4% -   16%)
>                OrHighLow       64.19      (6.4%)       67.72      (5.5%)    5.5% (  -6% -   18%)
>              AndHighHigh       64.17      (2.3%)       67.87      (2.1%)    5.8% (   1% -   10%)
>                  LowTerm      642.94     (10.9%)      681.48      (8.5%)    6.0% ( -12% -   28%)
>             OrHighNotMed       12.68      (6.9%)       13.51      (6.6%)    6.5% (  -6% -   21%)
>             OrHighNotLow       54.69      (6.8%)       58.25      (7.0%)    6.5% (  -6% -   21%)
> {noformat}
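
For readers who want to see the two shapes side by side, here is a minimal, self-contained sketch of the idea in the issue description. It deliberately does not use Lucene's classes: DocIterator, ArrayPostings, ExtendingTermScorer, ExposingTermScorer, FilteredScorer and ScorerApiSketch are hypothetical stand-ins invented for illustration, and the optional predicate only loosely models the "twoPhase == null" case. It is not the attached patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical stand-in for DocIdSetIterator; only nextDoc() is modeled. */
interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int nextDoc();
}

/** Hypothetical postings iterator over a sorted array of doc IDs. */
final class ArrayPostings implements DocIterator {
    private final int[] docs;
    private int index = -1;
    ArrayPostings(int... docs) { this.docs = docs; }
    @Override public int nextDoc() {
        index++;
        return index < docs.length ? docs[index] : NO_MORE_DOCS;
    }
}

/** Current shape: the scorer is itself the iterator and forwards every call. */
final class ExtendingTermScorer implements DocIterator {
    private final DocIterator postings;
    ExtendingTermScorer(DocIterator postings) { this.postings = postings; }
    @Override public int nextDoc() { return postings.nextDoc(); } // one extra hop per document
}

/** Proposed shape: the scorer hands out its iterator instead of being one. */
final class ExposingTermScorer {
    private final DocIterator postings;
    ExposingTermScorer(DocIterator postings) { this.postings = postings; }
    DocIterator iterator() { return postings; } // callers drive the postings directly
}

/** Proposed shape with an optional verification step (assumed, simplified). */
final class FilteredScorer {
    private final DocIterator approximation;
    private final IntPredicate matches; // null when no verification is needed
    FilteredScorer(DocIterator approximation, IntPredicate matches) {
        this.approximation = approximation;
        this.matches = matches;
    }
    DocIterator iterator() {
        if (matches == null) {
            return approximation; // the null check happens once, not per document
        }
        return () -> {
            int doc;
            do {
                doc = approximation.nextDoc();
            } while (doc != DocIterator.NO_MORE_DOCS && !matches.test(doc));
            return doc;
        };
    }
}

final class ScorerApiSketch {
    /** Collects every doc ID the iterator produces, in order. */
    static List<Integer> drain(DocIterator it) {
        List<Integer> out = new ArrayList<>();
        for (int doc = it.nextDoc(); doc != DocIterator.NO_MORE_DOCS; doc = it.nextDoc()) {
            out.add(doc);
        }
        return out;
    }
}
```

In this sketch, ExposingTermScorer.iterator() returns the very object that walks the postings, so the scorer adds no frame to the iteration loop, and FilteredScorer decides once whether a wrapper is needed at all.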



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
