lucene-dev mailing list archives

From "Han Jiang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
Date Tue, 07 Aug 2012 16:50:10 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13430423#comment-13430423
] 

Han Jiang commented on LUCENE-3892:
-----------------------------------

Thanks Adrien! Your code is really clean!

At first glance, I think we should still support the case where all values in a block are
the same. For some applications (like an index with payloads), that might be helpful.
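
To make the all-values-the-same case concrete, here is a minimal sketch, not the code from any of the attached patches. It assumes a block encoder that writes a bits-per-value header followed by the values; the names `ALL_VALUES_EQUAL`, `encode_block`, and `decode_block` are hypothetical, and the general case leaves the payload unpacked to keep the example short.

```python
from typing import List, Tuple

# Hypothetical header marker: "zero bits per value" means the block
# consists of one value repeated.
ALL_VALUES_EQUAL = 0


def encode_block(values: List[int]) -> Tuple[int, List[int]]:
    """Return (bits_per_value, payload) for one block of non-negative ints."""
    if not values:
        raise ValueError("block must not be empty")
    first = values[0]
    if all(v == first for v in values):
        # The whole block is a single repeated value: store it once.
        return ALL_VALUES_EQUAL, [first]
    # General case: each value needs as many bits as the largest one.
    # A block with differing values has max > 0, so bits >= 1 and never
    # collides with the ALL_VALUES_EQUAL marker.
    bits = max(values).bit_length()
    return bits, list(values)  # real code would bit-pack the payload


def decode_block(bits: int, payload: List[int], size: int) -> List[int]:
    """Inverse of encode_block for a block holding `size` values."""
    if bits == ALL_VALUES_EQUAL:
        return [payload[0]] * size
    return list(payload)
```

With this layout, a block of 128 identical values costs one header plus one value instead of 128 packed values, which is where the saving for fields such as fixed-length payloads would come from.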

Also, I'm a little confused about your performance test. Did you use BlockPF from before r1370179
as the baseline, and compare it with your latest commit? Here, I tested these two PFs at the latest
revision (r1370345).

{noformat}
                Task    QPS base StdDev base    QPS comp StdDev comp      Pct diff
         AndHighHigh      124.53        9.36      100.46        3.31  -27% -   -9%
          AndHighLow     2141.08       63.93     1922.73       36.32  -14% -   -5%
          AndHighMed      281.48       36.49      218.68       13.10  -35% -   -5%
              Fuzzy1       84.33        2.56       83.94        1.67   -5% -    4%
              Fuzzy2       30.49        1.13       30.48        0.71   -5% -    6%
          HighPhrase        9.08        0.28        7.56        0.20  -21% -  -11%
    HighSloppyPhrase        5.46        0.21        4.88        0.23  -17% -   -2%
        HighSpanNear       10.12        0.21        9.21        0.30  -13% -   -3%
            HighTerm      176.52        6.13      146.13        5.43  -22% -  -11%
              IntNRQ       59.56        1.98       51.05        1.33  -19% -   -9%
           LowPhrase       40.02        1.03       32.75        0.37  -21% -  -15%
     LowSloppyPhrase       59.59        2.85       51.49        1.33  -19% -   -6%
         LowSpanNear       73.86        3.17       61.98        1.45  -21% -  -10%
             LowTerm     1755.38       15.56     1622.61       26.87   -9% -   -5%
           MedPhrase       25.99        0.47       21.01        0.17  -21% -  -16%
     MedSloppyPhrase       30.52        0.89       24.77        0.55  -22% -  -14%
         MedSpanNear       22.26        0.43       18.73        0.47  -19% -  -12%
             MedTerm      651.90       18.97      573.34       19.25  -17% -   -6%
          OrHighHigh       26.75        0.33       23.53        0.50  -14% -   -9%
           OrHighLow      151.69        2.13      134.17        3.19  -14% -   -8%
           OrHighMed      102.48        1.48       90.73        2.01  -14% -   -8%
            PKLookup      216.59        5.70      215.99        2.99   -4% -    3%
             Prefix3      166.00        0.78      145.25        1.29  -13% -  -11%
             Respell       82.01        3.01       82.80        1.66   -4% -    6%
            Wildcard      151.66        2.22      141.14        1.57   -9% -   -4%
{noformat}

Strange that it isn't performing well on my computer. The results are similar when I switch from
MMapDirectory to NIOFSDirectory.
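
For readers interpreting the Pct diff column in the table above, the following sketch shows one way the range can be derived from the QPS and StdDev columns. It is an assumption about how the benchmark tool computes the range, not something stated in this message; the function name `pct_diff_range` is hypothetical. The formula does reproduce the AndHighHigh, HighPhrase, and PKLookup rows.

```python
def pct_diff_range(qps_base, std_base, qps_comp, std_comp):
    """Return (worst, best) percentage change of comp relative to base.

    Worst case: comp one StdDev low against base one StdDev high.
    Best case: comp one StdDev high against base one StdDev low.
    Percentages are truncated toward zero, as the table appears to do.
    """
    worst = (qps_comp - std_comp) / (qps_base + std_base) - 1.0
    best = (qps_comp + std_comp) / (qps_base - std_base) - 1.0
    return int(100 * worst), int(100 * best)


# Rows taken from the table above.
print(pct_diff_range(124.53, 9.36, 100.46, 3.31))  # AndHighHigh -> (-27, -9)
print(pct_diff_range(216.59, 5.70, 215.99, 2.99))  # PKLookup    -> (-4, 3)
```

Read this way, a row whose range stays entirely below zero (such as AndHighHigh) indicates a slowdown larger than the measured noise, while a row that straddles zero (such as PKLookup) does not.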
                
> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3892
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3892
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>
>         Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-blockFor&hardcode(base).patch,
LUCENE-3892-blockFor&packedecoder(comp).patch, LUCENE-3892-blockFor-with-packedints-decoder.patch,
LUCENE-3892-blockFor-with-packedints-decoder.patch, LUCENE-3892-blockFor-with-packedints.patch,
LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch, LUCENE-3892-handle_open_files.patch,
LUCENE-3892-pfor-compress-iterate-numbits.patch, LUCENE-3892-pfor-compress-slow-estimate.patch,
LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, LUCENE-3892_settings.patch,
LUCENE-3892_settings.patch
>
>
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> http://blog.mikemccandless.com/2010/08/lucene-performance-with-pfordelta-codec.html
> ).
> I think this would make a good GSoC project.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

