lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2723) Speed up Lucene's low level bulk postings read API
Date Thu, 16 Dec 2010 15:45:09 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12972119#action_12972119
] 

Yonik Seeley commented on LUCENE-2723:
--------------------------------------

I tested the optimized index with mike's latest patches (since that's "per segment" on both
branch and trunk).  Things are much more in line now... with the branch being anywhere from
2.3% to 5.4% slower, depending on the exact field tested.

> Speed up Lucene's low level bulk postings read API
> --------------------------------------------------
>
>                 Key: LUCENE-2723
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2723
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.0
>
>         Attachments: LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch,
LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch,
LUCENE-2723_termscorer.patch
>
>
> Spinoff from LUCENE-1410.
> The flex DocsEnum has a simple bulk-read API that reads the next chunk
> of docs/freqs.  But it's a poor fit for intblock codecs like FOR/PFOR
> (from LUCENE-1410).  This is not unlike sucking coffee through those
> tiny plastic coffee stirrers they hand out airplanes that,
> surprisingly, also happen to function as a straw.
> As a result we see no perf gain from using FOR/PFOR.
> I had hacked up a fix for this, described at in my blog post at
> http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html
> I'm opening this issue to get that work to a committable point.
> So... I've worked out a new bulk-read API to address performance
> bottleneck.  It has some big changes over the current bulk-read API:
>   * You can now also bulk-read positions (but not payloads), but, I
>      have yet to cutover positional queries.
>   * The buffer contains doc deltas, not absolute values, for docIDs
>     and positions (freqs are absolute).
>   * Deleted docs are not filtered out.
>   * The doc & freq buffers need not be "aligned".  For fixed intblock
>     codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16,
>     Group varint, etc.) they won't be.
> It's still a work in progress...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message