lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Elschot (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (LUCENE-5084) EliasFanoDocIdSet
Date Tue, 02 Jul 2013 12:36:38 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697601#comment-13697601
] 

Paul Elschot edited comment on LUCENE-5084 at 7/2/13 12:36 PM:
---------------------------------------------------------------

bq.  ... what is the CPU tradeoff for all these compressions and how does query speed compare
across all of them?

We don't really know.
One conclusion from the Vigna paper is that a tuned implementation of an Elias-Fano decoder
is faster than a tuned PForDelta implementation for highly selective phrase queries.
I would guess that that is because Elias-Fano uses random access to the low bits where PForDelta
only uses bulk decompression of the low bits, which compensates for  Elias-Fano decoding its
high bits somewhat slower than PForDelta can decode its exceptions.

One reason to use Elias-Fano for a DocIdSet here is that its high bits are encoded in unary
coding which can easily be decoded in two directions, and that makes it useful for block joins.
The other reason is that its compression is quite good, which makes it a nice candidate for
in memory filters.
                
      was (Author: paul.elschot@xs4all.nl):
    bq.  ... what is the CPU tradeoff for all these compressions and how does query speed
compare across all of them?

We don't really know.
One can conclusion from the Vigna paper is that a tuned implementation of an Elias-Fano decoder
is faster than a tuned PForDelta implementation for highly selective phrase queries.
I would guess that that is because Elias-Fano uses random access to the low bits where PForDelta
only uses bulk decompression of the low bits, which compensates for  Elias-Fano decoding its
high bits somewhat slower than PForDelta can decode its exceptions.

One reason to use Elias-Fano for a DocIdSet here is that its high bits are encoded in unary
coding which can easily be decoded in two directions, and that makes it useful for block joins.
The other reason is that its compression is quite good, which makes it a nice candidate for
in memory filters.
                  
> EliasFanoDocIdSet
> -----------------
>
>                 Key: LUCENE-5084
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5084
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Paul Elschot
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-5084.patch
>
>
> DocIdSet in Elias-Fano encoding

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message