lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3892) Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
Date Wed, 08 Aug 2012 16:10:22 GMT


Robert Muir commented on LUCENE-3892:

> So ... most of the gains come from BlockPF cutover. This is sort of
> ... surprising/disappointing, ie, our bottlenecks are the abstraction
> layers, not the actual decode cost. Still it's good to make progress
> on removing the abstractions.

I don't think it's that disappointing. This isn't a very interesting
benchmark for a compression algorithm like FOR: instead imagine the
very common case of apps today indexing small fields like product names,
restaurant names, or something like that. Freqs are nearly always 1,
and positions are tiny, but often people still want the ability to
use things like phrase queries. And imagine cases where people
are indexing data from a database and there are only a few unique
values (e.g. product type = tshirt, pants, shoes) in a field.

I think the wikipedia benchmark doesn't do a very good job of illustrating 
performance on use-cases like this, which I think are common and also
where I'm fairly positive FOR will be a win. 
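
To make the small-field case concrete, here is a minimal sketch (not the
code from any of the attached patches) of the fixed-width bit packing that
FOR-style block encodings rely on: find the number of bits the largest
value in a block needs, then store every value at that width. The class
name ForSketch, its method names, and the block size of 128 are
assumptions chosen for illustration only.

```java
import java.util.Arrays;

// Hypothetical illustration of fixed-width block packing; not Lucene's API.
final class ForSketch {

    // Bits needed for the largest value in the block (at least 1).
    // Assumes non-negative values, as with freqs and position deltas.
    static int bitsRequired(int[] block) {
        int or = 0;
        for (int v : block) {
            or |= v;
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(or));
    }

    // Pack every value at the same width, back to back, into a long[].
    static long[] pack(int[] block, int bits) {
        long[] out = new long[(block.length * bits + 63) / 64];
        for (int i = 0; i < block.length; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = block[i] & 0xFFFFFFFFL;
            out[word] |= v << shift;
            if (shift + bits > 64) {
                // Value straddles two words: spill the high bits over.
                out[word + 1] |= v >>> (64 - shift);
            }
        }
        return out;
    }

    // Inverse of pack: read 'count' values of 'bits' width each.
    static int[] unpack(long[] packed, int bits, int count) {
        int[] out = new int[count];
        long mask = (1L << bits) - 1;
        for (int i = 0; i < count; i++) {
            long bitPos = (long) i * bits;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            long v = packed[word] >>> shift;
            if (shift + bits > 64) {
                v |= packed[word + 1] << (64 - shift);
            }
            out[i] = (int) (v & mask);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] freqs = new int[128];   // assumed block size
        Arrays.fill(freqs, 1);        // "freqs are nearly always 1"
        int bits = bitsRequired(freqs);
        long[] packed = pack(freqs, bits);
        System.out.println(bits + " bit(s) per value, "
            + packed.length * 8 + " bytes for the block");
    }
}
```

With freqs that are all 1, each value needs a single bit, so a 128-value
block packs into two longs (16 bytes). A block of positions that are all
below 16 would need 4 bits per value under the same scheme.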

It's nice that it's not slower or too much bigger in the "worst case"
of large docs where the numbers aren't so tiny.

> Also, it looks like the only query that is slower than Lucene40 is
> AndHighLow ... however, it's also an extremely fast query to begin
> with so I think it's a fine tradeoff that it gets slower while the
> hard/slower queries get faster.

+1, let's not even think twice about that one.

> Add a useful intblock postings format (eg, FOR, PFOR, PFORDelta, Simple9/16/64, etc.)
> -------------------------------------------------------------------------------------
>                 Key: LUCENE-3892
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>              Labels: gsoc2012, lucene-gsoc-12
>             Fix For: 4.1
>         Attachments: LUCENE-3892-BlockTermScorer.patch, LUCENE-3892-blockFor&hardcode(base).patch,
>                      LUCENE-3892-blockFor&packedecoder(comp).patch, LUCENE-3892-blockFor-with-packedints-decoder.patch,
>                      LUCENE-3892-blockFor-with-packedints-decoder.patch, LUCENE-3892-blockFor-with-packedints.patch,
>                      LUCENE-3892-bulkVInt.patch, LUCENE-3892-direct-IntBuffer.patch, LUCENE-3892-for&pfor-with-javadoc.patch,
>                      LUCENE-3892-handle_open_files.patch, LUCENE-3892-pfor-compress-iterate-numbits.patch, LUCENE-3892-pfor-compress-slow-estimate.patch,
>                      LUCENE-3892_for_byte[].patch, LUCENE-3892_for_int[].patch, LUCENE-3892_for_unfold_method.patch,
>                      LUCENE-3892_pfor_unfold_method.patch, LUCENE-3892_pulsing_support.patch, LUCENE-3892_settings.patch,
> On the flex branch we explored a number of possible intblock
> encodings, but for whatever reason never brought them to completion.
> There are still a number of issues opened with patches in different
> states.
> Initial results (based on prototype) were excellent (see
> ).
> I think this would make a good GSoC project.
