lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1410) PFOR implementation
Date Sat, 04 Oct 2008 09:33:44 GMT


Michael McCandless commented on LUCENE-1410:

bq. As for decompression speed, please remember that the patching code (that decodes the exceptions
into the output) has not yet been optimized at all.

But this is what I find so weird: for prx, if I fix the encoding at 6 bits, which generates
a zillion exceptions, we are ~31% faster than decoding vInts, and the pfor file size is much
bigger (847 MB vs 621 MB) than vInt.  But if instead I use the bit size as returned by getNumFrameBits(),
which has far fewer exceptions, we are 9.0% slower and the file size is a bit smaller than vInt's.
Either exception processing is very fast, or it's the non-unrolled path (ForDecompress.decodeAnyFrame)
that's slow, which could very well be the case.
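To make the trade-off concrete, here's a rough sketch of how bit width and exception count pull against each other. This is not the patch's actual getNumFrameBits(); the names and the 32-bits-per-exception cost are illustrative assumptions:

```java
// Illustrative sketch of the PFOR bit-width/exception trade-off.
// NOT the patch's getNumFrameBits(); names and costs are assumptions.
public class FrameBits {
    // Count how many values in the frame do not fit in numBits
    // (each such value becomes a patched exception).
    static int countExceptions(int[] frame, int numBits) {
        if (numBits >= 32) return 0;
        int exceptions = 0;
        for (int v : frame) {
            if ((v >>> numBits) != 0) exceptions++;
        }
        return exceptions;
    }

    // Pick the width minimizing an estimated frame size:
    // numBits per value plus an assumed 32 bits per exception.
    static int bestNumFrameBits(int[] frame) {
        int best = 32;
        int bestSize = Integer.MAX_VALUE;
        for (int b = 1; b <= 32; b++) {
            int size = b * frame.length + 32 * countExceptions(frame, b);
            if (size < bestSize) {
                bestSize = size;
                best = b;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // One outlier (500) among small values: 3 bits + 1 exception wins
        // over 9 bits with no exceptions.
        int[] frame = {3, 1, 7, 2, 500, 4, 6, 2};
        System.out.println(bestNumFrameBits(frame));
    }
}
```

Forcing a small fixed width (like the 6-bit run above) just pushes more values into the exception path, which is why the exception-handling cost matters so much for the overall numbers.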

bq. The lucene .frq file contains the docids as deltas and the frequencies but with a special
encoding in case the frequency is one. I'd rather try compressing the real delta docids and
the real frequencies than the encoded versions.

I'll try that.  I bet if we had two separate streams (one for the docIDs and one for the corresponding
freqs) we'd get even better pFor compression.  If we did that "for real" in Lucene that'd
also make it fast to use a normally-indexed field for pure boolean searching (you wouldn't
have to index 2 different fields as you do today to gain the performance at search time).
In fact, for AND queries on a normal Lucene index this should also result in faster searching,
since when intersecting to find the common docIDs you don't initially need the freq data.
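For reference, the special encoding Paul mentions folds the freq into the docID delta: the delta is shifted left one bit, and the low bit flags the common freq-of-one case so no separate freq needs to be stored. A minimal decode sketch (plain ints stand in for the vInts Lucene actually reads; names are illustrative):

```java
// Sketch of decoding the .frq docDelta/freq special case: docDelta is
// stored as (delta << 1), with the low bit set when freq == 1 so that
// freq is omitted. Plain ints stand in for vInts; names illustrative.
import java.util.ArrayList;
import java.util.List;

public class FrqDecode {
    // Decode a .frq-style stream of (docCode[, freq]) ints into
    // parallel docID and freq lists.
    static void decode(int[] stream, List<Integer> docs, List<Integer> freqs) {
        int doc = 0;
        int i = 0;
        while (i < stream.length) {
            int code = stream[i++];
            doc += code >>> 1;           // docDelta lives in the high bits
            docs.add(doc);
            if ((code & 1) != 0) {
                freqs.add(1);            // low bit set: freq == 1, not stored
            } else {
                freqs.add(stream[i++]);  // low bit clear: freq follows
            }
        }
    }
}
```

Splitting this into two separate streams would mean compressing the raw deltas and raw freqs independently, each of which should be more regular (and thus more pFor-friendly) than the interleaved, bit-twiddled codes.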

Marvin has been doing some performance testing recently and seems to have concluded that keeping
prx and frq as separate files (like Lucene does today but KinoSearch does not) gives better
performance.  Pushing that same trend further, I think it may very well make sense to
separate docID and frq as well.

bq. A: there is also the influence of the reduction of data to be fetched (via various caches)
from the index. As reported in the articles, PFor strikes a nice optimum between decompression
speed and fetching speed from (fast) disks.

True, but we are seeing only a small compression gain vs Lucene's current encoding.  I think
splitting frq out from docID may show better compression.

>: I was thinking local CPU's native asm.
A: I'd try a C version first. IIRC gcc has a decent optimizer for bit ops, but it's been a
while since I used C.

Well... that would just make me depressed (if from Javaland the CPU
level benefits of pFor don't really "survive", but from C-land they
do) ;)  But yes I agree.

bq. For the record, to show the variation in decompression speeds for different numbers of
frame bits without exceptions, here is my current output from TestPFor:

I have similar results (up to 7 bits) -- can you post your new output?
The whole-byte sizes (16, 24) have very sizable leads over the others. The 9-bit size is fantastically
slow; it's insane that unrolling it isn't helping.  Seems like we will need to peek at the asm
to understand what's going on at the "bare metal" level....
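One plausible reason the byte-aligned widths win: a 16- or 24-bit value never straddles a 32-bit word boundary awkwardly, while a 9-bit value regularly does, forcing a variable shift plus an extra load-and-merge. A sketch of a 9-bit decoder (assuming LSB-first packing; this is illustrative, not the patch's unrolled code):

```java
// Sketch of why odd widths like 9 bits are hard on the decoder: a 9-bit
// value can straddle a 32-bit word boundary, needing variable shifts and
// an extra load/merge. Assumes LSB-first packing; illustrative only.
public class Unpack9 {
    static void decode9(int[] in, int[] out, int n) {
        int bitPos = 0;
        for (int i = 0; i < n; i++) {
            int word = bitPos >>> 5;     // which 32-bit word holds the low bits
            int off = bitPos & 31;       // bit offset within that word
            int v = in[word] >>> off;
            if (off > 23) {              // value straddles into the next word
                v |= in[word + 1] << (32 - off);
            }
            out[i] = v & 0x1FF;          // keep the low 9 bits
            bitPos += 9;
        }
    }
}
```

With 16-bit values the offset is always 0 or 16 and the straddle branch disappears entirely, so the loop becomes a handful of fixed shifts the compiler (or JIT) can schedule well; at 9 bits the shifts vary every iteration, which may be what unrolling alone can't fix.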

> PFOR implementation
> -------------------
>                 Key: LUCENE-1410
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Other
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: LUCENE-1410b.patch
>   Original Estimate: 21840h
>  Remaining Estimate: 21840h
> Implementation of Patched Frame of Reference.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

