lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1426) Next steps towards flexible indexing
Date Mon, 20 Oct 2008 20:07:44 GMT


Michael McCandless commented on LUCENE-1426:

Does the offset imply that there is also a need for random access into each block?
For such blocks PFOR patching might better be avoided.
Even with patching random access is possible, but it is not available yet at LUCENE-1410.

Yeah, this is one of the reasons why I'm thinking that for frequent terms
we may want to fall back to pure n-bit packing (which would make random
access simple).
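
As a rough illustration of why pure n-bit packing makes random access
simple: value i always starts at bit i * n, so it can be read without
decoding its neighbors.  The Java sketch below is hypothetical (the class
NBitPacked and its methods are invented for this example and are not part
of Lucene or the LUCENE-1410 patch), and it assumes every value fits in
n bits with 1 <= n <= 32.

```java
// Hypothetical sketch, not Lucene code: fixed-width (n-bit) packing of ints.
final class NBitPacked {
    private final long[] words;      // packed storage, 64 bits per word
    private final int bitsPerValue;  // n, assumed to be in 1..32
    private final long mask;         // low n bits set
    private final int count;

    NBitPacked(int[] values, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 32) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..32");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        this.count = values.length;
        long totalBits = (long) values.length * bitsPerValue;
        this.words = new long[(int) ((totalBits + 63) / 64)];
        for (int i = 0; i < values.length; i++) {
            long v = values[i] & 0xFFFFFFFFL;  // treat the int as unsigned
            if ((v & ~mask) != 0) {
                throw new IllegalArgumentException("value does not fit in n bits");
            }
            long bitPos = (long) i * bitsPerValue;
            int word = (int) (bitPos >>> 6);
            int shift = (int) (bitPos & 63);
            words[word] |= v << shift;
            if (shift + bitsPerValue > 64) {
                // The value straddles two words: spill the high bits.
                words[word + 1] |= v >>> (64 - shift);
            }
        }
    }

    // Random access: compute the bit position directly, no block decode.
    int get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        long bitPos = (long) index * bitsPerValue;
        int word = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long v = words[word] >>> shift;
        if (shift + bitsPerValue > 64) {
            v |= words[word + 1] << (64 - shift);
        }
        return (int) (v & mask);
    }
}
```

With PFOR patching, by contrast, an exception entry can change a slot's
value, which is why the lookup is less direct there.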

But, for starters, we could simply implement random access as "load
& decode the entire block, then look at the part you want" and then
assess the cost.  While it will clearly increase the cost of queries
that do a lot of skipping (e.g. an AND query of N terms), it may not
matter so much since these queries should be fairly fast now.  It's the
OR queries over frequent terms that we need to improve, since that limits
how big an index you can put on one box.
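
A minimal sketch of the "load & decode the entire block, then look at the
part you want" approach might look like the following.  It uses vInt-style
variable-length bytes as a stand-in for the block encoding, since that
format cannot be indexed directly either; the class BlockRandomAccess and
its method names are made up for this example and are not Lucene APIs.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not Lucene code: random access by full block decode.
final class BlockRandomAccess {

    // Encode a block as variable-length ints: 7 data bits per byte,
    // low-order bits first, high bit set when more bytes follow.
    static byte[] encodeBlock(int[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : values) {
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    // Decode all `count` values of the block sequentially.
    static int[] decodeBlock(byte[] block, int count) {
        int[] values = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int b = block[pos++];
            int v = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = block[pos++];
                v |= (b & 0x7F) << shift;
            }
            values[i] = v;
        }
        return values;
    }

    // "Random access": pay for the whole block, then pick one entry.
    static int valueAt(byte[] block, int count, int offset) {
        return decodeBlock(block, count)[offset];
    }
}
```

The cost of each lookup is proportional to the block size rather than
constant, which is the overhead that would need to be measured for
skip-heavy queries.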

> Next steps towards flexible indexing
> ------------------------------------
>                 Key: LUCENE-1426
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>         Attachments: LUCENE-1426.patch
> In working on LUCENE-1410 (PFOR compression) I tried to prototype
> switching the postings files to use PFOR instead of vInts for
> encoding.
> But it quickly became difficult.  EG we currently mux the skip data
> into the .frq file, which messes up the int blocks.  We inline
> payloads with positions which would also mess up the int blocks.
> Skipping offsets and TermInfo offsets hardwire the file pointers of
> frq & prox files yet I need to change these to block + offset, etc.
> Separately this thread also started up, on how to customize how Lucene
> stores positional information in the index:
> So I decided to make a bit more progress towards "flexible indexing"
> by first modularizing/isolating the classes that actually write the
> index format.  The idea is to capture the logic of each (terms, freq,
> positions/payloads) into separate interfaces and switch the flushing
> of a new segment as well as writing the segment during merging to use
> the same APIs.
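
One way to picture the modularization described in the issue is a small
set of writer interfaces, one per part of the postings (terms, docs/freqs,
positions), driven by a single routine that both flush and merge could
call.  The sketch below is purely illustrative: every name in it
(TermsWriter, DocsWriter, PositionsWriter, PostingsDriver,
RecordingWriter) is an assumption made for this example and is not taken
from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interfaces, one per part of the postings format.
interface PositionsWriter {
    void addPosition(int position);
}

interface DocsWriter {
    // Returns the writer that receives this document's positions.
    PositionsWriter addDoc(int docID, int freq);
}

interface TermsWriter {
    DocsWriter startTerm(String term);
    void finishTerm();
}

// Shared driver: in this sketch, flush and merge would both call it.
final class PostingsDriver {
    // positions[i] holds the positions for docIDs[i]; freq is its length.
    static void writeTerm(TermsWriter terms, String term,
                          int[] docIDs, int[][] positions) {
        DocsWriter docs = terms.startTerm(term);
        for (int i = 0; i < docIDs.length; i++) {
            PositionsWriter pos = docs.addDoc(docIDs[i], positions[i].length);
            for (int p : positions[i]) {
                pos.addPosition(p);
            }
        }
        terms.finishTerm();
    }
}

// Toy implementation that only records the calls it receives.
final class RecordingWriter implements TermsWriter, DocsWriter, PositionsWriter {
    final List<String> log = new ArrayList<>();

    public DocsWriter startTerm(String term) {
        log.add("term " + term);
        return this;
    }

    public PositionsWriter addDoc(int docID, int freq) {
        log.add("doc " + docID + " freq " + freq);
        return this;
    }

    public void addPosition(int position) {
        log.add("pos " + position);
    }

    public void finishTerm() {
        log.add("end");
    }
}
```

A different on-disk encoding (vInts, PFOR blocks, n-bit packing) would
then be a different implementation of the same interfaces, with the driver
left unchanged.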
