lucene-dev mailing list archives

From "Paul Elschot (JIRA)" <>
Subject [jira] Commented: (LUCENE-1426) Next steps towards flexible indexing
Date Mon, 20 Oct 2008 18:59:44 GMT


Paul Elschot commented on LUCENE-1426:

bq. We inline payloads with positions which would also mess up the int blocks.

Which raises the question of whether we should also allow compression of these payloads.
I think we should, because normally only one or two bytes will be used as payload per position.
Thinking about this: position+payload actually looks a lot like docId+freq. Could that
be used to simplify future index formats for inverted terms?
Btw., allowing a payload to accompany the field norms would make it possible to store a kind of
dictionary for the position payloads. This could help keep the position payloads small,
so they would compress nicely.
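
To make the dictionary idea concrete, here is a minimal sketch in plain Java. It is not Lucene API: the class PayloadDictionarySketch and its methods are hypothetical names invented for illustration, and it assumes the dictionary is held per field in memory. Each distinct payload gets a small integer id, so the per-position payload becomes a small int that an int-block compressor could handle like any other posting value.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch, not part of Lucene: maps distinct payloads to small ids. */
public class PayloadDictionarySketch {
    // Distinct payload values in first-seen order; the list index is the id.
    private final List<byte[]> entries = new ArrayList<>();
    // ByteBuffer compares by content, so equal byte sequences share one id.
    private final Map<ByteBuffer, Integer> ids = new HashMap<>();

    /** Returns the id for a payload, adding it to the dictionary on first sight. */
    public int idFor(byte[] payload) {
        byte[] copy = payload.clone(); // defend against later mutation by the caller
        ByteBuffer key = ByteBuffer.wrap(copy);
        Integer id = ids.get(key);
        if (id == null) {
            id = entries.size();
            entries.add(copy);
            ids.put(key, id);
        }
        return id;
    }

    /** Looks up the original payload bytes for an id. */
    public byte[] payloadFor(int id) {
        return entries.get(id).clone();
    }

    /** Number of distinct payloads seen so far. */
    public int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        PayloadDictionarySketch dict = new PayloadDictionarySketch();
        // Payloads as they might occur at successive positions (made-up values).
        byte[][] payloads = { {1, 2}, {3}, {1, 2}, {1, 2} };
        for (byte[] p : payloads) {
            System.out.println("payload id = " + dict.idFor(p));
        }
        System.out.println("distinct payloads = " + dict.size());
    }
}
```

With only a few distinct payloads per field, the ids stay small, which is the property that would let them compress well alongside the positions.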

bq. Both SegmentMerger & FreqProxTermsWriter now use the same codec API to write postings.

That is indeed a big step.

bq. It's all package private.

Good for now. Making it public might actually reduce flexibility for new index formats.

> Next steps towards flexible indexing
> ------------------------------------
>                 Key: LUCENE-1426
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.9
>         Attachments: LUCENE-1426.patch
> In working on LUCENE-1410 (PFOR compression) I tried to prototype
> switching the postings files to use PFOR instead of vInts for
> encoding.
> But it quickly became difficult.  EG we currently mux the skip data
> into the .frq file, which messes up the int blocks.  We inline
> payloads with positions which would also mess up the int blocks.
> Skipping offsets and TermInfo offsets hardwire the file pointers of
> frq & prox files yet I need to change these to block + offset, etc.
> Separately this thread also started up, on how to customize how Lucene
> stores positional information in the index:
> So I decided to make a bit more progress towards "flexible indexing"
> by first modularizing/isolating the classes that actually write the
> index format.  The idea is to capture the logic of each (terms, freq,
> positions/payloads) into separate interfaces and switch the flushing
> of a new segment as well as writing the segment during merging to use
> the same APIs.
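
For readers less familiar with the encodings mentioned in the description, the following sketch shows why vInts and fixed-size int blocks address data differently. The vInt layout (7 data bits per byte, low-order bits first, high bit set when more bytes follow) is Lucene's documented format. Everything else is an assumption for illustration: the class name VIntBlockSketch, the blockAndOffset helper, and the block size of 128 are hypothetical and are not taken from the patch.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative sketch only; names and block size are assumptions. */
public class VIntBlockSketch {
    /** Encodes a non-negative int as a vInt: 1 to 5 bytes depending on magnitude. */
    public static byte[] encodeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // low 7 bits, continuation bit set
            value >>>= 7;
        }
        out.write(value); // final byte, continuation bit clear
        return out.toByteArray();
    }

    /** Decodes a single vInt from the start of the given bytes. */
    public static int decodeVInt(byte[] bytes) {
        int value = 0;
        int shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break; // last byte of this vInt
            }
            shift += 7;
        }
        return value;
    }

    /**
     * With fixed-size blocks, the n-th int is located by block number and
     * offset within the block rather than by a byte-level file pointer.
     */
    public static int[] blockAndOffset(int intIndex, int blockSize) {
        return new int[] { intIndex / blockSize, intIndex % blockSize };
    }

    public static void main(String[] args) {
        for (int v : new int[] {5, 127, 128, 16384}) {
            System.out.println(v + " takes " + encodeVInt(v).length + " byte(s) as a vInt");
        }
        int[] addr = blockAndOffset(300, 128);
        System.out.println("int #300 is in block " + addr[0] + " at offset " + addr[1]);
    }
}
```

Because each vInt has its own byte length, skip data and payloads can be interleaved at any byte boundary. A block codec needs runs of plain ints, which is why inlined skip data and payloads get in the way.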
