lucene-java-user mailing list archives

From Michael McCandless
Subject Re: Modification of positional information encoding
Date Wed, 15 Oct 2008 09:33:21 GMT

Renaud Delbru wrote:

> Hi Michael,
> Michael McCandless wrote:
>> Also, this issue was just opened:
>> which would make it possible for classes in the same package  
>> (oal.index) to use their own indexing chain.  With that fix, if you  
>> make your own classes in oal.index package, and perhaps subclass  
>> the above classes, you could then create your own indexing chain  
>> for indexing?  If you take that approach, please report back so we  
>> can learn how to improve Lucene for these very advanced  
>> customizations!
> As a first impression, what would be handy for customizing the
> postings list is an abstract FreqProxTermsWriter class that
> separates segment creation from term information serialisation.
> This class would implement the generic logic for flushing and
> appending postings, but delegate to subclasses the way
> doc + freq and prox + payload info are written.
> A first idea would be to have the following abstract methods:
> - writeMinState: called by appendPostings; defines how to
> serialise one FreqProxFieldMergeState
> - writeDocFreq: called by writeMinState; defines how to
> serialise docs and freqs
> - writeProx: called by writeMinState; defines how to serialise
> positions and payloads
> I think the other parts of FreqProxTermsWriter can stay generic.
> What do you think?

I agree: let's decouple the "codec" (how to write terms/freq/prox)  
from the other mechanics in FreqProxTermsWriter.

I don't think FreqProxFieldMergeState should be visible to that codec,  
though.  That class is used, internally to FreqProxTermsWriter, to  
manage the multiple threads that had accumulated postings data.

I think the codec API could look something like this:


We would then make a codec that matches today's index file format, but  
allow for others (you) to swap in a new codec.  All of this would be  
experimental & private to oal.index for starters.
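
To make the split concrete, here is a rough sketch of the kind of
separation discussed above: a generic writer that walks the postings
and a pluggable codec that only decides how bytes are written.  Every
name in it (PostingsCodec, Posting, GenericTermsWriter, VIntDeltaCodec)
is hypothetical and made up for illustration; none of this is existing
Lucene API, and it ignores payloads, skip data and the terms dictionary.
The sample codec is only loosely modelled on today's format (delta-coded
doc IDs with the low bit flagging freq == 1, delta-coded positions).

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

/** Hypothetical codec hook: decides only how postings are serialised. */
interface PostingsCodec {
    void startTerm(String term);
    void addDoc(int docId, int freq);   // doc + freq information
    void addPosition(int position);     // prox information
    void finishTerm();
}

/** Illustrative in-memory form of one document's occurrences of a term. */
final class Posting {
    final int docId;
    final int[] positions;

    Posting(int docId, int[] positions) {
        this.docId = docId;
        this.positions = positions;
    }
}

/** Generic mechanics: iterates postings and delegates all encoding. */
final class GenericTermsWriter {
    static void appendPostings(String term, List<Posting> postings,
                               PostingsCodec codec) {
        codec.startTerm(term);
        for (Posting p : postings) {
            codec.addDoc(p.docId, p.positions.length);
            for (int pos : p.positions) {
                codec.addPosition(pos);
            }
        }
        codec.finishTerm();
    }
}

/** Sample codec (an assumption, not the real file format writer). */
final class VIntDeltaCodec implements PostingsCodec {
    final ByteArrayOutputStream freqOut = new ByteArrayOutputStream();
    final ByteArrayOutputStream proxOut = new ByteArrayOutputStream();
    private int lastDoc;
    private int lastPos;

    public void startTerm(String term) {
        lastDoc = 0;                     // doc deltas restart for each term
    }

    public void addDoc(int docId, int freq) {
        int delta = docId - lastDoc;
        lastDoc = docId;
        lastPos = 0;                     // position deltas restart per doc
        if (freq == 1) {
            writeVInt(freqOut, (delta << 1) | 1);  // odd value: freq is 1
        } else {
            writeVInt(freqOut, delta << 1);        // even value: freq follows
            writeVInt(freqOut, freq);
        }
    }

    public void addPosition(int position) {
        writeVInt(proxOut, position - lastPos);
        lastPos = position;
    }

    public void finishTerm() {
        // Nothing to flush in this in-memory sketch.
    }

    /** Variable-length int: 7 bits per byte, high bit set on all but the last. */
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }
}
```

Note that the codec never sees FreqProxFieldMergeState or anything
about threads; it only receives terms, docs, freqs and positions in
order.  Swapping in a different encoding would then mean passing a
different PostingsCodec to appendPostings, with the writer unchanged.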

