lucene-dev mailing list archives

From Grant Ingersoll
Subject Re: [jira] Commented: (LUCENE-648) Allow changing of ZIP compression level for compressed fields
Date Wed, 16 Aug 2006 12:51:46 GMT

On Aug 16, 2006, at 8:32 AM, Nicolas Lalevée wrote:

> Hi,
> In the issue, you wrote that "This way the indexing level just stores opaque
> binary fields, and then Document handles compress/uncompressing as needed."
> I have looked into the Lucene code, and it seems to me that it is Field that
> should take care of compress/uncompress, and that FieldsReader and
> FieldsWriter should only see binary data.
> Or do you mean that compression should be completely external to Lucene?

I believe the consensus is it should be done externally.
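
As a rough illustration of what "done externally" could look like: the application compresses the value itself, at whatever level it likes, and hands Lucene only the resulting bytes to store as an opaque binary field. The helper below is a sketch using only java.util.zip; the class and method names are made up for this example and are not Lucene API, and the step of adding the bytes to a Document is deliberately left out.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper, not part of Lucene: compresses field values in
// application code so that the index only ever sees opaque bytes.
final class ExternalFieldCompression {

    /** Compresses input at the given level (0-9, or Deflater.DEFAULT_COMPRESSION). */
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    /** Restores the original bytes; the level used for compression is not needed. */
    static byte[] decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                // No progress and not finished: the data is truncated or needs a dictionary.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported compressed data");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
    }
}
```

With something like this, the compression level becomes a per-call choice of the application (for example Deflater.BEST_SPEED for large documents, Deflater.BEST_COMPRESSION for small ones), and the index format does not need to know that compression happened at all.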

> In fact, at the end of the other thread, "Flexible index format / Payloads
> Cont'd", I was discussing how to customize the way data is stored. So I
> looked deeper into the code, and I think I have found a way to do so. And
> since you can change the way it is stored, you can also define the
> compression level, or handle your own compression algorithm. I will show you
> a patch, but I have modified so much code over my several tries that I first
> need to remove the unnecessary changes. To describe it briefly:
> - I have added a way to provide your own FieldsReader and FieldsWriter (via
> a factory). To create an IndexReader, you have to provide that factory; the
> current API just uses a default factory.
> - I have moved the code of FieldsReader and FieldsWriter that does the field
> data reading to a new class, FieldData. The FieldsReader instantiates a
> FieldData, does a, and does a new Field(fielddata,...). The
> FieldsWriter does a field.getFieldData().write(output);
> - so by extending FieldsReader, you can provide your own implementation of
> FieldData, and so implement how data is stored and read however you want.
> The tests pass, but I have an issue with that design: one thing that I think
> is important is that in the current design, we can read an index in an old
> format and just do a writer.addIndexes() into a new format. With the new
> design, you cannot, because the writer will use the FieldData.write provided
> by the reader.
> To be continued...

I would love to see this patch. I think one could make a pretty good
argument for this kind of implementation being done "cleanly"; that
is, it shouldn't necessarily involve reworking the internals, but
instead could represent the foundation for a new, codec-based
indexing mechanism (with an implementation that can read/write the
existing file format).
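
To make the shape of such a design concrete, here is a minimal sketch of a pluggable stored-field encoding selected through a factory, loosely following the description quoted above. Every type and method name in it (StoredFieldData, StoredFieldDataFactory, LengthPrefixedFieldData, PluggableStoredFields) is hypothetical and chosen for this example only; it is not the patch under discussion and not Lucene API, and the length-prefixed layout merely stands in for a "default" format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// All types below are hypothetical illustrations; none of them is Lucene API.

/** Owns the encoding of one stored field value. */
interface StoredFieldData {
    void read(DataInput in) throws IOException;
    void write(DataOutput out) throws IOException;
    byte[] value();
}

/** Supplied by the application; a default factory would keep the existing format. */
interface StoredFieldDataFactory {
    StoredFieldData newFieldData();
}

/** Simple length-prefixed encoding, standing in for a default format. */
final class LengthPrefixedFieldData implements StoredFieldData {
    private byte[] value = new byte[0];

    LengthPrefixedFieldData() {}

    LengthPrefixedFieldData(byte[] value) {
        this.value = value.clone();
    }

    @Override
    public void read(DataInput in) throws IOException {
        value = new byte[in.readInt()];
        in.readFully(value);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(value.length);
        out.write(value);
    }

    @Override
    public byte[] value() {
        return value.clone();
    }
}

final class PluggableStoredFields {
    /** Writer side: the encoding is delegated to the field data object itself. */
    static byte[] writeField(StoredFieldData data) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            data.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reader side: the factory decides which encoding is used for decoding. */
    static StoredFieldData readField(byte[] stored, StoredFieldDataFactory factory) {
        try {
            StoredFieldData data = factory.newFieldData();
            data.read(new DataInputStream(new ByteArrayInputStream(stored)));
            return data;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that writeField() reuses whatever encoding the passed-in object carries. If that object came from a reader for an older format, the bytes are written back in the older format, which is the addIndexes() concern raised in the quoted message. A cleaner variant would have the writer re-wrap value() using its own factory before writing.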

> cheers,
> Nicolas

Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244

Voice: 315-443-5484
Skype: grant_ingersoll
Fax: 315-443-6886
