lucene-dev mailing list archives

From Charles Lloyd
Subject Shared Field Values
Date Sun, 13 Nov 2005 00:56:35 GMT
Would the following be a reasonable feature to add to Lucene?

We use Lucene for a catalog of about 3 million items, with one document per item.
Some Fields are highly redundant, such as "Manufacturer Name"; we have only a few hundred
different manufacturers.  I would like to be able to declare this Field as Indexed, Stored,
and 'Shared' so that only one copy of each Name is stored.  This would save about 45%
of the space in our catalog.

I considered using a code for each Name, but then we can't search on the Name.  So I considered
using a pair of Unstored and Stored fields with a code, but this becomes unwieldy since we
have many fields for which this could be done, and it breaks a lot of existing code.  I considered
several other things as well, but I think the best solution is a "Shared" Field type.
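
For reference, the code-based workaround could look roughly like the sketch below. It assumes the Unstored field carries the searchable Name and the Stored field carries only the code. ManufacturerCodes, codeFor and nameFor are hypothetical names for illustration, not Lucene classes, and the Lucene calls themselves are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical code table: maps each manufacturer name to a small integer code and back. */
public class ManufacturerCodes {
    private final Map<String, Integer> codeByName = new HashMap<>();
    private final List<String> nameByCode = new ArrayList<>();

    /** Returns the code for a name, assigning the next free code the first time it is seen. */
    public int codeFor(String name) {
        Integer code = codeByName.get(name);
        if (code == null) {
            code = nameByCode.size();
            codeByName.put(name, code);
            nameByCode.add(name);
        }
        return code;
    }

    /** Translates a stored code back to the full name when a document is displayed. */
    public String nameFor(int code) {
        return nameByCode.get(code);
    }
}
```

Every caller that reads the stored field has to go through something like nameFor, and the table must be persisted alongside the index, which is where the existing code breaks.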

I am new to Lucene dev so I would appreciate it if someone could outline how to approach this.
 It looks like FieldsWriter.addDocument(...) is a good place to make the substitution, but
I've got no idea where to store the actual values.  We'd need a segment of char[] data stored
somewhere that could be accessed later when FieldsReader.doc(...) is called.  Each shared
Field would need to write out the offset and length rather than the value itself.
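
To make the idea concrete, here is a minimal in-memory sketch of such a pool. SharedValuePool, Ref, intern and resolve are hypothetical names, not existing Lucene classes, and a StringBuilder stands in for the char[] segment. How the pool would be written to disk and hooked into FieldsWriter and FieldsReader is exactly the open question, so none of that is shown.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical pool that keeps one copy of each distinct field value. */
public class SharedValuePool {
    /** Location of a value in the pool: what a shared Field would write per document. */
    public static final class Ref {
        public final int offset;
        public final int length;

        Ref(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Stands in for the segment of char[] data; a real version would live in an index file.
    private final StringBuilder chars = new StringBuilder();
    private final Map<String, Ref> refByValue = new HashMap<>();

    /** Write side: appends the value only if it is new, and returns its offset and length. */
    public Ref intern(String value) {
        Ref ref = refByValue.get(value);
        if (ref == null) {
            ref = new Ref(chars.length(), value.length());
            chars.append(value);
            refByValue.put(value, ref);
        }
        return ref;
    }

    /** Read side: recovers the value from the offset and length stored with the document. */
    public String resolve(Ref ref) {
        return chars.substring(ref.offset, ref.offset + ref.length);
    }

    /** Total number of chars held by the pool. */
    public int size() {
        return chars.length();
    }
}
```

With this, interning "Acme" for a second document returns the same offset and length as the first, so the pool grows only when a new Name appears.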

What would be the best way to store the shared data?

