lucene-java-user mailing list archives

From Doug Cutting
Subject Re: scalability w/ number of fields
Date Wed, 06 Apr 2005 16:28:18 GMT
Yonik Seeley wrote:
> They are all indexed (and they all need to be under the current design).

As I mentioned before, Lucene will not perform well with a large number 
of indexed fields.  If these are not tokenized fields, then a simple way 
to reduce the number of indexed fields is to move the field name into 
the value.  Instead of adding <fieldX, valueY> and <fieldZ, valueA>, add 
<generic, fieldX-valueY> and <generic, fieldZ-valueA>.  This should 
perform quite well.  You'll also need to rewrite queries the same way, 
so that a query for valueY in fieldX becomes a query for fieldX-valueY 
in generic.
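
As a rough illustration of the folding described above, here is a minimal 
sketch in plain Java with no Lucene dependency.  The class name, the 
"generic" field name and the '-' separator are assumptions taken from the 
example above, not part of any Lucene API.  The same helper has to be used 
at index time and at query time so that the terms match.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldFolding {
    // Hypothetical name of the single indexed field that replaces the many originals.
    static final String GENERIC_FIELD = "generic";
    // Assumed separator; field names must not contain it, or terms become ambiguous.
    static final char SEPARATOR = '-';

    // Builds the folded term text, e.g. ("fieldX", "valueY") -> "fieldX-valueY".
    static String foldedTerm(String field, String value) {
        return field + SEPARATOR + value;
    }

    // Turns <field, value> pairs into <generic, field-value> pairs.
    static List<String[]> foldDocument(String[][] fieldValuePairs) {
        List<String[]> folded = new ArrayList<>();
        for (String[] pair : fieldValuePairs) {
            folded.add(new String[] { GENERIC_FIELD, foldedTerm(pair[0], pair[1]) });
        }
        return folded;
    }

    public static void main(String[] args) {
        String[][] doc = { { "fieldX", "valueY" }, { "fieldZ", "valueA" } };
        for (String[] entry : foldDocument(doc)) {
            System.out.println("<" + entry[0] + ", " + entry[1] + ">");
        }
    }
}
```

Each folded pair would then be added to the document as an untokenized 
indexed field, and a term query would be built against the generic field 
using foldedTerm() for the term text.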

A similar method can work for tokenized fields.  Simply write a 
TokenFilter that appends a field name to the front of tokens.
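
The core of such a filter is just a string rewrite on each token.  The 
sketch below shows that rewrite on a plain list of token strings so that 
it does not depend on any particular Lucene version; the class name, 
method name and '-' separator are hypothetical.  In a real TokenFilter 
subclass the same rewrite would be applied to each token's text as it 
passes through the filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenPrefixer {
    // Assumed separator between the field name and the original token text.
    static final char SEPARATOR = '-';

    // Prepends the field name to every token produced for that field.
    static List<String> prefixTokens(String field, List<String> tokens) {
        List<String> prefixed = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            prefixed.add(field + SEPARATOR + token);
        }
        return prefixed;
    }

    public static void main(String[] args) {
        // Tokens as an analyzer might emit them for a field named "title".
        List<String> tokens = Arrays.asList("quick", "brown", "fox");
        System.out.println(prefixTokens("title", tokens));
    }
}
```

Query terms have to go through the same analyzer and the same prefixing 
step, otherwise they will not match the indexed tokens.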

Yes, this is an ugly hack, but it can make a huge performance 
difference.  The problem is that Lucene stores norm values in an array, 
when, in cases like yours, a sparse data structure might be more sensible.
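
To see why the dense array hurts, here is a back-of-the-envelope sketch.  
It assumes one norm byte per document for every indexed field, whether or 
not the document actually has that field; that layout is an assumption 
made for the estimate, and the class and method names are hypothetical.

```java
public class NormsEstimate {
    // Assumed cost: one byte per document per indexed field (dense array).
    static long denseNormBytes(long numDocs, int numIndexedFields) {
        return numDocs * numIndexedFields;
    }

    public static void main(String[] args) {
        long docs = 10_000_000L;
        // Many distinct indexed fields versus a single shared "generic" field.
        System.out.println(denseNormBytes(docs, 1000) + " bytes with 1000 fields");
        System.out.println(denseNormBytes(docs, 1) + " bytes with 1 field");
    }
}
```

Under that assumption the cost grows with documents times fields, which 
is why collapsing everything into one indexed field helps so much.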