lucene-java-user mailing list archives

From Grant Ingersoll <>
Subject Re: Suggested number of fields limit per Index
Date Thu, 03 Jan 2008 18:40:27 GMT
My suggestion would be:
An "all" field that captures all your attributes and allows for
generic, easy search across all products.  Additionally, go ahead and
index all your fields per document.  Then, for your default search,
use the all field.  _IF_ you know what category of products you are in
(e.g. TVs), then you could search against those fields that you know
are on TVs.  This way, you have a set of fields per product type and
you make sure that all instances of that product type have those fields.
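A rough sketch of the layout described above, in plain Java with no Lucene dependency so it runs on its own. Each product keeps its own attribute fields plus a "type" field and a catch-all "all" field, and the fields to search are picked by category. The class name, field names and per-type field lists are hypothetical. In a real Lucene index each map entry would become one Field on a Document, and the toy search() method only stands in for an actual query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, Lucene-free model of the "all" field approach. Each map entry
 * stands in for one Field on a Lucene Document; search() is a toy substitute
 * for a real query and does exact, case-sensitive token matching.
 */
public class AllFieldSketch {

    // Hypothetical per-type field lists: every TV has these fields, every Car has those.
    static final Map<String, List<String>> FIELDS_BY_TYPE = Map.of(
            "tv", List.of("sku", "price", "size", "make", "weight"),
            "car", List.of("sku", "price", "year", "doors"));

    /** Builds the field-name -> text map that would be indexed for one product. */
    static Map<String, String> toIndexedFields(String type, Map<String, String> attrs) {
        Map<String, String> fields = new LinkedHashMap<>(attrs);
        fields.put("type", type);
        // The catch-all field: every attribute value joined with spaces.
        fields.put("all", String.join(" ", attrs.values()));
        return fields;
    }

    /** Uses the type-specific fields when the category is known, otherwise the "all" field. */
    static List<String> fieldsToSearch(String knownType) {
        if (knownType == null || !FIELDS_BY_TYPE.containsKey(knownType)) {
            return List.of("all");
        }
        return FIELDS_BY_TYPE.get(knownType);
    }

    /** Returns the SKUs of documents where any searched field contains the term as a token. */
    static List<String> search(List<Map<String, String>> index, List<String> fields, String term) {
        List<String> hits = new ArrayList<>();
        for (Map<String, String> doc : index) {
            for (String field : fields) {
                String text = doc.get(field); // null when this product lacks the field
                if (text != null && Arrays.asList(text.split("\\s+")).contains(term)) {
                    hits.add(doc.get("sku"));
                    break; // count each document once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> tv = new LinkedHashMap<>();
        tv.put("sku", "TV-100");
        tv.put("price", "499");
        tv.put("size", "42");
        tv.put("make", "acme");
        tv.put("weight", "30");

        Map<String, String> car = new LinkedHashMap<>();
        car.put("sku", "CAR-7");
        car.put("price", "499");
        car.put("year", "2007");
        car.put("doors", "4");

        // One combined "Product" index holding both product types.
        List<Map<String, String>> index = List.of(
                toIndexedFields("tv", tv), toIndexedFields("car", car));

        System.out.println(search(index, fieldsToSearch(null), "499")); // [TV-100, CAR-7]
        System.out.println(search(index, fieldsToSearch("tv"), "42"));  // [TV-100]
    }
}
```

Because every product lives in one index, a query on the "all" field (or on a shared field such as price) brings back TVs and Cars together, while a category-aware query only touches the fields that category is known to have.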

There really isn't a need for separate indices in this case, I don't
think.  The tradeoff with the "all" approach is that some of your
scoring statistics may be skewed, but the effect probably isn't
measurable or noticeable for this kind of thing.
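To make the "skewed statistics" tradeoff concrete, here is a small sketch. It assumes the classic DefaultSimilarity formulas, idf = 1 + ln(numDocs / (docFreq + 1)) and lengthNorm = 1 / sqrt(numTerms), and it ignores the lossy one-byte encoding Lucene applies to stored norms. The document counts are invented for illustration. The only point is that a term in the long, mixed "all" field gets a smaller length norm, and usually a lower idf, than the same term in a short dedicated field.

```java
/**
 * Sketch of how an "all" field can skew scoring statistics. It assumes the
 * classic DefaultSimilarity formulas and ignores the lossy one-byte encoding
 * Lucene applies to stored norms. All document counts below are invented.
 */
public class AllFieldStatsSketch {

    /** Classic idf: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Classic length norm: 1 / sqrt(number of terms in the field). */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    public static void main(String[] args) {
        int numDocs = 1000; // hypothetical index size

        // "42" in a dedicated "size" field: only the TVs of that size contain it.
        double idfSize = idf(50, numDocs);
        // "42" in the "all" field: also counts every other attribute whose value is 42.
        double idfAll = idf(300, numDocs);
        System.out.printf("idf: size=%.3f all=%.3f%n", idfSize, idfAll);

        // A one-token "size" field versus an "all" field holding ten tokens.
        System.out.printf("lengthNorm: size=%.3f all=%.3f%n", lengthNorm(1), lengthNorm(10));
    }
}
```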

On Jan 3, 2008, at 1:18 PM, Dai, Chunhe wrote:

> Thanks to all of you who made suggestions. I greatly appreciate
> them.
> Our issue is that our data have the notion of a family. For example, a
> Product family could contain products like TV, Car, DVD, etc. Of
> course, each individual product type would have its own definition,
> which contains the finite set of attributes that describe the actual
> product, like a TV or a Car. For example, a TV would have size, make,
> and weight; a Car might have year of manufacture, number of doors,
> etc. And of course, all of them have SKU and price as common
> attributes.
> When we set up the index, I originally thought a good idea would be
> to base it on the definition, which means I would set up one index
> for TV, another one for Car, a third one for DVD, and so on. When the
> idea was presented, people asked whether it is possible to put all
> the products in one index called Product and whether that would
> cause any problem. They basically want to be able to search for one
> common attribute in the index and bring back TVs and Cars at the same
> time. That is how the question got started, and I needed to find out
> whether this one-index-per-family approach would cause trouble down
> the line.
> Thanks again for your help.
> -Chunhe
> -----Original Message-----
> From: Grant Ingersoll []
> Sent: Thursday, January 03, 2008 1:03 PM
> To:
> Subject: Re: Suggested number of fields limit per Index
> Another issue is how to generate queries.  If you have hundreds of
> fields, you may have to generate queries (e.g. using the
> MultiFieldQueryParser) across all those fields just to find documents
> that _could_ have those fields.  This can lead to the dreaded
> BooleanQuery.TooManyClauses exception.
> That being said, Lucene can handle that many fields, and I don't think
> there would be any indexing performance issues, though I doubt many
> would consider it a best practice.  The number of fields can be a
> search issue, but I don't know enough about your requirements to say
> for sure.
> I would say that if you have alternative approaches that you think will
> work for your other requirements and use fewer fields, then give that a
> try.  I don't know if I would go so far as to say all fields should be
> in common, but that is a good thing to aim for, as it makes things
> easier.
> Are you sure you can't just map your fields into a common set?
> Perhaps if you described the problem a bit more, we could help.
> -Grant
> On Jan 3, 2008, at 11:45 AM, Dai, Chunhe wrote:
>> I have been searching online but could not find an exact answer, and I
>> am wondering whether anyone here knows if there is a recommended
>> maximum number of fields in a Lucene index.
>> We are in the process of deciding what our index will look like in our
>> Lucene integration. With one of our approaches, we could have a large
>> number of fields in the index, say maybe several hundred. But each
>> Document in the index would not contain all of those fields and would
>> only have a few of them (probably in the tens). Does anyone have
>> experience with a setup like this? I am wondering whether there is a
>> potential performance issue with indexing and searching.
>> Thanks.
>> Chunhe
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
> --------------------------
> Grant Ingersoll
> Lucene Helpful Hints:
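Besides the clause limit mentioned in the quoted reply, another way a large number of fields can become a search issue is norms memory. Here is a back-of-the-envelope sketch. It assumes the behaviour of Lucene 2.x-era readers: every indexed field that keeps norms stores one byte per document, and that whole array is loaded into memory the first time the field is scored, whether or not a given document actually has the field. The index size and field counts are hypothetical.

```java
/**
 * Back-of-the-envelope estimate of norms memory when queries touch many
 * fields. Assumes Lucene 2.x-era behaviour: one norm byte per document for
 * each indexed field that keeps norms, loaded in full when the field is
 * first scored. The numbers in main() are hypothetical.
 */
public class NormsMemorySketch {

    /** Bytes of norms held in memory after scoring against {@code fieldsSearched} fields. */
    static long normsBytes(long maxDoc, int fieldsSearched) {
        return maxDoc * (long) fieldsSearched;
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000L; // hypothetical number of product documents

        // Searching only the catch-all "all" field loads a single norms array.
        System.out.println("all field only: " + normsBytes(maxDoc, 1) + " bytes"); // 1000000 bytes

        // Expanding every query across 300 sparse attribute fields loads 300 arrays,
        // even though each document populates only a few dozen of those fields.
        System.out.println("300 fields: " + normsBytes(maxDoc, 300) + " bytes"); // 300000000 bytes
    }
}
```

Indexing the sparse attribute fields with norms omitted (Field.setOmitNorms(true) in the 2.x API) avoids this cost, at the price of losing length normalization on those fields. Searching only the "all" field by default, as suggested above, keeps it to a single array.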

Grant Ingersoll
