lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject RE: Ideal Index Fragmentation
Date Fri, 02 Sep 2005 17:07:50 GMT

: --OK, is there a preferred strategy for generating lists of distinct
: attributes in the hit[]?  I've seen Hoss' post about using QueryFilters,
: but that assumes that you know what values you want to count; but I
: won't know the domain of values to expect in every field...  Can I get
: creative with the hitsCollector to solve this one?

I've never done this myself, but my advice is to generate the list of
values using a TermEnum when you first open the IndexReader -- as long as
the index doesn't hange, this list can be cached, and will be helpfull for
any search that comes in -- just understand that many counts will be zero.

if you have a *really* large number of terms, then caching a BitSet for
each may not be practical, but at a minimum you should be able to cache an
int for each term containing the total number of documents that contain it
-- and for the top N terms, also store the BitSet.  then at search time,
first check hte top N, and if you still don't have enough options to
display to the user, then generate BitSets for the rest (in order of how
common the term is in the overall index).


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message