lucene-dev mailing list archives

From Dmitry Serebrennikov <>
Subject Re: Term vectors: .tvf format question
Date Mon, 14 Jun 2004 18:49:18 GMT
Doug Cutting wrote:

> So term-number-based vectors would be small and fast to use if all 
> you're using is a single, optimized index, but very slow to use with 
> unoptimized indexes and multiple indexes.  That seems like a bad 
> situation, so, unless someone figures out another way, we're stuck 
> with the current approach.  Vectors are bigger and slower than 
> optimal, but they're consistently so. 

I'm very familiar with this particular issue :). One solution that has 
worked for my application was to treat terms from different segments / 
indexes as always being different, even if they actually did have the 
same text. Later on in results processing, when the number of terms 
under consideration has been greatly reduced, I was able to do the 
lookups and further consolidate those terms that turned out to be 
identical. I'm not sure whether this is a good general solution, but it has 
worked reasonably well for me.
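
To make the idea concrete, here is a minimal sketch of that two-phase approach in plain Java. It does not use any Lucene API: the class name, the {segment, termNumber, freq} posting layout, the String[] per-segment dictionaries, and the "keep the top N" pruning step are all assumptions made up for illustration, and the pruning in a real application would be whatever its results processing already does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names come from Lucene.
class SegmentLocalTerms {

    // Pack (segment, termNumber) into one key, so that terms from different
    // segments are always distinct, even if their text is identical.
    static long key(int segment, int termNumber) {
        return ((long) segment << 32) | (termNumber & 0xFFFFFFFFL);
    }

    // Phase 1: sum frequencies per (segment, termNumber) without any text lookups.
    // Each posting is assumed to be {segment, termNumber, freq}.
    static Map<Long, Integer> accumulate(List<int[]> postings) {
        Map<Long, Integer> counts = new HashMap<>();
        for (int[] p : postings) {
            counts.merge(key(p[0], p[1]), p[2], Integer::sum);
        }
        return counts;
    }

    // Phase 2: prune to the `keep` highest-scoring candidates, and only then
    // resolve term text and merge the entries that turn out to be the same term.
    // dictionaries.get(segment)[termNumber] stands in for a per-segment term lookup.
    static Map<String, Integer> consolidate(Map<Long, Integer> counts, int keep,
                                            List<String[]> dictionaries) {
        List<Map.Entry<Long, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        Map<String, Integer> merged = new TreeMap<>();
        for (Map.Entry<Long, Integer> e : entries.subList(0, Math.min(keep, entries.size()))) {
            int segment = (int) (e.getKey() >>> 32);
            int termNumber = (int) (e.getKey() & 0xFFFFFFFFL);
            String text = dictionaries.get(segment)[termNumber];
            merged.merge(text, e.getValue(), Integer::sum);
        }
        return merged;
    }
}
```

The expensive number-to-text lookups happen only for the candidates that survive pruning. The trade-off in this sketch is that a term spread thinly across many segments can be pruned before its counts are combined, so how aggressively to prune depends on the application.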
