lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: optimal way to access many TermVectors
Date Tue, 08 Oct 2013 07:50:57 GMT
Hi,

On Mon, Oct 7, 2013 at 9:31 PM, Rose, Stuart J <stuart.rose@pnnl.gov> wrote:
> Is there an optimal way to access many document TermVectors (in the
> same chunk) consecutively when using the LZ4 termvector compression?
>
> I'm curious to know whether all TermVectors in a single compressed
> chunk are decompressed and cached when one TermVector in the same
> chunk is accessed?

The main use cases for term vectors today are more-like-this and
highlighting, so term vectors are generally accessed in no particular
order. This is why we don't cache the uncompressed chunk (it would
never get reused), so you need to decompress every time you
retrieve a document or its term vectors.
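
To make the cost concrete, here is a toy model (not Lucene code) of a
reader that keeps no decompressed chunk between calls. The class name,
method names and the chunk size of 4 documents are all made up for
illustration; the only point is that reading four term vectors from the
same chunk still pays for four decompressions.

```java
public class ChunkAccessModel {
    // Hypothetical chunk size, chosen only to keep the example small.
    static final int DOCS_PER_CHUNK = 4;

    private int decompressions = 0;

    /** Toy stand-in for a lookup that decompresses on every call. */
    String termVector(int docId) {
        int chunk = docId / DOCS_PER_CHUNK;
        decompressions++; // nothing is cached, so every lookup pays again
        return "tv(doc=" + docId + ", chunk=" + chunk + ")";
    }

    int decompressions() {
        return decompressions;
    }

    public static void main(String[] args) {
        ChunkAccessModel reader = new ChunkAccessModel();
        // Docs 0..3 all live in chunk 0 in this model.
        for (int docId = 0; docId < DOCS_PER_CHUNK; docId++) {
            reader.termVector(docId);
        }
        System.out.println(reader.decompressions()); // prints 4
    }
}
```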

> Also wondering if there is a mapping of TermVector order to docID
> order? Or is it always one to one? If docIds are dynamic, then
> presumably they are not necessarily in the same order as their
> documents' corresponding term vectors...

Term vectors are stored in doc ID order, meaning that for a given
segment, term vectors for document N are followed by term vectors for
document N+1.
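
Since the ordering is per segment, a top-level doc ID first has to be
turned into a segment plus a segment-local doc ID. A minimal sketch of
that arithmetic, assuming you already have each segment's starting doc
ID (its doc base) in an ascending array, and that no segment is empty so
the bases are distinct. SegmentDocIds and its methods are hypothetical
names, not Lucene classes.

```java
import java.util.Arrays;

public class SegmentDocIds {
    /**
     * Index of the segment that contains globalDocId.
     * docBases must be ascending, distinct, and start at 0.
     */
    static int segmentOf(int[] docBases, int globalDocId) {
        int pos = Arrays.binarySearch(docBases, globalDocId);
        // Exact hit: the doc is the first one of that segment.
        // Miss: binarySearch returns -(insertionPoint) - 1, and the
        // owning segment is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    /** Doc ID relative to the start of its segment. */
    static int localDocId(int[] docBases, int globalDocId) {
        return globalDocId - docBases[segmentOf(docBases, globalDocId)];
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250}; // three segments, made-up sizes
        System.out.println(segmentOf(docBases, 120));  // prints 1
        System.out.println(localDocId(docBases, 120)); // prints 20
    }
}
```

Within one segment, ascending local doc IDs then follow the on-disk
order of the term vectors described above.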

-- 
Adrien
