lucene-dev mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: CompressingTermVectors; per-field decompress?
Date Thu, 02 Apr 2015 20:13:45 GMT
On Thu, Apr 2, 2015 at 4:02 PM, david.w.smiley@gmail.com
<david.w.smiley@gmail.com> wrote:

> They are fundamentally per-document, yes, like stored fields — yes.  But I
> don’t see how this fundamental constraint prevents the term vector format
> from returning a light “Fields” instance which loads per-field data on
> demand when asked for.
>
> I understand most of your ideas for a better term vector format below, to
> varying degrees, but again I don’t see these ideas as being blocking factors
> for having field term data be stored together so it could be accessed
> lazily. (don’t fetch fields you don’t need). Maybe you didn’t mean to imply
> they are?  Although I think you did by saying “vectors aren't going to get
> better until the semantics around them improves”.

It is pretty much impossible to make the underlying layout efficient
fieldwise when the way vectors are structured can differ from document
to document (it is heterogeneous, per-doc) and there are so many crazy
things that can happen. If that were fixed, the file or block header
could contain this metadata instead of repeating it per field, per doc.
Blocks could then be compressed "fieldwise" across documents, maybe
using a preset dictionary for each field, etc.
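
As a rough illustration of the preset-dictionary idea, here is a minimal
sketch using the JDK's java.util.zip classes. It is not Lucene's term vector
codec and not a proposed format: the class name, the method names, and the
sample dictionary and field contents are all hypothetical. It only shows
that a dictionary shared by writer and reader can be primed once per field
and reused across documents.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical sketch: DEFLATE with a preset dictionary, standing in for
// "one dictionary per field". Not part of Lucene.
public class PresetDictSketch {

    // Compress data after priming the compressor with a shared dictionary.
    public static byte[] compress(byte[] data, byte[] dict) {
        Deflater deflater = new Deflater();
        deflater.setDictionary(dict); // must be set before the first deflate call
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress; the same dictionary is supplied when the stream asks for it.
    public static byte[] decompress(byte[] compressed, byte[] dict, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalLength];
        int off = 0;
        try {
            while (off < originalLength) {
                int n = inflater.inflate(result, off, originalLength - off);
                if (n == 0) {
                    if (inflater.needsDictionary()) {
                        inflater.setDictionary(dict);
                    } else if (inflater.finished() || inflater.needsInput()) {
                        break; // stream ended early or input is truncated
                    }
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt input or wrong dictionary", e);
        } finally {
            inflater.end();
        }
        if (off != originalLength) {
            throw new IllegalStateException("expected " + originalLength + " bytes, got " + off);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example: terms that are frequent in one field across documents.
        byte[] dict = "lucene search index vector term".getBytes(StandardCharsets.UTF_8);
        byte[] doc = "term vector index for lucene search".getBytes(StandardCharsets.UTF_8);

        byte[] packed = compress(doc, dict);
        byte[] unpacked = decompress(packed, dict, doc.length);

        System.out.println("original bytes:   " + doc.length);
        System.out.println("compressed bytes: " + packed.length);
        System.out.println("round trip ok:    " + Arrays.equals(doc, unpacked));
    }
}
```

The dictionary has to be identical on both sides, which is why it would
belong in a file or block header rather than being repeated per document.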

Currently, everything must be decompressed.
