lucene-dev mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: DocValues, retrieval performance and policy
Date Mon, 24 Sep 2018 19:07:21 GMT
Toke:

I think part of it is locality. By that I mean two docValues fields in
the same document have no relation to each other in terms of their
location on disk. So, _assuming_ your DocValues can't all be held in
memory, you may be doing a bunch of disk seeks.

This is as opposed to stored fields, where a single disk seek and
decompression retrieves all the fields of a given doc (assuming the 16K
block that is read and decompressed holds all of them).

And maybe part of it is that stuffing large text fields into a
DocValues field just to return them seems like abusing DV.

That said, the Streaming code uses DV fields exclusively, and I got
200K rows/second returned without tuning a single thing, which I doubt
you're going to get with stored fields!

So I think as usual, "it depends".

On Mon, Sep 24, 2018 at 10:25 AM Toke Eskildsen <toes@kb.dk> wrote:
>
> David Smiley <david.w.smiley@gmail.com> wrote:
> > I don't think it makes a difference if some people think docValues should
> > never be used for value-retrieval.  When that performance drop occurred
> > due to those changes, I'm sure it would have affected sorting & faceting
> > as well as value-retrieval. Some more than others perhaps.
>
> Yes. The iterative API is fine for relatively small jumps, so it works perfectly for
> sorting on medium to large result sets. The same holds for faceting, depending on its type.
> Grouping and faceting on small result sets are (probably) affected relatively more, but as
> the amount of data needed is small in those cases, the (assumed) impact is not that high.
>
> Retrieving documents is different, as there are typically more fields involved and the
> number of documents itself is nearly always small, which means large jumps, repeated for
> every field.
>
> > I don't see any disagreement about improving docValues in the ways
> > you suggest.
>
> You are right about that. I apologize if I was unclear: it is not the concrete patch I am
> asking about; that's just how this started. I am asking for background on why using
> DocValues for document retrieval is considered a misuse.
>
> - Toke Eskildsen
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
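Toke's point about jump sizes can be illustrated with a second toy model, again hypothetical and much simpler than any real Lucene codec: a forward-only iterator over a sparse field whose advance_exact() (named loosely after Lucene's DocValuesIterator.advanceExact, but not its implementation) scans entries one at a time, so each call costs the number of entries it skips. Real codecs add structure that reduces this cost; the model only shows why many small jumps amortize well and a few large ones do not. The field layout, field count and docIDs are made up.

```python
# Toy forward-only docValues iterator (hypothetical; NOT Lucene's implementation).
# Assumption: advance_exact() can only move forward and does so by scanning
# stored entries one by one, so its cost equals the number of entries skipped.

class ForwardOnlyValues:
    def __init__(self, docs_with_value):
        self.docs = sorted(docs_with_value)  # docIDs that have a value, ascending
        self.pos = 0        # index of the next entry to inspect
        self.last = -1      # last target; targets must not go backwards
        self.steps = 0      # entries scanned so far: the "cost" of this iterator

    def advance_exact(self, target):
        """Move to target; return True if target has a value in this field."""
        if target < self.last:
            raise ValueError("forward-only iterator cannot go backwards")
        self.last = target
        while self.pos < len(self.docs) and self.docs[self.pos] < target:
            self.pos += 1
            self.steps += 1
        return self.pos < len(self.docs) and self.docs[self.pos] == target


def retrieval_cost(doc_ids, fields):
    """Total entries scanned to read doc_ids from every field, one iterator per field."""
    total = 0
    for docs_with_value in fields:
        it = ForwardOnlyValues(docs_with_value)
        for doc in sorted(doc_ids):
            it.advance_exact(doc)
        total += it.steps
    return total


if __name__ == "__main__":
    evens = range(0, 100_000, 2)   # hypothetical field: every even docID has a value
    fields = [evens] * 5           # five fields with the same layout
    # Document retrieval: a few scattered hits, i.e. large jumps in every field.
    hits = [3, 9_998, 20_001, 35_000, 50_002, 61_111, 72_000, 80_000, 95_000, 99_999]
    cost = retrieval_cost(hits, fields)
    print("10 scattered hits:", cost, "entries,", cost / len(hits), "per doc")
    # Sorting/faceting over a large result set: many small jumps.
    run = range(20_000)
    cost = retrieval_cost(run, fields)
    print("20,000 in-order docs:", cost, "entries,", cost / len(run), "per doc")
```

With these made-up numbers the ten hits cost 250,000 scanned entries (25,000 per document), while the 20,000 in-order docs cost 50,000 (2.5 per document). The scan up to a given docID is paid once per field whether one doc or thousands are read along the way.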
