lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: questions regarding stored fields role in query time
Date Tue, 26 Feb 2019 17:09:56 GMT
It Depends (tm).

See: SOLR-12598 for details. The short form is that as of Solr 7.5, Solr attempts to do the
most efficient thing possible when fetching fields to return to the client.

1> if all requested fields are docValues, return from docValues.
2> if _any_ field is stored, return from the stored (fdt) values.
3> if some are DV=true, but stored=false, get from both places
4> if some are DV=false but stored=true, get from both places.

To return a single stored=true field that is _not_ docValues, a minimum 16K block must be
read from disk and decompressed. Much of the time, that will contain all of the fields and
the uncompressed doc will be in the JVM’s heap so it’s more efficient to do that than
pull it from MMapDirectory space.

If all requested fields are dv=true, then not having to seek to disk/uncompress is probably more
efficient, so Solr does it that way.

3 and 4 are really the same thing, you _can’t_ get all the fields from the same place, so
you have to read/decompress _and_ pull from DV.
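
The four cases above can be sketched as a small decision function. This is only an illustration of the rules as described in this message, not Solr's actual code: `Field` and `retrieval_sources` are hypothetical names, and case 2 is read here as "every requested field is stored, and at least one of them has no docValues".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field definition."""
    name: str
    stored: bool
    doc_values: bool


def retrieval_sources(requested):
    """Return where values for the requested fields would be read from.

    Illustrative only: mirrors the four cases listed above, not Solr internals.
    """
    for f in requested:
        if not (f.stored or f.doc_values):
            raise ValueError(f"{f.name} is neither stored nor docValues")
    if all(f.doc_values for f in requested):
        return {"docValues"}            # case 1: everything comes from docValues
    if all(f.stored for f in requested):
        return {"stored"}               # case 2 (assumed reading): read the stored block
    return {"docValues", "stored"}      # cases 3 and 4: no single place has it all


# Made-up example fields
id_field = Field("id", stored=True, doc_values=True)
title = Field("title", stored=True, doc_values=False)
price = Field("price", stored=False, doc_values=True)

print(sorted(retrieval_sources([id_field])))
print(sorted(retrieval_sources([id_field, title])))
print(sorted(retrieval_sources([title, price])))
```

Under these assumptions, asking only for an "id" field that has docValues never touches the stored data, while mixing a stored-only field with a docValues-only field hits both.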

But wrapped around all this is that you’re really not doing either for even a small fraction
of the docs compared to searching. Say I have numFound of 1,000,000 but return 10 docs. You
only have to decompress 10 blocks at worst.

And, as Emir says, accessing the fdt files is only done for the 10 docs returned, so that
really doesn’t impact the search times much…
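
As a back-of-the-envelope check on those numbers, here is a rough sketch. It assumes every returned doc sits in a different block and that each block is at the 16K minimum, taken as 16 * 1024 bytes; `worst_case_decompressed_bytes` is a made-up helper name.

```python
BLOCK_SIZE_BYTES = 16 * 1024  # assumed meaning of the "16K" minimum block


def worst_case_decompressed_bytes(rows_returned, block_size=BLOCK_SIZE_BYTES):
    """Bytes decompressed if each returned doc lives in a different block
    and every block is exactly block_size bytes (an assumption)."""
    return rows_returned * block_size


num_found = 1_000_000   # docs matched by the search
rows = 10               # docs actually returned to the client

total = worst_case_decompressed_bytes(rows)
print(f"{rows} of {num_found} docs -> about {total // 1024} KiB decompressed")
```

The amount depends on the number of rows returned, not on numFound.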

Best,
Erick

> On Feb 26, 2019, at 2:40 AM, Emir Arnautović <emir.arnautovic@sematext.com> wrote:
> 
> Hi Saurabh,
> DocValues can be used for retrieving field values (note that order will not be preserved
> in the case of a multivalued field), but they are also stored in files, just in different
> structures. Doc values will load some structures into memory, but will also use memory-mapped
> files to access values (I am not familiar with this code and am just assuming), so in any case
> they will use the “shared” OS caches. Those caches will be affected when stored fields are
> loaded to do a partial update. It will also take some memory when indexing documents. That is
> why storing and doing partial updates could indirectly affect query performance. But that
> might be insignificant, and only a test can tell for sure. Unless you have a small index and
> enough RAM; then I can also tell you for sure that it is insignificant.
> 
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 26 Feb 2019, at 11:21, Saurabh Sharma <saurabh.infoedge@gmail.com> wrote:
>> 
>> Hi Emir,
>> 
>> I had this question in mind: if I store my only returnable field as
>> docValues in RAM, will my stored documents be referenced while constructing
>> the response after the query? Ideally, as the field asked for in fl
>> is already in RAM, the documents on disk should not be consulted for this
>> field.
>> 
>> Any insight about the usage of docValued field vs stored field and
>> preference order will help here in understanding the situation in a better
>> way.
>> 
>> Thanks
>> Saurabh
>> 
>> On Tue, Feb 26, 2019 at 2:41 PM Emir Arnautović <
>> emir.arnautovic@sematext.com> wrote:
>> 
>>> Hi Saurabh,
>>> Welcome to the channel!
>>> Storing fields should not affect query performance directly if you use
>>> lazy field loading, which is the default setting. And it should not affect it at
>>> all if you have enough RAM compared to index size. Otherwise OS caches
>>> might be affected by stored fields. The best way to tell is to test with
>>> the expected indexing/partial update load and see if/how much it affects
>>> performance.
>>> 
>>> HTH,
>>> Emir
>>> --
>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>> 
>>> 
>>> 
>>>> On 26 Feb 2019, at 09:34, Saurabh Sharma <saurabh.infoedge@gmail.com> wrote:
>>>> 
>>>> Hi All,
>>>> 
>>>> 
>>>> I am new here on this channel.
>>>> A few days back we upgraded our SolrCloud to version 7.3, and we are doing
>>>> real-time document posting with a 15-second soft commit and a 2-minute hard
>>>> commit time. As of now we are posting full documents to Solr, which include
>>>> data accumulated from various sources.
>>>> 
>>>> Now we want to do partial updates. I went through the documentation and
>>>> found that all the fields should be stored or docValues for partial
>>>> updates. I have a few questions regarding this:
>>>> 
>>>> 1) In case I am fetching only 1 field while making a query, what will the
>>>> performance impact be due to all fields being stored? Let's say I have an
>>>> "id" field and I have docValues set to true for the field; will Solr use
>>>> stored fields in this case? Will it load the whole document in RAM?
>>>> 
>>>> 2) What's the impact of large stored fields (.fdt) on query-time
>>>> performance? Does query time even depend on the stored fields, or does it
>>>> just depend on the indexes?
>>>> 
>>>> 
>>>> Thanks and regards
>>>> Saurabh
>>> 
>>> 
> 

