lucene-solr-user mailing list archives

From Johannes Goll <johannes.g...@gmail.com>
Subject Re: apache-solr-3.1 slow stats component queries
Date Thu, 05 May 2011 17:26:16 GMT
Hi,

I benchmarked the slow stats queries (6-point estimate) on the same
hardware, using an index of size 104M. We use a modified Solr/Lucene 3.1 that
returns only the sum and count in the stats component results. Solr/Lucene
runs on Jetty.

With the stats component, query time grows linearly with the number of
matching documents (R^2 = 0.99). I guess this is expected, since the
application has to scan and sum up the stats field for every matching document?
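
To illustrate the linear cost described above, here is a minimal sketch,
assuming the component visits every matching document once and accumulates
the field value. The names sum_and_count, field_values and matching_doc_ids
are hypothetical; this is not Solr's actual implementation.

```python
from typing import Iterable, Sequence, Tuple


def sum_and_count(
    field_values: Sequence[float], matching_doc_ids: Iterable[int]
) -> Tuple[float, int]:
    """Accumulate the sum and count of one numeric field over matching docs.

    field_values[doc_id] is assumed to hold the stats field value of that
    document (hypothetical in-memory layout, for illustration only).
    """
    total = 0.0
    count = 0
    for doc_id in matching_doc_ids:  # one step per matching document
        total += field_values[doc_id]
        count += 1
    return total, count
```

Under this assumption the work is proportional to the number of matching
documents, which would be consistent with the linear relationship measured
above.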

Are there any plans to cache stats results for a given stats field along
with the documents that match a filter query? Are there any other ideas
(hardware or software configuration) that could help here? Even for a subset
of 10M entries, a stats query takes on the order of 10 seconds.
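
To make the caching idea concrete, here is a rough sketch of a cache keyed by
filter query and stats field. It assumes the index does not change between
lookups; StatsCache and the compute callback are hypothetical names and do
not correspond to any existing Solr API.

```python
from typing import Callable, Dict, Tuple

StatsResult = Tuple[float, int]  # (sum, count)


class StatsCache:
    """Remember (sum, count) per (filter query, stats field) pair."""

    def __init__(self, compute: Callable[[str, str], StatsResult]) -> None:
        # compute is a hypothetical callback that runs the expensive stats
        # query, e.g. by sending the request to the search server.
        self._compute = compute
        self._entries: Dict[Tuple[str, str], StatsResult] = {}

    def get(self, filter_query: str, stats_field: str) -> StatsResult:
        key = (filter_query, stats_field)
        if key not in self._entries:
            # Cache miss: pay the full scan cost once, then reuse the result.
            self._entries[key] = self._compute(filter_query, stats_field)
        return self._entries[key]

    def clear(self) -> None:
        """Drop all entries, e.g. after the index has been updated."""
        self._entries.clear()
```

Such a cache only helps for repeated filter queries, and it would have to be
cleared whenever the index changes.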

Thanks in advance.
Johannes



2011/4/18 Johannes Goll <johannes.goll@gmail.com>

> Any ideas why the stats summaries are so slow in this case? Thank you
> very much in advance for any ideas/suggestions. Johannes
>
>
> 2011/4/5 Johannes Goll <johannes.goll@gmail.com>
>
>> Hi,
>>
>> thank you for making the new apache-solr-3.1 available.
>>
>> I have installed the version from
>>
>> http://apache.tradebit.com/pub//lucene/solr/3.1.0/
>>
>> and am running into very slow stats component queries (~ 1 minute)
>> for fetching the computed sum of the stats field
>>
>> url: ?q=*:*&start=0&rows=0&stats=true&stats.field=weight
>>
>> <int name="QTime">52825</int>
>>
>> #documents:    78,359,699
>> total RAM:     256G
>> vm arguments:  -server -Xmx40G
>>
>> the stats.field specification is as follows:
>> <field name="weight" type="pfloat" indexed="true" stored="false"
>>        required="true" multiValued="false" default="1"/>
>>
>> filter queries that narrow down the #docs help to reduce it -
>> QTime seems to be proportional to the number of docs being returned
>> by a filter query.
>>
>> Is there any way to improve the performance of such stats queries?
>> Caching only helped the filter query performance; when larger subsets
>> are returned, QTime still increases unacceptably.
>>
>> Since I only need the sum and not the stddev, sumOfSquares, min, or max,
>> I have created a custom 3.1 version that returns only the sum, but this
>> improved the performance only slightly. Of course I could cache the
>> larger sum queries on the client side, but I want to do that only as a
>> last resort.
>>
>> Thank you very much in advance for any ideas/suggestions.
>>
>> Johannes
>>
>>
>
>
> --
> Johannes Goll
> 211 Curry Ford Lane
> Gaithersburg, Maryland 20878
>
