lucene-solr-user mailing list archives

From Michael Sokolov <msoko...@safaribooksonline.com>
Subject Re: Need Debug Direction on Performance Problem
Date Sun, 18 Jan 2015 16:35:16 GMT
You can also implement your own cursor easily enough if you have a
unique sort key (not relevance score). Say you can sort by id: select
batch 1 (50k docs, say) sorted by id and record the last (maximum) id in
the batch. For the next batch, filter on id > last_id and again take the
first 50k docs (don't use start= for paging). This scales much better
when scanning a large result set: each batch takes roughly constant
time, instead of getting slower as you page deeper.
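
A minimal sketch of that loop in Python, for illustration only. It
assumes the unique key field is named "id", that ids sort the same way in
Python as in the index, and that an id needs no escaping beyond double
quotes. The names batch_params, scan and fetch are hypothetical; fetch
stands for whatever client call sends the parameters to Solr and returns
the list of documents.

```python
from typing import Callable, Dict, Iterator, List, Optional

Doc = Dict[str, str]
Params = Dict[str, str]


def batch_params(query: str, last_id: Optional[str], rows: int) -> Params:
    """Build request parameters for one batch. No start= offset is used."""
    params = {"q": query, "sort": "id asc", "rows": str(rows)}
    if last_id is not None:
        # '{' makes the lower bound exclusive: only ids greater than last_id.
        # Assumption: last_id contains no characters that need escaping.
        params["fq"] = 'id:{"%s" TO *]' % last_id
    return params


def scan(fetch: Callable[[Params], List[Doc]], query: str,
         rows: int = 50000) -> Iterator[Doc]:
    """Yield every matching doc, one id-ordered batch at a time."""
    last_id: Optional[str] = None
    while True:
        docs = fetch(batch_params(query, last_id, rows))
        if not docs:
            return
        yield from docs
        # Remember the maximum id seen; the next batch starts after it.
        last_id = docs[-1]["id"]
        if len(docs) < rows:
            # A short batch means the result set is exhausted.
            return
```

Every request asks for the first `rows` documents of a filtered,
id-sorted result, so no shard has to collect the documents that earlier
batches already returned.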

-Mike

On 1/18/2015 7:45 AM, Naresh Yadav wrote:
> Hi Toke,
>
> Thanks for sharing Solr internals for my problem. I will definitely try
> cursors too, but the only problem is that my current Solr version is
> 4.6.1, in which I guess cursor support is not there. Is there any other
> option for this problem?
>
> Also, as per your suggestion, I will try to avoid regional units in posts.
>
> Thanks
> Naresh
>
> On Sun, Jan 18, 2015 at 4:19 PM, Toke Eskildsen <te@statsbiblioteket.dk>
> wrote:
>
>> Naresh Yadav [nyadav.ait@gmail.com] wrote:
>>> In both setups, we are reading in batches of 50k and each batch taking
>>> Setup1  : approx 7 seconds and for completing all batches of total 10 lakh
>>> results takes 1 to 2 minutes.
>>> Setup2 : approx 2-3 minutes and for completing all batches of total 10 lakh
>>> results  takes 114 minutes.
>> Deep paging across shards without cursors means that for each request, the
>> full result set up to that point must be requested from each shard. The
>> deeper your page, the longer it takes for each request. If you only
>> extracted 500K results instead of the 1M in setup 2, it would likely take a
>> lot less than 114/2 minutes.
>>
>> Since you are exporting the full result set, you should be using a cursor:
>> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
>> This should make your extraction linear to the number of documents and
>> hopefully a lot faster than your current setup.
>>
>> Also, please refrain from using regional units such as "lakh" in an
>> international forum. It requires some readers (me for example) to perform a
>> search in order to be sure of what you are talking about.
>>
>> - Toke Eskildsen
>>
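
Toke's point about deep paging can be put into rough numbers. The sketch
below is a simplified model, not a measurement: it assumes the cost of a
request is proportional to the number of documents each shard has to
collect, which is start + rows with offset paging and just rows with a
cursor. The function name docs_collected_per_shard is hypothetical.

```python
def docs_collected_per_shard(total: int, rows: int, use_offset: bool) -> int:
    """Count docs one shard must collect to export `total` docs in batches."""
    collected = 0
    for start in range(0, total, rows):
        # Offset paging re-collects everything up to the current page;
        # cursor-style paging only collects the current batch.
        collected += (start + rows) if use_offset else rows
    return collected


full_offset = docs_collected_per_shard(1_000_000, 50_000, use_offset=True)
half_offset = docs_collected_per_shard(500_000, 50_000, use_offset=True)
full_cursor = docs_collected_per_shard(1_000_000, 50_000, use_offset=False)

print(full_offset)  # 10500000
print(half_offset)  # 2750000
print(full_cursor)  # 1000000
```

Under this model, exporting the first 500K documents with start= costs
about a quarter of the full 1M export rather than half, which is why the
time would be well under 114/2 minutes, and a cursor-style export
collects roughly a tenth of what offset paging does.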

