lucene-solr-user mailing list archives

From Naresh Yadav <nyadav....@gmail.com>
Subject Re: Need Debug Direction on Performance Problem
Date Mon, 19 Jan 2015 05:50:21 GMT
Toke, I won't be able to use TermsComponent, as I have complex filter
criteria on other fields.

Michael, I understood your idea of paging without using start=. I will
prototype it, since it is possible in my use case too, and post the
results I get with this approach here.
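For reference, here is a minimal sketch of the paging-without-start= idea from Michael's message quoted below. It assumes a unique, sortable string field named `id` whose values contain no query-parser special characters, and it expresses "id > last_id" as the exclusive range filter `id:{last_id TO *]`. `search` is a hypothetical callable that sends the params to the /select handler and returns the parsed JSON response; the existing filter criteria are passed through unchanged as extra `fq` values.

```python
def keyset_batches(search, q, filters=(), rows=50000, key="id"):
    """Yield batches of docs by paging on a unique sort key, never on start=.

    Assumptions (hypothetical, not from the thread): `key` is unique and
    sortable, its values need no escaping, and `search(params)` returns a
    dict shaped like Solr's JSON response.
    """
    last = None
    while True:
        fq = list(filters)  # keep the caller's own filter queries
        if last is not None:
            # Exclusive lower bound: only docs whose key sorts after `last`.
            fq.append(f"{key}:{{{last} TO *]")
        params = {"q": q, "fq": fq, "sort": f"{key} asc",
                  "rows": rows, "start": 0}
        docs = search(params)["response"]["docs"]
        if not docs:
            return
        yield docs
        last = docs[-1][key]  # the maximum key seen so far
        if len(docs) < rows:  # a short batch means we reached the end
            return
```

Because every request uses start=0, each shard only has to return at most `rows` documents per batch, which is why the per-batch cost stays roughly constant however deep the export goes.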

On Sun, Jan 18, 2015 at 10:05 PM, Michael Sokolov <msokolov@safaribooksonline.com> wrote:

> You can also implement your own cursor easily enough if you have a unique
> sort key (not the relevance score). Say you can sort by id: you select
> batch 1 (50k docs, say) and record the last (maximum) id in the batch. For
> the next batch, restrict the query to id > last_id and take the first 50k
> docs (don't use start= for paging). This scales much better when scanning
> a large result set; you get constant time per batch across the whole set
> instead of having it increase as you page deeper.
>
> -Mike
>
>
> On 1/18/2015 7:45 AM, Naresh Yadav wrote:
>
>> Hi Toke,
>>
>> Thanks for sharing Solr internals for my problem. I will definitely try
>> cursors too, but the only problem is that my current Solr version is
>> 4.6.1, in which I believe cursor support is not available. Is there any
>> other option I have for this problem?
>>
>> Also, as per your suggestion, I will try to avoid regional units in posts.
>>
>> Thanks
>> Naresh
>>
>> On Sun, Jan 18, 2015 at 4:19 PM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
>>
>>> Naresh Yadav [nyadav.ait@gmail.com] wrote:
>>>
>>>> In both setups, we are reading in batches of 50k, and each batch takes:
>>>> Setup1: approx. 7 seconds; completing all batches for the total of 10
>>>> lakh results takes 1 to 2 minutes.
>>>> Setup2: approx. 2-3 minutes; completing all batches for the total of 10
>>>> lakh results takes 114 minutes.
>>>>
>>> Deep paging across shards without cursors means that for each request,
>>> the full result set up to that point must be requested from each shard.
>>> The deeper your page, the longer it takes for each request. If you only
>>> extracted 500K results instead of the 1M in setup 2, it would likely
>>> take a lot less than 114/2 minutes.
>>>
>>> Since you are exporting the full result set, you should be using a
>>> cursor: https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
>>> This should make your extraction time linear in the number of documents
>>> and hopefully a lot faster than your current setup.
>>>
>>> Also, please refrain from using regional units such as "lakh" in an
>>> international forum. It requires some readers (me, for example) to
>>> perform a search in order to be sure of what you are talking about.
>>>
>>> - Toke Eskildsen
>>>
>>>
>
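Toke's point about deep paging across shards can be made concrete with a rough document count. In a distributed start=/rows= request, page k (1-based) uses start=(k-1)*batch, so each shard has to return its top k*batch ids and sort values for the coordinator to merge. The sketch below only counts those documents per shard, as an upper bound; it is not a timing model and ignores merging, network, and stored-field fetching.

```python
import math


def per_shard_docs_with_start(total, batch):
    """Upper bound on top docs one shard returns over a start=/rows= export.

    Page k (1-based) asks for start=(k-1)*batch, rows=batch, so each shard
    must send back its top k*batch candidates (capped at `total`).
    """
    pages = math.ceil(total / batch)
    return sum(min(k * batch, total) for k in range(1, pages + 1))


def per_shard_docs_with_cursor(total, batch):
    """Upper bound when each request only needs `batch` docs per shard."""
    pages = math.ceil(total / batch)
    return pages * batch


for total in (500_000, 1_000_000):
    print(total,
          per_shard_docs_with_start(total, 50_000),
          per_shard_docs_with_cursor(total, 50_000))
```

With 50k batches, the 1M export makes each shard return about 10.5M documents in total, against roughly 1M with a cursor, while the 500K export needs only about 2.75M (around 26% of the 1M case). The quadratic growth is consistent with Toke's expectation that halving the export would take far less than half the time.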

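The cursor Toke recommends is Solr's cursorMark parameter. As far as I know it arrived in Solr 4.7, so on 4.6.1 it would need an upgrade, whereas Michael's id-based approach works without one. A minimal sketch, assuming the uniqueKey field is `id`: the first request sends cursorMark=*, the sort ends on the uniqueKey as a tie-breaker, start is left out, and the loop stops when the returned nextCursorMark equals the mark that was sent. The URL is a hypothetical placeholder.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; replace with your own core or collection URL.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def http_search(params):
    """Send one /select request and return the parsed JSON response."""
    query = urllib.parse.urlencode({**params, "wt": "json"}, doseq=True)
    with urllib.request.urlopen(f"{SOLR_SELECT_URL}?{query}") as resp:
        return json.load(resp)


def cursor_batches(search, q, filters=(), rows=50000, key="id"):
    """Yield batches using cursorMark deep paging (start= is not sent)."""
    mark = "*"  # "*" asks for the first page
    while True:
        params = {"q": q, "fq": list(filters), "rows": rows,
                  # The sort must include the uniqueKey field as a tie-breaker.
                  "sort": f"{key} asc", "cursorMark": mark}
        result = search(params)
        docs = result["response"]["docs"]
        if docs:
            yield docs
        next_mark = result["nextCursorMark"]
        if next_mark == mark:  # an unchanged mark means no more results
            return
        mark = next_mark
```

Usage would look like `for docs in cursor_batches(http_search, "*:*", filters=my_filters): ...`, where `my_filters` is a hypothetical list holding the existing fq strings.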