lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Should we still optimize?
Date Mon, 08 Aug 2016 15:56:16 GMT
Callum:

re: the optimize failing: perhaps it's just timing out?
That is, the command itself succeeds (which is what you
are reporting), but it takes long enough that the
request times out, so the client you're using reports an error.
Just a guess...

My personal feeling is that, of course, you need to measure
your perf before and after the optimize to see if there's a measurable
difference. Apart from that, Shawn's comments about the
stats being different due to deleted docs are germane.

Have you tried adding expungeDeletes=true to a commit
message? See:
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers
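
For reference, a minimal sketch of what such a commit could look like as an
XML update message, assuming you send XML to your core's update handler
(the URL and core name depend on your setup, so they're left out here):

```xml
<!-- A commit that also asks Solr to merge away segments holding deleted docs.
     POST this as the request body to your core's /update handler. -->
<commit expungeDeletes="true"/>
```

Unlike an optimize, this isn't meant to force the index down to a single
segment, but it can still be expensive on a large index, so measure it
the same way you would the optimize.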

A little-known option is to control how aggressively
the % of deleted documents is factored in to the decision
whether to merge a segment or not. It takes a little
code-diving, and faith, but if you look at TieredMergePolicy,
you'll see a double field: reclaimDeletesWeight.

Now, you can set this in your solrconfig.xml file. There's a
clever bit of reflection that allows these fields to be specified.
Going from memory, it's just
<double name="reclaimDeletesWeight">3.0</double>
as a node in your tiered merge config. The default is 2.0.
In terms of what that _does_, that's where code-diving
comes in.....
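
To show where that node would sit, here's a rough sketch of a tiered merge
config in solrconfig.xml. Treat all of it as an assumption to check against
your own version: the enclosing element and class name differ between
releases (some use <mergePolicyFactory> with
org.apache.solr.index.TieredMergePolicyFactory instead), the two int values
are only illustrative, and the setting only works if your version's
TieredMergePolicy actually has a reclaimDeletesWeight setter.

```xml
<indexConfig>
  <!-- Element and class name are assumptions; check your Solr version's docs. -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- Illustrative values only, not recommendations. -->
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <!-- Higher than the 2.0 default, so deleted docs weigh more in merge selection. -->
    <double name="reclaimDeletesWeight">3.0</double>
  </mergePolicy>
</indexConfig>
```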

Best,
Erick

On Mon, Aug 8, 2016 at 7:59 AM, Callum Lamb <clamb@mintel.com> wrote:
> Yeah I figured that was too many deleteddocs. It could just be that our max
> segments is set too high though.
>
> The reason I asked is because our optimize requests have started failing.
> Or at least, they are appearing to fail because the optimize request returns
> a non 200. The optimize seems to go ahead successfully regardless though.
> Before trying to find out if I can asynchronously request and poll for
> success (doesn't appear to be possible yet) or a better way of determining
> success, I thought I'd check if the whole thing was necessary to begin with.
>
> Hopefully it doesn't involve polling the core status until deleteddocs goes
> below a certain level :/.
>
> Cheers for info.
>
> On Mon, Aug 8, 2016 at 2:58 PM, Shawn Heisey <apache@elyograg.org> wrote:
>
>> On 8/8/2016 3:10 AM, Callum Lamb wrote:
>> > How true is this claim? Is optimizing still a good idea for the
>> > general case?
>>
>> For the general case, optimizing is not recommended.  If there are a
>> very large number of deleted documents, which does describe your
>> situation, then there is definitely a benefit.
>>
>> In cases where there are a lot of deleted documents, scoring can be
>> affected by the presence of the deleted documents, and the drop in index
>> size after an optimize can result in a large performance boost.  For the
>> general case where there are not many deletes, there *is* a performance
>> benefit to optimizing down to a single segment, but it is nowhere near
>> as dramatic as it was in the 1.x/3.x days.
>>
>> The problem with optimizes in the general case is this:  The performance
>> hit that the optimize operation itself causes may not be worth the small
>> performance improvement.
>>
>> If you have a time where your index is quiet enough that the optimize
>> itself won't be disruptive, then you should certainly take advantage of
>> that time and do the optimize, even if there aren't many deletes.
>>
>> There is another benefit to optimizes that doesn't get mentioned often:
>> It can make subsequent normal merging operations during indexing faster,
>> because there will not be as many large segments.
>>
>> Thanks,
>> Shawn
>>
>>
>
