lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Should we still optimize?
Date Mon, 08 Aug 2016 16:19:45 GMT
Did you change the merge settings and max segments? If you did, try going back to the defaults.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Aug 8, 2016, at 8:56 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> 
> Callum:
> 
> re: the optimize failing: Perhaps it's just timing out?
> That is, the command succeeds fine (which you
> are reporting), but it's taking long enough that the
> request times out so the client you're using reports an error.....
> Just a guess...
> 
> My personal feeling is that (of course), you need to measure
> your perf before/after optimize to see if there's a measurable
> difference. Apart from that, Shawn's comments about the
> stats being different due to deleted docs are germane.
> 
> Have you tried adding expungeDeletes=true to a commit
> message? See:
> https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers
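For illustration, here is a minimal sketch of the request that suggestion implies, assuming the standard `commit` and `expungeDeletes` update parameters are passed on the URL. The base URL and the core name `mycore` are hypothetical placeholders, and the helper `expunge_commit_url` is made up for this example, not part of any Solr client.

```python
from urllib.parse import urlencode


def expunge_commit_url(base_url: str, core: str) -> str:
    """Build an update URL that commits and asks Solr to expunge deletes.

    base_url and core are placeholders; adjust them to your own deployment.
    """
    # Both values are sent as plain request parameters on the update handler.
    params = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{base_url.rstrip('/')}/{core}/update?{params}"


if __name__ == "__main__":
    # Hypothetical host and core name, for illustration only.
    print(expunge_commit_url("http://localhost:8983/solr", "mycore"))
```

The URL can be sent with any HTTP client. Since a merge can take a while, a generous client-side timeout is probably wise, for the same reason the optimize request may appear to fail.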
> 
> A little-known option is to control how aggressively
> the % of deleted documents is factored in to the decision
> whether to merge a segment or not. It takes a little
> code-diving, and faith, but if you look at TieredMergePolicy,
> you'll see a double field: reclaimDeletesWeight.
> 
> Now, you can set this in your solrconfig.xml file; there's a
> clever bit of reflection that allows these to be specified. Going
> from memory, it's just
> <double name="reclaimDeletesWeight">3.0</double>
> as a node in your tiered merge config. The default is 2.0.
> In terms of what that _does_, that's where code-diving
> comes in.....
> 
> Best,
> Erick
> 
> On Mon, Aug 8, 2016 at 7:59 AM, Callum Lamb <clamb@mintel.com> wrote:
>> Yeah I figured that was too many deleteddocs. It could just be that our max
>> segments is set too high though.
>> 
>> The reason I asked is because our optimize requests have started failing.
>> Or at least, they are appearing to fail because the optimize request returns
>> a non-200. The optimize seems to go ahead successfully regardless though.
>> Before trying to find out if I can asynchronously request and poll for
>> success (doesn't appear to be possible yet) or a better way of determining
>> success, I thought I'd check if the whole thing was necessary to begin with.
>> 
>> Hopefully it doesn't involve polling the core status until deleteddocs goes
>> below a certain level :/.
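If polling does turn out to be the only option, the check itself is small. The sketch below assumes the index section of a core status response exposes `maxDoc` and `numDocs` counters. The 20% threshold is an arbitrary illustrative value, and both helper names are hypothetical.

```python
def deleted_fraction(index_stats: dict) -> float:
    """Return the fraction of documents in the index that are marked deleted.

    Assumes index_stats carries 'maxDoc' (live + deleted) and 'numDocs' (live).
    """
    max_doc = index_stats["maxDoc"]
    if max_doc == 0:
        # An empty index has nothing to reclaim.
        return 0.0
    return (max_doc - index_stats["numDocs"]) / max_doc


def needs_cleanup(index_stats: dict, threshold: float = 0.2) -> bool:
    """True when deleted documents exceed the (arbitrary) threshold."""
    return deleted_fraction(index_stats) > threshold


if __name__ == "__main__":
    # Made-up numbers, not taken from the thread.
    sample = {"numDocs": 750, "maxDoc": 1000}
    print(deleted_fraction(sample), needs_cleanup(sample))
```

A caller would fetch the status periodically and stop once `needs_cleanup` returns False. Whether that beats simply measuring query performance before and after is a separate question.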
>> 
>> Cheers for info.
>> 
>> On Mon, Aug 8, 2016 at 2:58 PM, Shawn Heisey <apache@elyograg.org> wrote:
>> 
>>> On 8/8/2016 3:10 AM, Callum Lamb wrote:
>>>> How true is this claim? Is optimizing still a good idea for the
>>>> general case?
>>> 
>>> For the general case, optimizing is not recommended.  If there are a
>>> very large number of deleted documents, which does describe your
>>> situation, then there is definitely a benefit.
>>> 
>>> In cases where there are a lot of deleted documents, scoring can be
>>> affected by the presence of the deleted documents, and the drop in index
>>> size after an optimize can result in a large performance boost.  For the
>>> general case where there are not many deletes, there *is* a performance
>>> benefit to optimizing down to a single segment, but it is nowhere near
>>> as dramatic as it was in the 1.x/3.x days.
>>> 
>>> The problem with optimizes in the general case is this:  The performance
>>> hit that the optimize operation itself causes may not be worth the small
>>> performance improvement.
>>> 
>>> If you have a time where your index is quiet enough that the optimize
>>> itself won't be disruptive, then you should certainly take advantage of
>>> that time and do the optimize, even if there aren't many deletes.
>>> 
>>> There is another benefit to optimizes that doesn't get mentioned often:
>>> It can make subsequent normal merging operations during indexing faster,
>>> because there will not be as many large segments.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>> 

