lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr Deleted Docs Issue
Date Mon, 16 Mar 2015 16:08:43 GMT
bq: If this operation is continuously done I would end up with a large set of
deleted docs which will affect the performance of the queries I hit on this
solr.

No, you won't. They'll be "merged away" as segments are merged in the
background. Mike's blog (linked below) has a great visualization of the
process; the third one down is the default TieredMergePolicy.

In general, even if you replace all the docs, you'll have 10% of your
corpus be deleted docs. The % of deleted docs in a segment weighs quite
heavily when it comes to deciding which segments to merge (note that
merging purges the deleted docs).
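
For reference, here is a minimal sketch of how the merge policy can be spelled out in solrconfig.xml instead of relying on mergeFactor. It assumes the Solr 4.x/5.x-style `<mergePolicy>` syntax, and the values shown are what I believe the defaults to be. Treat both as assumptions and check the documentation for your version, since newer releases configure this differently and `reclaimDeletesWeight` may not exist there.

```xml
<!-- solrconfig.xml (sketch; Solr 4.x/5.x-style syntax assumed) -->
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- Together these two play the role mergeFactor=10 used to play. -->
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <!-- Higher values favor merging segments with many deleted docs
         (assumed default shown). -->
    <double name="reclaimDeletesWeight">2.0</double>
  </mergePolicy>
</indexConfig>
```

Most setups should not need to change any of these; the sketch only shows where the knobs live.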

Also in general, the results of small tests like this simply do not generalize,
i.e. the number of deleted docs in a 200-doc sample can't be
extrapolated to a reasonably sized corpus.

Finally, I don't know if this is something temporary, but the implication of
"If total commit operations I hit are 20" is that you're committing after every
batch of docs is sent to Solr. You should not do this; let your autocommit
settings handle it.
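
As a rough sketch of what that looks like in solrconfig.xml, autocommit is configured inside the update handler. The interval values below are illustrative assumptions, not recommendations for your setup.

```xml
<!-- solrconfig.xml (sketch; maxTime values are assumed for illustration) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes index changes to disk.
       openSearcher=false means it does not make new docs visible. -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>

  <!-- Soft commit: controls how soon updates become searchable. -->
  <autoSoftCommit>
    <maxTime>60000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With something like this in place, the client just sends its batches and never issues an explicit commit.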

Here's Mike's blog:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

Best,
Erick

On Mon, Mar 16, 2015 at 8:51 AM, Shawn Heisey <apache@elyograg.org> wrote:
> On 3/16/2015 9:11 AM, vicky desai wrote:
>> I am having an issue with my solr setup. In my solr config I have set
>> following property
>> *<mergeFactor>10</mergeFactor>*
>
> The mergeFactor setting is deprecated ... but you are setting it to the
> default value of 10 anyway, so that's not really a big deal.  It's
> possible that mergeFactor will no longer work in 5.0, but I'm not sure
> on that.  You should instead use the settings specific to the merge
> policy, which normally is TieredMergePolicy.
>
> Note that when mergeFactor is 10, you *will* end up with more than 10
> segments in your index.  There are multiple merge tiers, each one can
> have up to 10 segments before it is merged.
>
>> Now consider following situation. I have *200* documents in my index. I need
>> to update all the 200 docs
>> If total commit operations I hit are *20* i.e I update batches of 10 docs
>> merging is done after every 10th update and so the max Segment Count I can
>> have is 10 which is fine. However even when merging happens deleted docs are
>> not cleared and I end up with 100 deleted docs in index.
>>
>> If this operation is continuously done I would end up with a large set of
>> deleted docs which will affect the performance of the queries I hit on this
>> solr.
>
> Because there are multiple merge tiers and you cannot easily
> pre-determine which segments will be chosen for a particular merge, the
> merge behavior may not be exactly what you expect.
>
> The only guaranteed way to get rid of your deleted docs is to do an
> optimize operation, which forces a merge of the entire index down to a
> single segment.  This gets rid of all deleted docs in those segments.
> If you index more data while you are doing the optimize, then you may
> end up with additional deleted docs.
>
> Thanks,
> Shawn
>
