lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Solr Deleted Docs Issue
Date Mon, 16 Mar 2015 15:51:47 GMT
On 3/16/2015 9:11 AM, vicky desai wrote:
> I am having an issue with my solr setup. In my solr config I have set the
> following property
> *<mergeFactor>10</mergeFactor>*

The mergeFactor setting is deprecated, but you are setting it to the
default value of 10 anyway, so that's not really a big deal.  It's
possible that mergeFactor will no longer work in 5.0, but I'm not sure
on that.  You should instead use the settings specific to the merge
policy, which normally is TieredMergePolicy.
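
As a rough sketch of what that might look like inside <indexConfig> in
solrconfig.xml on a 4.x install (the values below are assumptions that
roughly mirror mergeFactor=10, not tuned recommendations, and the exact
syntax may differ in other Solr versions):

```xml
<indexConfig>
  <!-- Assumed values: both set to 10 to approximate mergeFactor=10 -->
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <!-- how many segments may be merged together in one merge -->
    <int name="maxMergeAtOnce">10</int>
    <!-- how many segments are allowed per tier before a merge is triggered -->
    <int name="segmentsPerTier">10</int>
  </mergePolicy>
</indexConfig>
```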

Note that when mergeFactor is 10, you *will* end up with more than 10
segments in your index.  There are multiple merge tiers, and each one
can have up to 10 segments before it is merged.

> Now consider the following situation. I have *200* documents in my index. I need
> to update all the 200 docs.
> If the total commit operations I hit are *20*, i.e. I update in batches of 10 docs,
> merging is done after every 10th update and so the max segment count I can
> have is 10, which is fine. However, even when merging happens, deleted docs are
> not cleared and I end up with 100 deleted docs in the index.
>
> If this operation is continuously done I would end up with a large set of
> deleted docs which will affect the performance of the queries I hit on this
> solr.

Because there are multiple merge tiers and you cannot easily
pre-determine which segments will be chosen for a particular merge, the
merge behavior may not be exactly what you expect.

The only guaranteed way to get rid of your deleted docs is to do an
optimize operation, which forces a merge of the entire index down to a
single segment.  This gets rid of all deleted docs that were in the
segments being merged.  If you index more data while you are doing the
optimize, then you may end up with additional deleted docs.
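
For illustration, one way to request an optimize is to post an XML
update message to the update handler of your core.  This is only a
sketch: the attribute values are assumptions, and which core or
collection you send it to depends on your setup.

```xml
<!-- Sent as the body of a POST to the core's /update handler -->
<!-- maxSegments="1" asks for a merge down to a single segment;
     waitSearcher="true" blocks until the new searcher is open -->
<optimize waitSearcher="true" maxSegments="1"/>
```

An optimize rewrites the whole index, so it can take a long time and
generate a lot of disk I/O on a large index.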

Thanks,
Shawn

