lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject Re: Re: solr 4.2.1 index gets slower over time
Date Tue, 01 Apr 2014 19:39:41 GMT
You may want to increase reclaimDeletesWeight for TieredMergePolicy from 2 to 3 or 4. By default
it may keep too many deleted or updated docs in the index. This can increase index size by
50%!!

Dmitry Kan <solrexpert@gmail.com> wrote:

Elisabeth,

Yes, I believe you are right in that the deletes are part of the optimize
process. If you delete often, you may consider (if not already) the
TieredMergePolicy, which is suited for this scenario. Check out this
relevant discussion I had with Lucene committers:
https://twitter.com/DmitryKan/status/399820408444051456

HTH,

Dmitry
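
To judge whether deleted documents are the likely cause, it can help to compare numDocs (live documents) with maxDoc (live plus deleted-but-not-yet-merged documents), both of which Solr reports in the core overview and via the Luke request handler. A minimal sketch, assuming you have read those two numbers yourself; the function name and the sample figures are hypothetical:

```python
def deleted_fraction(num_docs: int, max_doc: int) -> float:
    """Return the share of maxDoc that is taken up by deleted documents.

    num_docs: live documents (Solr's numDocs)
    max_doc:  live + deleted documents still present in segments (Solr's maxDoc)
    """
    if max_doc <= 0:
        return 0.0  # empty index: nothing to reclaim
    if not 0 <= num_docs <= max_doc:
        raise ValueError("expected 0 <= num_docs <= max_doc")
    return (max_doc - num_docs) / max_doc


# Hypothetical figures: 1,000,000 live docs in an index holding 1,500,000 slots,
# i.e. an index that is 50% larger than its live content.
print(f"{deleted_fraction(1_000_000, 1_500_000):.0%}")  # prints "33%"
```

A high fraction that persists between updates would suggest the merge policy is not reclaiming deletes aggressively enough for this workload.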


On Tue, Apr 1, 2014 at 11:34 AM, elisabeth benoit <elisaelisaelisa@gmail.com> wrote:

> Thanks a lot for your answers!
>
> Shawn, our GC configuration has far fewer parameters defined, so we'll check
> this out.
>
> Dmitry, about the expungeDeletes option, we'll add that to the delete
> process. But from what I read, this is already done in the optimize process (cf.
> http://lucene.472066.n3.nabble.com/Does-expungeDeletes-need-calling-during-an-optimize-td1214083.html).
> Or maybe not?
>
> Thanks again,
> Elisabeth
>
>
> 2014-04-01 7:52 GMT+02:00 Dmitry Kan <solrexpert@gmail.com>:
>
> > Hi,
> >
> > We have noticed something like this as well, but with an older version of
> > Solr, 3.4. In our setup we delete documents pretty often. Internally in
> > Lucene, when a client requests that a document be deleted, it is not
> > physically deleted, but only marked as "deleted". Our original assumption
> > was that the "deleted" documents would get physically removed on each
> > optimize command issued. We started to suspect this wasn't always true, as
> > the shards (especially relatively large shards) became slower over time.
> > So we found out about the expungeDeletes option, which purges the
> > "deleted" docs and is false by default. We have set it to true.
> > If your Solr update lifecycle includes frequent deletes, try this out.
> >
> > This of course does not replace working towards finding better
> > GC parameters.
> >
> >
> > https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
> >
> >
> > On Mon, Mar 31, 2014 at 3:57 PM, elisabeth benoit <elisaelisaelisa@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > We are currently using Solr 4.2.1. Our index is updated on a daily basis.
> > > After noticing that Solr query time had increased (to twice its initial
> > > value) without any change in index size or in Solr configuration, we tried
> > > an optimize on the index, but it didn't fix our problem. We checked the
> > > garbage collector, but everything seemed fine. What did in fact fix our
> > > problem was to delete all documents and reindex from scratch.
> > >
> > > It looks like over time our index gets "corrupted" and optimize doesn't fix
> > > it. Does anyone have a clue how to investigate this situation further?
> > >
> > >
> > > Elisabeth
> > >
> >
> >
> >
> > --
> > Dmitry
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> >
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
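
The expungeDeletes suggestion discussed in this thread can be passed as a request parameter alongside commit on Solr's update handler. A minimal sketch that only builds the request URL, assuming a local Solr instance; the base URL and the core name `collection1` are placeholders, not values from the thread:

```python
from urllib.parse import urlencode


def expunge_commit_url(base_url: str, core: str) -> str:
    """Build an update-handler URL that commits and asks Solr to expunge deletes."""
    # commit=true triggers a commit; expungeDeletes=true asks Solr to merge away
    # segments that contain deleted documents (it defaults to false).
    params = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{base_url.rstrip('/')}/{core}/update?{params}"


# Hypothetical host and core name.
print(expunge_commit_url("http://localhost:8983/solr", "collection1"))
# prints "http://localhost:8983/solr/collection1/update?commit=true&expungeDeletes=true"
```

Issuing this after a batch of deletes, rather than after every single delete, keeps the extra merge work bounded.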