lucene-java-user mailing list archives

From Stuart Goldberg <sgoldb...@fixflyer.com>
Subject Re: Help with huge index
Date Thu, 01 Mar 2018 01:37:48 GMT
Thanks so much. I actually found that my purging routine finished after
about 35 minutes, which is perfectly acceptable given that this routine is
supposed to run overnight.

On Feb 28, 2018 8:34 PM, "Adrien Grand" <jpountz@gmail.com> wrote:

> Thanks. Deleting lots of documents can indeed trigger a lot of work on the
> Lucene side. First, Lucene likely needs to rewrite the live docs of all your
> segments, and then this might trigger significant merging activity, because
> Lucene tries to keep the number of deleted docs reasonable so that most
> disk space is not spent on deleted docs. I can't think of settings that
> would make it more efficient.
>
> If you call deleteDocuments because you are, e.g., deleting data after a
> given age, it would help to have time-based indices so that you could remove
> an entire index at once rather than large portions of an index.
>
> On Thu, Mar 1, 2018 at 01:20, Stuart Goldberg <sgoldberg@fixflyer.com> wrote:
>
> > I call deleteDocuments
> >
> > On Feb 28, 2018 8:16 PM, "Adrien Grand" <jpountz@gmail.com> wrote:
> >
> > > What do you mean by purging? What methods do you call?
> > >
> > > On Wed, Feb 28, 2018 at 19:34, Stuart Goldberg <sgoldberg@fixflyer.com> wrote:
> > >
> > > > I have a huge Lucene index. On disk it's about 24 GB.
> > > >
> > > > I have a purging routine that is supposed to run and purge old docs.
> > > >
> > > > There are about 650 million docs in there, and through testing I have
> > > > determined that about 1/3 of these need to be purged.
> > > >
> > > > During the purge, every so often it's apparently doing some flushing and
> > > > applying deletes. This makes the process appear to hang. I know it's not
> > > > hanging but actually doing work, because I have enabled infostream and I
> > > > am getting messages every so often (every 5 minutes).
> > > >
> > > > Is there some trick (index config) I can employ to get this to work
> > > > faster?
> > > >
> > > > Stuart M Goldberg
> > > >
> > > >
> > >
> >
>
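
For readers who land on this thread later, here is a minimal sketch of the
time-based layout suggested in the quoted reply. It is not Lucene code and not
something either poster wrote: it only shows the bookkeeping, assuming one
index directory per day. The class name, the method names, and the
"index-YYYY-MM-DD" naming scheme are all hypothetical. In a real setup each
directory would presumably hold its own Lucene index, and searching across
them would need something like a MultiReader over the open partitions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class TimePartitionedIndexes {
    // Hypothetical naming scheme: one directory per day, e.g. "index-2018-02-28".
    private static final String PREFIX = "index-";
    private static final DateTimeFormatter DAY = DateTimeFormatter.ISO_LOCAL_DATE;

    /** Directory that would hold the index for documents dated on the given day. */
    public static Path partitionFor(Path root, LocalDate day) {
        return root.resolve(PREFIX + DAY.format(day));
    }

    /** Removes every partition strictly older than cutoff; returns the removed paths. */
    public static List<Path> dropPartitionsBefore(Path root, LocalDate cutoff) throws IOException {
        List<Path> expired = new ArrayList<>();
        try (Stream<Path> children = Files.list(root)) {
            for (Path child : children.collect(Collectors.toList())) {
                String name = child.getFileName().toString();
                if (!Files.isDirectory(child) || !name.startsWith(PREFIX)) {
                    continue;
                }
                try {
                    LocalDate day = LocalDate.parse(name.substring(PREFIX.length()), DAY);
                    if (day.isBefore(cutoff)) {
                        expired.add(child);
                    }
                } catch (DateTimeParseException e) {
                    // Name does not follow the scheme above; leave the directory alone.
                }
            }
        }
        for (Path dir : expired) {
            deleteRecursively(dir);
        }
        return expired;
    }

    private static void deleteRecursively(Path dir) throws IOException {
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so files are deleted before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("partitions");
        Files.createDirectories(partitionFor(root, LocalDate.of(2018, 1, 15)));
        Files.createDirectories(partitionFor(root, LocalDate.of(2018, 2, 28)));
        // Drop everything dated before 1 Feb 2018.
        List<Path> removed = dropPartitionsBefore(root, LocalDate.of(2018, 2, 1));
        System.out.println("Removed: " + removed);
    }
}
```

The point of the layout is that expiring old data becomes a directory removal
instead of a deleteDocuments call over a third of one large index, so there
are no live docs to rewrite and no merges triggered by deletes. Any open
reader or writer on a partition would need to be closed before its directory
is removed.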
