lucene-java-user mailing list archives

From Michael Sokolov <msoko...@gmail.com>
Subject Re: Help with huge index
Date Sun, 04 Mar 2018 15:16:20 GMT
I wonder whether you would get better performance in a case like this if you
were OK with taking your index offline, disabling merges, performing the
deletions, and only then re-enabling merges. This could be done on a copy of
the index if updates can be turned off or held in a queue, so that queries
could still be served during the maintenance.

However, it's largely a theoretical question, since it seems everything
worked OK for you in the end.
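
As a rough illustration of the time-based indices idea Adrien raises in the
quoted message below: if each day's documents go into their own index
directory, purging becomes a matter of picking the directories older than a
retention cutoff and removing them whole, instead of calling deleteDocuments
on large portions of one index. The sketch below is plain Java only. The
`index-YYYY-MM-DD` naming scheme, the class name, and the retention value are
assumptions made for illustration, not anything from this thread.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class TimeBasedIndices {
    // Hypothetical naming scheme: one index directory per day,
    // e.g. "index-2018-02-28".
    static final String PREFIX = "index-";

    // Name of the partition that documents for the given day would go into.
    static String partitionFor(LocalDate day) {
        return PREFIX + day.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    // Returns the partitions whose day is strictly older than
    // (today - retentionDays). Names without the prefix are ignored;
    // a prefixed name with a malformed date would throw DateTimeParseException.
    static List<String> expired(List<String> partitions, LocalDate today, int retentionDays) {
        LocalDate cutoff = today.minusDays(retentionDays);
        List<String> out = new ArrayList<>();
        for (String name : partitions) {
            if (!name.startsWith(PREFIX)) {
                continue;
            }
            LocalDate day = LocalDate.parse(name.substring(PREFIX.length()));
            if (day.isBefore(cutoff)) {
                out.add(name); // the whole directory can be dropped at once
            }
        }
        return out;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2018, 3, 4);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            names.add(partitionFor(today.minusDays(i)));
        }
        // With a 2-day retention, the partitions dated before 2018-03-02 are listed.
        System.out.println(expired(names, today, 2));
    }
}
```

Searching would then have to span the remaining per-day indices, for example
by opening one reader per directory and combining them with something like
Lucene's MultiReader.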

On Feb 28, 2018 8:37 PM, "Stuart Goldberg" <sgoldberg@fixflyer.com> wrote:

> Thanks so much. I actually found that my purging routine finished after
> about 35 minutes, which is really acceptable given that this routine is
> supposed to run during the overnight period.
>
> On Feb 28, 2018 8:34 PM, "Adrien Grand" <jpountz@gmail.com> wrote:
>
> > Thanks. Deleting lots of documents can indeed trigger a lot of work on
> > the Lucene side. First, Lucene likely needs to rewrite the live docs of
> > all your segments, and then this might trigger significant merging
> > activity, because Lucene tries to keep the number of deleted docs
> > reasonable so that most disk space is not spent on deleted docs. I can't
> > think of settings that would make it more efficient.
> >
> > If you call deleteDocuments because you are, e.g., deleting data after a
> > given age, it would help to have time-based indices so that you would
> > remove an entire index at once rather than large portions of an index.
> >
> > On Thu, Mar 1, 2018 at 01:20, Stuart Goldberg <sgoldberg@fixflyer.com>
> > wrote:
> >
> > > I call deleteDocuments
> > >
> > > On Feb 28, 2018 8:16 PM, "Adrien Grand" <jpountz@gmail.com> wrote:
> > >
> > > > What do you mean by purging? What methods do you call?
> > > >
> > > > On Wed, Feb 28, 2018 at 19:34, Stuart Goldberg
> > > > <sgoldberg@fixflyer.com> wrote:
> > > >
> > > > > I have a huge Lucene index. On disk it's about 24 GB.
> > > > >
> > > > > I have a purging routine that is supposed to run and purge old
> > > > > docs.
> > > > >
> > > > > There are about 650 million docs in there, and through testing I
> > > > > have determined that about 1/3 of these need to be purged.
> > > > >
> > > > > During the purge, every so often it's apparently doing some
> > > > > flushing and applying deletes. This causes the process to appear
> > > > > to hang. I know it's not hanging but actually doing work, because
> > > > > I have enabled infoStream and I am getting messages every so often
> > > > > (every 5 minutes).
> > > > >
> > > > > Is there some trick (index config) I can employ to get this to
> > > > > work faster?
> > > > >
> > > > > Stuart M Goldberg
> > > > >
> > > > >
> > > >
> > >
> >
>
