lucene-java-user mailing list archives

From Shouvik Bardhan <sbard...@gisfederal.com>
Subject Re: When does a segment file gets written...
Date Wed, 05 Nov 2014 13:13:06 GMT
Ours is an index for keeps, and it keeps growing for many weeks until we
decide to re-ingest. I am counting the docs added within my little server
and calling commit every 5 million docs or so. I will play with the
threshold and also bring in the elapsed time since the last commit to
refine it. I am a little surprised that when a flush to disk happens (and
I can see that this happens often enough under the hood), there is
virtually no pause in indexing (or searching), but during commit there
seems to be a big pause. I will play with it and look at the code to try
to understand why.
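
A minimal sketch of the doc-count-plus-elapsed-time threshold described
above, in plain Java. CommitPolicy and its parameter names are hypothetical
and not part of Lucene; the commit action is passed in as a Runnable, and
the clock is injected only so the logic can be exercised without waiting.

```java
import java.util.function.LongSupplier;

/**
 * Hypothetical helper (not a Lucene class): triggers a commit when either
 * enough documents have been added or enough time has passed since the
 * last commit.
 */
class CommitPolicy {
    private final long maxDocs;        // commit after this many added docs
    private final long maxMillis;      // ... or after this much elapsed time
    private final LongSupplier clock;  // e.g. System::currentTimeMillis
    private final Runnable commitAction;

    private long docsSinceCommit = 0;
    private long lastCommitMillis;

    CommitPolicy(long maxDocs, long maxMillis, LongSupplier clock, Runnable commitAction) {
        this.maxDocs = maxDocs;
        this.maxMillis = maxMillis;
        this.clock = clock;
        this.commitAction = commitAction;
        this.lastCommitMillis = clock.getAsLong();
    }

    /** Call once per added document; returns true if a commit was triggered. */
    synchronized boolean onDocumentAdded() {
        docsSinceCommit++;
        long now = clock.getAsLong();
        boolean due = docsSinceCommit >= maxDocs
                || now - lastCommitMillis >= maxMillis;
        if (due) {
            commitAction.run();
            // Reset both thresholds only after the commit has succeeded.
            docsSinceCommit = 0;
            lastCommitMillis = now;
        }
        return due;
    }
}
```

With a real writer, the action would wrap IndexWriter.commit(), which
throws IOException, for example by rethrowing it as UncheckedIOException.
Note that the elapsed-time check here only fires when a document arrives;
an idle ingest would need a separate timer.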

Thanks for the help,


On Tue, Nov 4, 2014 at 12:01 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> On Tue, Nov 4, 2014 at 11:44 AM, Shouvik Bardhan
> <sbardhan@gisfederal.com> wrote:
>
> > Thanks for the reply (and thanks for everything else too !!) Mike.
>
> You're welcome!
>
> > I am unable to understand when to call commit. Should I start counting
> > the number of documents I am ingesting and then, say, every 10 million
> > docs do a commit()? I don't want to do a commit too frequently because
> > that does not sound correct. Since the ingest velocity is variable, the
> > best thing would have been if I could have this commit() happen when,
> > say, X number of docs are written. I will try and see if I can find a
> > good time to commit.
>
> It's really up to you.
>
> commit is quite costly, especially for spinning-magnet disks, so you
> should call it rarely.
>
> But then it gives you durability, meaning if the OS crashes, computer
> loses power, JVM crashes or is killed, etc., on startup your index
> will only reflect the last successful commit, so you want to call
> commit frequently enough so you don't lose too much data on such
> events.
>
> If the data is transient / you can simply start indexing again on
> startup, then never call commit :)
>
> Mike McCandless
>
> http://blog.mikemccandless.com
