lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: mutability of lucene index files
Date Sat, 12 Sep 2015 18:07:47 GMT
Hi,

"segments.gen" no longer exists in Lucene 5.x (because of the Java 7 NIO.2 update). Every
commit point (segments_xxx) also gets a new filename.

This means: Yes, every (and really every) file in a Lucene index is write-once. That is the
basis of the whole snapshotting concept that Lucene internally uses.
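
For the incremental backup discussed in this thread, write-once means that a file name already present in the backup never has to be copied again. Below is a minimal sketch using only java.nio and no Lucene API. The class and method names are hypothetical, and it assumes nothing deletes files from the index while the copy runs (with a live IndexWriter you would hold the commit with a snapshot first). It skips write.lock, which is a lock file rather than index data, and copies the segments_N commit points last so the backup never holds a commit whose files are still missing.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Lucene.
final class IncrementalBackup {

    // Names of the regular files directly inside dir (an index directory is flat).
    static Set<String> fileNames(Path dir) throws IOException {
        Set<String> names = new TreeSet<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                String name = p.getFileName().toString();
                // write.lock is Lucene's lock file, not index data, so skip it.
                if (Files.isRegularFile(p) && !name.equals("write.lock")) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    // Copies files that are new in indexDir, then removes backup files that
    // have disappeared from indexDir. Returns the number of files copied.
    static int sync(Path indexDir, Path backupDir) throws IOException {
        Files.createDirectories(backupDir);
        Set<String> source = fileNames(indexDir);
        Set<String> backedUp = fileNames(backupDir);
        int copied = 0;
        // Two passes: segment files first, commit points (segments_N) last.
        for (boolean commitPass : new boolean[] {false, true}) {
            for (String name : source) {
                boolean isCommit = name.startsWith("segments_");
                if (isCommit == commitPass && !backedUp.contains(name)) {
                    // Write-once files: an existing name is never re-copied.
                    Files.copy(indexDir.resolve(name), backupDir.resolve(name));
                    copied++;
                }
            }
        }
        // Files merged away or superseded in the index are dropped from the backup.
        for (String name : backedUp) {
            if (!source.contains(name)) {
                Files.delete(backupDir.resolve(name));
            }
        }
        return copied;
    }
}
```

Calling IncrementalBackup.sync(indexDir, backupDir) repeatedly only transfers what changed since the previous run. A copy interrupted halfway would leave a truncated file that later runs treat as complete, so a real routine would probably copy to a temporary name and rename afterwards.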

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Larry White [mailto:lwhite@tracelink.com]
> Sent: Saturday, September 12, 2015 7:59 PM
> To: java-user@lucene.apache.org
> Subject: Re: mutability of lucene index files
> 
> Hi Erick,
> 
> Thank you.
> 
> Deleting old files is fine (and expected), so it sounds like the segment files
> are immutable (prior to deletion) and the file that handles deletion is
> renamed with every change, so it's effectively immutable, too.
> 
> That leaves the segments_* files and segments.gen, if I understand
> correctly.
> 
> And thank you for the pointer. I'm hoping to use the same process to back up
> and restore all my data (Lucene and otherwise), and to be able to use an
> incremental approach so that the system doesn't need to be offline too long,
> but I'll definitely take another look at snapshots.
> 
> Thanks again
> 
> 
> On Sat, Sep 12, 2015 at 12:50 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> 
> > The Lucene index segment files are immutable: once they're closed,
> > they are never changed. These are things like _1.fdt, _1.tim, etc. All
> > of the files with the same prefix (_1 in my example) comprise a single
> > "segment". Segments _will_, however, disappear. During indexing, two
> > or more segments are combined into a new segment, so _1.*, _2.* and
> > _3.* could be merged into _4.*, then _1.*, _2.* and _3.* will be removed.
> >
> > There is one exception to the rule "segment files are not changed",
> > and that's the file that contains information about documents in that
> > segment that have been deleted. Actually that file is re-written to a
> > new name every time a doc is deleted from the segment upon commit.
> >
> > And another exception is that there is a file or two that contains the
> > information about what segments comprise the most recent (hard)
> > commit: in 4.x, segments_* and segments.gen.
> >
> > So rather than try to wrap your head around all this and then worry
> > about what changes when the next major release comes out, would it
> > work to just use the built-in snapshot process? Here's something I
> > found (but didn't look at very closely) to get you started:
> >
> > http://stackoverflow.com/questions/17753226/lucene-4-3-1-backup-process
> >
> > And there's a link to the Lucene user's list where the question was
> > answered.
> >
> > Best,
> > Erick
> >
> > On Sat, Sep 12, 2015 at 7:59 AM, Larry White <lwhite@tracelink.com> wrote:
> > > Hi,
> > >
> > > I'm writing a backup routine for a system that includes Lucene for
> > > full-text search. The primary data store is based on immutable
> > > files, so it can be backed up incrementally by copying any new files
> > > (and removing any files that have been deleted from earlier backups).
> > > It's my understanding from brief comments found on the internet that
> > > most, if not all, of the files that comprise a Lucene index are
> > > similarly immutable.
> > >
> > > Can someone please confirm or deny that statement?
> > >
> > > If the Lucene files are mostly, but not entirely, immutable, it
> > > would be greatly appreciated if the exceptions could be identified.
> > > I would imagine there might be log files that would be mutable, for
> > > example.
> > >
> > > Thank you very much for your help.
> > >
> > > Larry
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> 
> --
> *Larry White |  TraceLink Inc. | Principal Software Architect*
> 400 Riverpark Dr. | North Reading, MA | 01864
> e: lwhite@tracelink.com
> www.tracelink.com
> 
> 
> *Protect patients, enable health, grow profits, ensure compliance*



