lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: lucene index program won't start after power failure
Date Sun, 25 Sep 2016 20:58:28 GMT
It is in theory possible to reconstruct a segments file by ls-ing all
other index files and manually rebuilding it, but it is not an easy
task and the rebuild would have to make some guesses.
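
A minimal sketch of the listing step only, assuming the file naming
visible in this thread (per-segment files such as _1kqn.cfs, commit
files such as segments_5t3 with a base-36 suffix).  The class and method
names are hypothetical and this is plain JDK code, not Lucene API.  It
recovers segment names and nothing else; the rest of the segments file
metadata would still have to be guessed or rebuilt by hand.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper, plain JDK only; it does not call any Lucene API.
public class SegmentNameScan {

    // Returns the segment name for a per-segment file such as "_1kqn.cfs",
    // or null for files that are not per-segment (segments_N, write.lock, ...).
    static String segmentNameOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        int end = fileName.length();
        int dot = fileName.indexOf('.');
        if (dot >= 0) {
            end = dot;
        }
        // Assumed convention: an optional "_suffix" may follow the segment name.
        int underscore = fileName.indexOf('_', 1);
        if (underscore >= 0 && underscore < end) {
            end = underscore;
        }
        return end > 1 ? fileName.substring(0, end) : null;
    }

    // Parses the generation out of a name like "segments_5t3" (base 36).
    static long generationOf(String segmentsFileName) {
        String prefix = "segments_";
        if (!segmentsFileName.startsWith(prefix)) {
            throw new IllegalArgumentException("not a segments_N file: " + segmentsFileName);
        }
        return Long.parseLong(segmentsFileName.substring(prefix.length()), Character.MAX_RADIX);
    }

    // Collects the distinct segment names implied by the files in indexDir.
    static SortedSet<String> listSegmentNames(Path indexDir) throws IOException {
        SortedSet<String> names = new TreeSet<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                String name = segmentNameOf(file.getFileName().toString());
                if (name != null) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("segments found: " + listSegmentNames(indexDir));
    }
}
```

For the file named in this thread, generationOf("segments_5t3")
evaluates to 7527.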

I think in the past a user did manage to create such a tool and maybe
posted the results here either on this list or the dev list?

The segments file is a vital file to the index.  It holds all metadata
about the index segments.  This is why Lucene is so careful about
writing a new one to a "pending" file, fsync'ing that, fsyncing the
directory, and doing an atomic rename, all before removing the older
segment files.
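
A rough sketch of that write pattern in plain JDK code, for anyone who
wants to test whether their IO system honors it.  This is not Lucene's
implementation; the class and method names are hypothetical.  The order
used here (fsync the file, atomic rename, fsync the directory) follows
the description in my earlier message quoted below, and opening a
directory for fsync is assumed to work as it does on Linux and OS X.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of a durable "pending file + rename" commit.
public class AtomicCommitSketch {

    // Writes data to dir/pendingName, fsyncs it, then renames it to finalName.
    static Path commit(Path dir, String pendingName, String finalName, byte[] data)
            throws IOException {
        Path pending = dir.resolve(pendingName);
        Path target = dir.resolve(finalName);

        // 1. Write the new contents to the pending file and fsync the file.
        try (FileChannel ch = FileChannel.open(pending,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // flush file contents and metadata to the device
        }

        // 2. Atomically rename the pending file to its final name.
        Files.move(pending, target, StandardCopyOption.ATOMIC_MOVE);

        // 3. fsync the directory so the rename itself is durable.
        fsyncDirectory(dir);
        return target;
    }

    // Assumption: the platform allows opening a directory read-only and
    // forcing it. Where it does not, the failure is ignored in this sketch.
    static void fsyncDirectory(Path dir) {
        try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
            ch.force(true);
        } catch (IOException e) {
            // Directory fsync unsupported here; nothing more this sketch can do.
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        byte[] data = "example commit data".getBytes(StandardCharsets.UTF_8);
        Path committed = commit(dir, "pending_segments_1", "segments_1", data);
        System.out.println("committed " + committed);
    }
}
```

If the storage stack drops or ignores the force calls, a power failure
can still leave the renamed file present but with unwritten contents,
whatever the application did.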

Mike McCandless

http://blog.mikemccandless.com


On Sun, Sep 25, 2016 at 10:37 AM, Ziming Dong <dzm1016397507@gmail.com> wrote:
> Sorry to resend.
> I'll change IO to local. Is there any way to recover the first index? Now it
> cannot be opened by checkIndex. We are building an index of 7 billion webpages,
> so it costs much time to rebuild.
>
> On Sun, Sep 25, 2016 at 5:31 PM, Ziming Dong <dzm1016397507@gmail.com>
> wrote:
>>
>> I'll change IO to local. Is there any way to recover the first index? Now it
>> can be opened by checkIndex. We are building an index of 7 billion webpages,
>> so it costs much time to rebuild.
>>
>> On Sat, Sep 24, 2016 at 2:54 AM, Michael McCandless
>> <lucene@mikemccandless.com> wrote:
>>>
>>> The 'sync' option for an NFS client just means that every write is
>>> sent immediately across the network.  And it really is a useless
>>> performance loss as long as your app (like Lucene) does the "right
>>> thing" with fsync.
>>>
>>> The more important question is why fsync sent to your NFS client and
>>> then to the Mac Mini's NFS server failed to actually move all written
>>> bytes to durable storage.
>>>
>>> Can you reproduce this issue if you use a more well-trodden IO system,
>>> e.g. Linux with ext4 on a local IO device?
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Fri, Sep 23, 2016 at 12:00 AM, Ziming Dong <dzm1016397507@gmail.com>
>>> wrote:
>>> > I use the Mac Mini on the NFS server side. It seems the mount option sync is
>>> > useless; it just slows down the index program.
>>> >
>>> > On Fri, Sep 23, 2016 at 4:43 AM, Michael McCandless
>>> > <lucene@mikemccandless.com> wrote:
>>> >>
>>> >> OK sorry I meant your first index, and it seems to have only one
>>> >> (broken) segments file.  Can you post the "ls -l" output of that first
>>> >> index?  It looks like the file was (illegally) filled with 0s, or at
>>> >> least the first 4 bytes were.
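
The numbers in the exceptions quoted further down are consistent with
this: 1071082519 is 0x3fd76c17, and the expected footer value
-1071082520 is its bitwise complement.  A minimal sketch of checking
the first 4 bytes of a file against that value, assuming the leading
integer is stored big-endian.  The class and method names are
hypothetical and this is plain JDK code, not Lucene API.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: inspects the first 4 bytes of an index file.
public class HeaderMagicCheck {

    // Value taken from the exception text in this thread (1071082519).
    static final int CODEC_MAGIC = 0x3fd76c17;

    // Reads the first 4 bytes of the file as a big-endian int.
    // Throws EOFException (an IOException) if the file is shorter than 4 bytes.
    static int readLeadingInt(Path file) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(file))) {
            return in.readInt();
        }
    }

    // True if the file starts with the expected magic value.
    static boolean hasCodecMagic(Path file) throws IOException {
        return readLeadingInt(file) == CODEC_MAGIC;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HeaderMagicCheck <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        int leading = readLeadingInt(file);
        System.out.println(file + ": leading int = " + leading
                + (leading == CODEC_MAGIC ? " (magic present)" : " (magic missing)"));
    }
}
```

A file whose first 4 bytes are zero yields a leading int of 0, which
matches the "0 (needs to be between 1071082519 and 1071082519)" text in
the IndexFormatTooOldException.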
>>> >>
>>> >> Lucene writes this file, fsyncs it, does an atomic rename, and fsyncs
>>> >> the directory, so this should not happen, if your IO system honors
>>> >> fsync.
>>> >>
>>> >> What IO devices are used by the NFS server?
>>> >>
>>> >> NFS is not well tested and has several known problems with Lucene so
>>> >> this is already risky ground...
>>> >>
>>> >> Mike McCandless
>>> >>
>>> >> http://blog.mikemccandless.com
>>> >>
>>> >> On Thu, Sep 22, 2016 at 11:33 AM, Ziming Dong
>>> >> <dzm1016397507@gmail.com>
>>> >> wrote:
>>> >> > The second index was recovered by checkIndex; I don't know what was in
>>> >> > the second index directory before the recovery.
>>> >> > checkIndex can't read the first index. The index filenames are attached.
>>> >> > I used Lucene 6.0.0 at the beginning, then upgraded to Lucene 6.1.0 to
>>> >> > continue indexing.
>>> >> >
>>> >> > On Thu, Sep 22, 2016 at 10:17 PM, Michael McCandless
>>> >> > <lucene@mikemccandless.com> wrote:
>>> >> >>
>>> >> >> Do you have 2 separate segments files in that 2nd index?
>>> >> >>
>>> >> >> Which exact Lucene version is this?
>>> >> >>
>>> >> >> Mike McCandless
>>> >> >>
>>> >> >> http://blog.mikemccandless.com
>>> >> >>
>>> >> >>
>>> >> >> On Thu, Sep 22, 2016 at 7:44 AM, Ziming Dong
>>> >> >> <dzm1016397507@gmail.com>
>>> >> >> wrote:
>>> >> >> > I used checkIndex to recover the second index, though I lost many docs
>>> >> >> > in that index, but the first index can't be read by checkIndex. The
>>> >> >> > error is
>>> >> >> >
>>> >> >> >> java -cp lucene-core-6.1.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /Volumes/HPT8_56T/infomall-index/index0
>>> >> >> >> Opening index @ /Volumes/HPT8_56T/infomall-index/index0
>>> >> >> >> ERROR: could not read any segments file in directory
>>> >> >> >> org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MMapIndexInput(path="/Volumes/HPT8_56T/infomall-index/index0/segments_5t3"))): 0 (needs to be between 1071082519 and 1071082519). This version of Lucene only supports indexes created with release 5.0 and later.
>>> >> >> >>         at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:295)
>>> >> >> >>         at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)
>>> >> >> >>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:507)
>>> >> >> >>         at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2595)
>>> >> >> >>         at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2497)
>>> >> >> >>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2423)
>>> >> >> >
>>> >> >> >
>>> >> >> > I use NFS, but I set the mount options as: mount -t nfs -o tcp,sync,retrans=10
>>> >> >> > The index program had run for 1 month without any problem before the
>>> >> >> > power failure.
>>> >> >> >
>>> >> >> > On Thu, Sep 22, 2016 at 6:06 PM, Michael McCandless
>>> >> >> > <lucene@mikemccandless.com> wrote:
>>> >> >> >>
>>> >> >> >> Hmm I'm no longer so sure this is an IW bug: on commit we fsync the
>>> >> >> >> pending_segments_N and then do an atomic rename to segments_N.
>>> >> >> >>
>>> >> >> >> Can you describe your IO system?  Is it possible it does not implement
>>> >> >> >> fsync or atomic renames correctly?
>>> >> >> >>
>>> >> >> >> Also, your 2nd exception indicates the segments_N file was intact but
>>> >> >> >> the .cfs file was corrupt, which is also hard to explain unless fsync
>>> >> >> >> isn't working on your IO system.
>>> >> >> >>
>>> >> >> >> Mike McCandless
>>> >> >> >>
>>> >> >> >> http://blog.mikemccandless.com
>>> >> >> >>
>>> >> >> >> On Thu, Sep 22, 2016 at 5:10 AM, Michael McCandless
>>> >> >> >> <lucene@mikemccandless.com> wrote:
>>> >> >> >> > Sorry for the slow reply here.  Curious that both of these exceptions
>>> >> >> >> > are from IW.init.  I think this may be a real bug, caused by this:
>>> >> >> >> >
>>> >> >> >> > https://github.com/apache/lucene-solr/commit/981bfba841144d08df1d1a183d39fcd6f195ad56
>>> >> >> >> >
>>> >> >> >> > I'll see if I can make a standalone test case showing this.
>>> >> >> >> >
>>> >> >> >> > If you open those indices with an IndexReader instead, does it
>>> >> >> >> > succeed?
>>> >> >> >> >
>>> >> >> >> > If you run CheckIndex, what does it report?
>>> >> >> >> >
>>> >> >> >> > Mike McCandless
>>> >> >> >> >
>>> >> >> >> > http://blog.mikemccandless.com
>>> >> >> >> >
>>> >> >> >> > On Wed, Sep 14, 2016 at 1:22 AM, Ziming Dong
>>> >> >> >> > <dzm1016397507@gmail.com>
>>> >> >> >> > wrote:
>>> >> >> >> >> I have 6 machines and 6 index directories; each machine builds its
>>> >> >> >> >> index into one index directory. After a power failure last night,
>>> >> >> >> >> two of those machines can't start the index program.
>>> >> >> >> >>
>>> >> >> >> >> one error is
>>> >> >> >> >>
>>> >> >> >> >>> INFO: 2016-09-14 12:31:38 [main] sewm.bdbox.search.InfomallIndexer$Builder:ignoreCollectionsFile(227): Loaded 2146 ignored collections from /mnt/HPT8_56T/infomall-index/index0/ignored_collections.txt
>>> >> >> >> >>> ERROR: 2016-09-14 12:31:39 [main] sewm.bdbox.util.LogUtil:error(71): org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(MMapIndexInput(path="/mnt/HPT8_56T/infomall-index/index0/segments_5t3"))): 0 (needs to be between 1071082519 and 1071082519). This version of Lucene only supports indexes created with release 5.0 and later.
>>> >> >> >> >>>         at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:295)
>>> >> >> >> >>>         at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)
>>> >> >> >> >>>         at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:910)
>>> >> >> >> >>>         at sewm.bdbox.search.InfomallIndexer.<init>(InfomallIndexer.java:60)
>>> >> >> >> >>>         at sewm.bdbox.search.ThreadedInfomallIndexer.<init>(ThreadedInfomallIndexer.java:28)
>>> >> >> >> >>>         at sewm.bdbox.search.ThreadedInfomallIndexer.<init>(ThreadedInfomallIndexer.java:21)
>>> >> >> >> >>>         at sewm.bdbox.search.ThreadedInfomallIndexer$Builder.build(ThreadedInfomallIndexer.java:72)
>>> >> >> >> >>>         at sewm.bdbox.search.ThreadedInfomallIndexer.main(ThreadedInfomallIndexer.java:129)
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> another is
>>> >> >> >> >>
>>> >> >> >> >>> INFO: 2016-09-14 01:11:06 [main] sewm.bdbox.search.InfomallIndexer$Builder:ignoreCollectionsFile(227): Loaded 8575 ignored collections from /mnt/HPT8/infomall-index/index5/ignored_collections.txt
>>> >> >> >> >>> ERROR: 2016-09-14 01:11:09 [main] sewm.bdbox.util.LogUtil:error(71): org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=0 vs expected footer=-1071082520 (resource=MMapIndexInput(path="/mnt/HPT8/infomall-index/index5/_1kqn.cfs"))
>>> >> >> >> >>>         at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:448)
>>> >> >> >> >>>         at org.apache.lucene.codecs.CodecUtil.retrieveChecksum(CodecUtil.java:433)
>>> >> >> >> >>>         at org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.<init>(Lucene50CompoundReader.java:86)
>>> >> >> >> >>>         at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.getCompoundReader(Lucene50CompoundFormat.java:71)
>>> >> >> >> >>>         at org.apache.lucene.index.IndexWriter.readFieldInfos(IndexWriter.java:1016)
>>> >> >> >> >>>         at org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:1033)
>>> >> >> >> >>>         at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
>>> >> >> >> >>>         at sewm.bdbox.search.InfomallIndexer.<init>(InfomallIndexer.java:60)
>>> >> >> >> >>>         at sewm.bdbox.search.ThreadedInfomallIndexer.<init>(ThreadedInfomallIndexer.java:28)
>>> >> >> >> >>>         at sewm.bdbox.search.ThreadedInfomallIndexer.<init>(ThreadedInfomallIndexer.java:21)
>>> >> >> >> >>>         at sewm.bdbox.search.ThreadedInfomallIndexer$Builder.build(ThreadedInfomallIndexer.java:72)
>>> >> >> >> >>>         at sewm.bdbox.search.ThreadedInfomallIndexer.main(ThreadedInfomallIndexer.java:129)
>>> >> >> >> >>
>>> >> >> >> >>
>>> >> >> >> >> it seems 1071082519 is a special number.
>>> >> >> >> >>
>>> >> >> >> >> --
>>> >> >> >> >>
>>> >> >> >> >> Ziming Dong
>>> >> >> >> >> http://suiyuan2009.github.io/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

