lucene-solr-user mailing list archives

From simon <mtnes...@gmail.com>
Subject Re: Indexing I/O errors and CorruptIndex messages
Date Thu, 04 May 2017 14:49:25 GMT
I've pretty much ruled out system/hardware issues: the AWS instance has
been rebooted, and indexing to a core on a new, empty disk/file system
fails in the same way with a CorruptIndexException.
I can generally get indexing to complete by significantly reducing the
number of indexer scripts running concurrently, but the duration goes up
proportionately.
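
A minimal sketch of that kind of throttling, for illustration only. The actual indexer spawns several copies of itself as separate processes; this sketch swaps in a thread pool with a fixed worker cap instead. `run_batches` and `index_batch` are hypothetical names, and `index_batch` stands in for whatever sends one batch of documents to /update/json.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_batches(batches, index_batch, max_workers=2):
    """Index each batch with at most max_workers running at once.

    index_batch is a hypothetical callable that sends one batch to Solr
    and raises an exception on failure.
    Returns (succeeded, failed); failed holds (batch, exception) pairs.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front; the pool itself enforces the cap.
        futures = {pool.submit(index_batch, batch): batch for batch in batches}
        for future in as_completed(futures):
            batch = futures[future]
            try:
                future.result()
                succeeded.append(batch)
            except Exception as exc:
                # Keep going so one bad batch does not stop the whole run.
                failed.append((batch, exc))
    return succeeded, failed
```

Lowering `max_workers` trades throughput for fewer simultaneous update requests, which matches the longer run times described above. The failed batches are returned rather than retried, so a run can be resumed from them.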

-Simon


On Thu, Apr 27, 2017 at 9:26 AM, simon <mtnest46@gmail.com> wrote:

> Nope ... huge file system (600gb) only 50% full, and a complete index
> would be 80gb max.
>
> On Wed, Apr 26, 2017 at 4:04 PM, Erick Erickson <erickerickson@gmail.com>
> wrote:
>
>> Disk space issue? Lucene requires at least as much free disk space as
>> your index size. Note that the disk full issue will be transient, IOW
>> if you look now and have free space it still may have been all used up
>> but had some space reclaimed.
>>
>> Best,
>> Erick
>>
>> On Wed, Apr 26, 2017 at 12:02 PM, simon <mtnest46@gmail.com> wrote:
>> > Reposting this, as the problem described is happening again and there
>> > were no responses to the original email. Anyone?
>> > ----------------------------
>> > I'm seeing an odd error during indexing for which I can't find any
>> > reason.
>> >
>> > The relevant solr log entry:
>> >
>> > 2017-03-24 19:09:35.363 ERROR (commitScheduler-30-thread-1) [ x:build0324] o.a.s.u.CommitTracker auto commit error...:java.io.EOFException: read past EOF: MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")
>> >      at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
>> > ...
>> >     Suppressed: org.apache.lucene.index.CorruptIndexException: checksum status indeterminate: remaining=0, please run checkindex for more details (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")))
>> >          at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:451)
>> >          at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:140)
>> >  followed within a few seconds by
>> >
>> >  2017-03-24 19:09:56.402 ERROR (commitScheduler-31-thread-1) [ x:build0324] o.a.s.u.CommitTracker auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher
>> >     at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1820)
>> >     at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1931)
>> > ...
>> > Caused by: java.io.EOFException: read past EOF: MMapIndexInput(path="/indexes/solrindexes/build0324/index/_4ku.fdx")
>> >     at org.apache.lucene.store.ByteBufferIndexInput.readByte(ByteBufferIndexInput.java:75)
>> >
>> > This error was repeated a few times as the indexing continued and further
>> > autocommits were triggered.
>> >
>> > I stopped the indexing process, made a backup snapshot of the index,
>> > restarted indexing at a checkpoint, and everything then completed
>> > without further incident.
>> >
>> > I ran CheckIndex on the saved snapshot and it reported no errors
>> > whatsoever. Operations on the complete index (including an optimize and
>> > several query scripts) have all been error-free.
>> >
>> > Some background:
>> >  Solr information from the beginning of the checkindex output:
>> >  -------
>> >  Opening index @ /indexes/solrindexes/build0324.bad/index
>> >
>> > Segments file=segments_9s numSegments=105 version=6.3.0 id=7m1ldieoje0m6sljp7xocbz9l userData={commitTimeMSec=1490400514324}
>> >   1 of 105: name=_be maxDoc=1227144
>> >     version=6.3.0
>> >     id=7m1ldieoje0m6sljp7xocburb
>> >     codec=Lucene62
>> >     compound=false
>> >     numFiles=14
>> >     size (MB)=4,926.186
>> >     diagnostics = {os=Linux, java.vendor=Oracle Corporation, java.version=1.8.0_45, java.vm.version=25.45-b02, lucene.version=6.3.0, mergeMaxNumSegments=-1, os.arch=amd64, java.runtime.version=1.8.0_45-b13, source=merge, mergeFactor=19, os.version=3.10.0-229.1.2.el7.x86_64, timestamp=1490380905920}
>> >     no deletions
>> >     test: open reader.........OK [took 0.176 sec]
>> >     test: check integrity.....OK [took 37.399 sec]
>> >     test: check live docs.....OK [took 0.000 sec]
>> >     test: field infos.........OK [49 fields] [took 0.000 sec]
>> >     test: field norms.........OK [17 fields] [took 0.030 sec]
>> >     test: terms, freq, prox...OK [14568108 terms; 612537186 terms/docs pairs; 801208966 tokens] [took 30.005 sec]
>> >     test: stored fields.......OK [150164874 total field count; avg 122.4 fields per doc] [took 35.321 sec]
>> >     test: term vectors........OK [4804967 total term vector count; avg 3.9 term/freq vector fields per doc] [took 55.857 sec]
>> >     test: docvalues...........OK [4 docvalues fields; 0 BINARY; 1 NUMERIC; 2 SORTED; 0 SORTED_NUMERIC; 1 SORTED_SET] [took 0.954 sec]
>> >     test: points..............OK [0 fields, 0 points] [took 0.000 sec]
>> >   -----
>> >
>> >   The indexing process is a Python script (using the scorched Python
>> > client) which spawns multiple instances of itself, in this case 6, so
>> > there are definitely concurrent calls (to /update/json).
>> >
>> > Solrconfig and the schema have not been changed for several months,
>> > during which time many ingests have been done, and the documents which
>> > were being indexed at the time of the error have been indexed before
>> > without problems, so I don't think it's a data issue.
>> >
>> > I saw the same error occur earlier in the day, and decided at that time
>> > to delete the core and restart the Solr instance.
>> >
>> > The server is an Amazon instance running CentOS 7. I checked the system
>> > logs and didn't see any evidence of hardware errors.
>> >
>> > I'm puzzled as to why this would start happening out of the blue, and I
>> > can't find any particularly relevant posts to this forum or
>> > Stackexchange. Anyone have an idea what's going on?
>>
>
>
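
Erick's point in the thread above is that a disk-full condition can be transient, so a single look at free space after the fact may miss it. One way to test for that is to sample free space while indexing runs and record the lowest value seen. The following is a rough sketch using the standard library's `shutil.disk_usage`; the function name `watch_free_space` and its parameters are hypothetical, chosen only for illustration.

```python
import shutil
import time


def watch_free_space(path, interval_sec=5.0, samples=12,
                     usage=shutil.disk_usage, sleep=time.sleep):
    """Sample free space on the filesystem holding path.

    Returns the lowest number of free bytes seen across all samples.
    usage and sleep are injectable so the loop can be exercised without
    waiting or touching a real disk.
    """
    lowest = None
    for i in range(samples):
        free = usage(path).free
        if lowest is None or free < lowest:
            lowest = free
        if i < samples - 1:
            sleep(interval_sec)  # no need to wait after the final sample
    return lowest
```

Run alongside an ingest against the index directory (for example `watch_free_space("/indexes/solrindexes")`), this would show whether free space ever dips close to zero during merges, even if it has recovered by the time anyone checks by hand. Sampling can still miss a dip shorter than the interval.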
