lucene-solr-user mailing list archives

From: Phillip Farber <pfar...@umich.edu>
Subject: Re: solr optimize - no space left on device
Date: Wed, 07 Oct 2009 13:20:35 GMT
All,

We're puzzled why we're still unable to optimize a 192GB index on an LVM
volume that has 406GB available. We are not using Solr's replication
(distribution) scripts, so there is no snapshooter in the picture. We run
out of disk capacity with df showing 100% but du showing just 379GB of
files.
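
A quick way to see space that df counts but du cannot, at least on our
Linux boxes, is to ask lsof for open files whose on-disk link count is
zero, i.e. files that have been deleted but are still held open by some
process. Assuming /l/solrs/build-2 is the mount point:

% lsof +L1 /l/solrs/build-2

Anything listed occupies blocks that are freed only when the owning
process closes the file or exits.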

Restarting tomcat causes the space to be recovered and many segments to
be deleted, leaving just 3 of the original 33. Issuing another optimize
at that point causes Solr to run for a while and then show no further
activity (CPU, memory consumption) in jconsole. The 3 segments do not
merge into one.

% df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/internal-solr--build--2
                      406G  402G   30M 100% /l/solrs/build-2

Also suspicious: 406G size minus 402G used should leave roughly 4G, yet
avail shows only 30M. Presumably the filesystem's reserved blocks account
for the difference.
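
If the filesystem under the LVM volume is ext2/ext3, the reserved-block
count can be checked with tune2fs (the device path here is ours):

% tune2fs -l /dev/mapper/internal-solr--build--2 | grep -i 'reserved block'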

Yesterday, after our second try, lsof showed several "deleted" files that
were still open and apparently consuming almost 178GB of space (summing
the sizes below):

jsvc  8381  tomcat  377u  REG  253,6   13369098240  1982471  /l/solrs/build-2/data/index/_1j37.tis (deleted)
jsvc  8381  tomcat  378u  REG  253,6     184778752  1982472  /l/solrs/build-2/data/index/_1j37.tii (deleted)
jsvc  8381  tomcat  379u  REG  253,6   34053685248  1982473  /l/solrs/build-2/data/index/_1j37.frq (deleted)
jsvc  8381  tomcat  380u  REG  253,6  130411978752  1982474  /l/solrs/build-2/data/index/_1j37.prx (deleted)

That theory did not hold up, though: the error log shows that Solr was
still trying to merge into the _1j37 segment files (the same ones lsof
reports as deleted) when it ran out of space, so those files are a
symptom, not a cause, of the lost space:

SEVERE: java.io.IOException: background merge hit exception: _ojl:C151080
_169w:C141302 _1j36:C80405 _1j35:C2043 _1j34:C192 into _1j37 [optimize]

We restored the pre-optimized index again, restarted tomcat, and tried to
optimize using SerialMergeScheduler instead of the default
ConcurrentMergeScheduler, on the theory that concurrent merges could
somehow take more than 2X disk space.
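
For reference, the switch is a one-line change in solrconfig.xml. A
minimal sketch, assuming the class-attribute syntax (older releases take
the class name as element text instead), placed in the index settings
section (<indexDefaults>/<mainIndex>):

  <!-- run merges one at a time in the calling thread instead of
       concurrently in background threads -->
  <mergeScheduler class="org.apache.lucene.index.SerialMergeScheduler"/>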

The optimize failed again with an "out of space" error. This time there
were no "deleted" files in the lsof output.

One theory we have not ruled out: since Lucene cannot delete the old
segments while an open searcher still references them, an optimize under
a live reader can, as we understand it, transiently need closer to 3X the
index size rather than 2X, and 3 x 192GB is more than this 406GB volume
holds.

This is one shard out of 10. A couple of the shards were around 192GB and
merged successfully. Any suggestions on how to debug this would be
greatly appreciated.
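
One thing we will try on the next attempt: watching df and du side by
side so we can see exactly when and how they diverge, with a simple loop
along these lines (bash syntax; paths are ours):

% while true; do date; df -B1 /l/solrs/build-2; \
    du -sB1 /l/solrs/build-2/data/index; sleep 60; done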

Thanks!

Phil
hathitrust.org
University of Michigan


Shalin Shekhar Mangar wrote:
> Not sure but a quick search turned up:
> http://www.walkernews.net/2007/07/13/df-and-du-command-show-different-used-disk-space/
> 
> Using up to 2x the index size can happen. Also check if there is a
> snapshooter script running through cron which is making hard links to files
> while a merge is in progress.
> 
> Do let us know if you make any progress. This is interesting.
> 
> On Tue, Oct 6, 2009 at 5:28 PM, Phillip Farber <pfarber@umich.edu> wrote:
> 
>> I am attempting to optimize a large shard on Solr 1.4 and repeatedly get
>> java.io.IOException: No space left on device. The shard, after a final
>> commit before optimize, shows a size of about 192GB on a 400GB volume.  I
>> have successfully optimized 2 other shards that were similarly large without
>> this problem on identical hardware boxes.
>>
>> Before the optimize I see:
>>
>> % df -B1 .
>> Filesystem            1B-blocks         Used    Available Use% Mounted on
>> /dev/mapper/internal-solr--build--2
>>                    435440427008 205681356800 225335255040  48% /l/solrs/build-2
>>
>> slurm-4:/l/solrs/build-2/data/index % du -B1
>> 205441486848 .
>>
>> There's a slight discrepancy between the du and df, which appears to be
>> orphaned inodes. But the du says there should be enough space to handle the
>> doubling in size during optimization. However, for the second time we ran
>> out of space; at that point the du and df are wildly different and the
>> volume is at 100%:
>>
>>
>> % df -B1 .
>> Filesystem            1B-blocks         Used    Available Use% Mounted on
>> /dev/mapper/internal-solr--build--2
>>                    435440427008 430985760768     30851072 100% /l/solrs/build-2
>>
>> slurm-4:/l/solrs/build-2/data/index % du -B1
>> 252552298496    .
>>
>> At this point it appears orphaned inodes are consuming space and not being
>> freed up. Any clue as to whether this is a Lucene bug, a Solr bug, or some
>> other problem? Error traces follow.
>>
>> Thanks!
>>
>> Phil
>>
>> ---
>>
>> Oct 6, 2009 2:12:37 AM org.apache.solr.update.processor.LogUpdateProcessor finish
>> INFO: {} 0 9110523
>> Oct 6, 2009 2:12:37 AM org.apache.solr.common.SolrException log
>> SEVERE: java.io.IOException: background merge hit exception: _ojl:C151080 _169w:C141302 _1j36:C80405 _1j35:C2043 _1j34:C192 into _1j37 [optimize]
>>   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2737)
>>   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2658)
>>   at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:401)
>>   at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
>>   at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168)
>>   at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
>>   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
>>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
>>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
>>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
>>   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>>   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
>>   at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
>>   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>>   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
>>   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
>>   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
>>   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
>>   at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
>>   at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
>>   at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
>>   at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
>>   at java.lang.Thread.run(Thread.java:619)
>> Caused by: java.io.IOException: No space left on device
>>   at java.io.RandomAccessFile.writeBytes(Native Method)
>>   at java.io.RandomAccessFile.write(RandomAccessFile.java:466)
>>   at org.apache.lucene.store.FSDirectory$FSIndexOutput.flushBuffer(FSDirectory.java:719)
>>   at org.apache.lucene.store.BufferedIndexOutput.flushBuffer(BufferedIndexOutput.java:96)
>>   at org.apache.lucene.store.BufferedIndexOutput.flush(BufferedIndexOutput.java:85)
>>   at org.apache.lucene.store.BufferedIndexOutput.seek(BufferedIndexOutput.java:124)
>>   at org.apache.lucene.store.FSDirectory$FSIndexOutput.seek(FSDirectory.java:744)
>>   at org.apache.lucene.index.TermInfosWriter.close(TermInfosWriter.java:220)
>>   at org.apache.lucene.index.FormatPostingsFieldsWriter.finish(FormatPostingsFieldsWriter.java:70)
>>   at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:493)
>>   at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:140)
>>   at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4838)
>>   at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:4429)
>>   at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:235)
>>   at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
>>
>> Oct 6, 2009 2:12:37 AM org.apache.solr.core.SolrCore execute
>> INFO: [] webapp=/build-2 path=/update params={} status=500 QTime=9110523
>>
