lucene-java-user mailing list archives

From Michael van Rooyen <mich...@loot.co.za>
Subject Re: Lucene 4.4.0 mergeSegments OutOfMemoryError
Date Thu, 26 Sep 2013 11:00:52 GMT
Thanks for clarifying, Uwe. I will keep the daily optimization turned
off. I may be wrong, but if the OOM happens as part of the forceMerge,
then it could presumably also happen as a natural part of index growth
when big segments are merged. If so, it might be worth looking into
anyway. I suspect that it has to do with the way that NumericDocValues
fields are handled in the merge process, but again, this is just a stab
in the dark...

Michael.
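
For a rough sense of scale: the quoted trace fails inside
Lucene42DocValuesProducer.loadNumeric, reached through
SegmentMerger.mergeNorms, so per-document numeric values are being loaded
onto the heap during the merge. The sketch below is only a back-of-envelope
upper bound, assuming every value is held as an uncompressed 8-byte long
(the real format may pack values far more tightly) and using a hypothetical
field count of 3. The class and method names are made up for illustration.

```java
public class MergeHeapEstimate {

    /**
     * Worst-case heap bytes if each document's value for each field is held
     * as an uncompressed 8-byte long. This is an assumption for illustration,
     * not a statement about how Lucene actually encodes the values.
     */
    static long worstCaseBytes(long docCount, int fieldCount) {
        return docCount * fieldCount * (long) Long.BYTES;
    }

    public static void main(String[] args) {
        long docs = 14_000_000L; // from the thread: about 14 million documents
        int fields = 3;          // hypothetical number of numeric fields
        long bytes = worstCaseBytes(docs, fields);
        System.out.printf("worst case: %d MB for %d fields%n",
                bytes / (1024 * 1024), fields);
    }
}
```

Under these assumptions a single fully merged segment, which is what
forceMerge(1) produces, needs on the order of a hundred megabytes per
field, which is a noticeable share of a 3GB heap.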

On 2013/09/26 12:38 PM, Uwe Schindler wrote:
> Hi,
>
> TieredMergePolicy, which is the default since around Lucene 3.2, prefers merging segments
> with many deletions, so forceMerge(1) is not needed.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
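
To illustrate the idea in Uwe's reply, here is a toy sketch of a
deletion-aware merge choice: segments with a higher share of deleted
documents are picked first. This is not Lucene's actual TieredMergePolicy
scoring, which also weighs segment sizes and other factors. The Segment
class, its fields and the pick method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class DeletionAwarePick {

    /** Hypothetical stand-in for per-segment statistics. */
    static final class Segment {
        final String name;
        final int maxDoc;   // total documents in the segment, live and deleted
        final int delCount; // documents marked deleted

        Segment(String name, int maxDoc, int delCount) {
            this.name = name;
            this.maxDoc = maxDoc;
            this.delCount = delCount;
        }

        double deletedRatio() {
            return maxDoc == 0 ? 0.0 : (double) delCount / maxDoc;
        }
    }

    /** Returns up to k segments, highest deleted ratio first. */
    static List<Segment> pick(List<Segment> segments, int k) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort((a, b) -> Double.compare(b.deletedRatio(), a.deletedRatio()));
        return new ArrayList<>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment("_a", 1_000_000, 50_000));
        segments.add(new Segment("_b", 200_000, 120_000));
        segments.add(new Segment("_c", 500_000, 100_000));
        for (Segment s : pick(segments, 2)) {
            System.out.printf("%s deleted=%.0f%%%n", s.name, 100 * s.deletedRatio());
        }
    }
}
```

Merging the chosen segments drops their deleted documents, which is how
dead space can be reclaimed without a full forceMerge(1).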
>> -----Original Message-----
>> From: Michael van Rooyen [mailto:michael@loot.co.za]
>> Sent: Thursday, September 26, 2013 12:26 PM
>> To: java-user@lucene.apache.org
>> Cc: Ian Lea
>> Subject: Re: Lucene 4.4.0 mergeSegments OutOfMemoryError
>>
>> Yes, it happens as part of the early morning optimize, and yes, it's a
>> forceMerge(1) which I've disabled for now.
>>
>> I haven't looked at the persistence mechanism for Lucene since 2.x, but if I
>> remember correctly, the deleted documents would stay in an index segment
>> until that segment was eventually merged.  Without forcing a merge
>> (optimize in old versions), the footprint on disk could be a multiple of the
>> actual space required for the live documents, and this would have an impact
>> on performance (the deleted documents would clutter the buffer cache).
>>
>> Is this still the case?  I would have thought it good practice to force the dead
>> space out of an index periodically, but if the underlying storage mechanism
>> has changed and the current index files are more efficient at housekeeping,
>> this may no longer be necessary.
>>
>> If someone could shed a little light on best practice for indexes where
>> documents are frequently updated (i.e. deleted and re-added), that would
>> be great.
>>
>> Michael.
>>
>>
>> On 2013/09/26 11:43 AM, Ian Lea wrote:
>>> Is this OOM happening as part of your early morning optimize or at
>>> some other point?  By optimize do you mean IndexWriter.forceMerge(1)?
>>> You really shouldn't have to use that. If the index grows forever
>>> without it then something else is going on which you might wish to
>>> report separately.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Wed, Sep 25, 2013 at 12:35 PM, Michael van Rooyen <michael@loot.co.za> wrote:
>>>> We've recently upgraded to Lucene 4.4.0 and mergeSegments now causes
>>>> an OOM error.
>>>>
>>>> As background, our index contains about 14 million documents (growing
>>>> slowly) and we process about 1 million updates per day. It's about
>>>> 8GB on disk.  I'm not sure if the Lucene segments merge the way they
>>>> used to in the early versions, but we've always optimized at 3am to
>>>> get rid of dead space in the index, or otherwise it grows forever.
>>>>
>>>> The mergeSegments was working under 4.3.1 but the index has grown
>>>> somewhat on disk since then, probably due to a couple of added
>>>> NumericDocValues fields.  The java process is assigned about 3GB (the
>>>> maximum, as it's running on a 32 bit i686 Linux box), and it still goes OOM.
>>>>
>>>> Any advice as to the possible cause and how to circumvent it would be great.
>>>> Here's the stack trace:
>>>>
>>>> org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
>>>>     org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:545)
>>>>     org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
>>>> Caused by: java.lang.OutOfMemoryError: Java heap space
>>>>     org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.loadNumeric(Lucene42DocValuesProducer.java:212)
>>>>     org.apache.lucene.codecs.lucene42.Lucene42DocValuesProducer.getNumeric(Lucene42DocValuesProducer.java:174)
>>>>     org.apache.lucene.index.SegmentCoreReaders.getNormValues(SegmentCoreReaders.java:301)
>>>>     org.apache.lucene.index.SegmentReader.getNormValues(SegmentReader.java:253)
>>>>     org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:215)
>>>>     org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:119)
>>>>     org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3772)
>>>>     org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3376)
>>>>     org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:405)
>>>>     org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:482)
>>>>
>>>>
>>>> Thanks,
>>>> Michael.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>
>>
>
>
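
The dead-space concern raised in the thread can be put into numbers with
the figures quoted there (about 14 million documents, about 1 million
updates per day). The sketch below assumes the worst case in which no merge
ever reclaims a deleted document and the live document count stays
constant. Uwe's reply indicates that normal merging under TieredMergePolicy
reclaims deletions, so real figures should be lower. The class and method
names are hypothetical.

```java
public class DeadSpaceEstimate {

    /**
     * Fraction of stored documents that are deleted if nothing ever reclaims
     * them. Each update deletes the old copy and adds a new one, so the live
     * count stays constant while deleted copies accumulate.
     */
    static double deletedFraction(long liveDocs, long updatesPerDay, int days) {
        long deleted = updatesPerDay * days;
        return (double) deleted / (liveDocs + deleted);
    }

    public static void main(String[] args) {
        long live = 14_000_000L;   // from the thread
        long updates = 1_000_000L; // from the thread
        for (int days : new int[] {1, 7, 14, 30}) {
            System.out.printf("after %2d days without merging: %.1f%% deleted%n",
                    days, 100 * deletedFraction(live, updates, days));
        }
    }
}
```

Under these assumptions the index would be half dead space after two
weeks, which matches the worry that the on-disk footprint could become a
multiple of what the live documents need.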
