lucene-java-user mailing list archives

From Jürgen Albert <>
Subject Re: index size growing while deleting
Date Tue, 10 Nov 2015 11:19:49 GMT
Hi Rob,

We had a similar problem. In our case we had open index readers that
blocked the index from merging its segments and thus deleting the marked
documents.
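
One rough way to check whether old segments are staying on disk is to sum the file sizes per segment in the index directory and compare the listing before and after the nightly delete. The sketch below uses only the JDK. The class name `SegmentSizes` is made up for illustration, and it assumes that per-segment files are named `_<name>.<ext>` or `_<name>_<suffix>.<ext>`, so please verify that against your own index directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: groups index files by segment name.
final class SegmentSizes {

    // Assumption: per-segment files look like "_<name>.<ext>" or
    // "_<name>_<suffix>.<ext>". Anything else (for example segments_N or
    // write.lock) is grouped under "(other)".
    static String segmentOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return "(other)";
        }
        int end = fileName.length();
        for (int i = 1; i < fileName.length(); i++) {
            char c = fileName.charAt(i);
            if (c == '_' || c == '.') {
                end = i;
                break;
            }
        }
        return fileName.substring(0, end);
    }

    // Sums the size in bytes of all regular files, keyed by segment name.
    static Map<String, Long> bytesPerSegment(Path indexDir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    String segment = segmentOf(file.getFileName().toString());
                    sizes.merge(segment, Files.size(file), Long::sum);
                }
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // The index directory is taken from the first argument, else the current directory.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        bytesPerSegment(indexDir).forEach(
            (segment, bytes) -> System.out.printf("%-12s %,d bytes%n", segment, bytes));
    }
}
```

If the same large segments still show up days after their documents were deleted, that would suggest something is keeping them alive, for example a reader that was never closed.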



On 06.11.2015 at 08:59, Rob Audenaerde wrote:
> Hi will, others
> Thanks for your reply,
> As far as I understand it, deleting a document is just setting the deleted
> bit, and when segments are merged, then the documents are removed. (not
> really sure what this means exactly; I guess the document gets removed from
> the store, the terms will no longer refer to that document. Not sure if
> terms get removed if no longer needed, etc). If there are resources to read
> to improve my understanding, I have not found them (yet); if you could point
> me to some, that would be great!
> I use the default IndexWriterConfig, which I see uses TieredMergePolicy. I
> never close my IndexWriter; as I use NRT searching, I just always keep it
> open.
> My two guesses are that: a) old segments are not removed from disk or b)
> deletes are not cleaned up as well as I thought they would be.
> I have made a testcase which indexes 5 million rows (five iterations, five
> indexing threads, indexing and deleting all such documents after each
> iteration with deleteByQuery), the rows randomly generated. I see the
> Taxonomy ever growing (which is logical, because facet-ordinals are never
> removed as far as I understand); the index grows, but also shrinks when
> deleting. So I cannot reproduce my problem easily :(
> I will start diving into the Lucene source code, but I was hoping I just
> did something wrong.
> Any hints are appreciated!
> -Rob
> On Thu, Nov 5, 2015 at 2:52 PM, will <> wrote:
>> Hi Rob:
>> Do you understand how deletes work and how an index is compacted?
>> There are some configuration/runtime activities you don't mention... And
>> you make the testing process sound like a mirror of production? (Including
>> configuration?)
>> -will
>> On 11/5/15 7:33 AM, Rob Audenaerde wrote:
>>> Hi all,
>>> I'm currently investigating an issue we have with our index. It keeps
>>> getting bigger, and I don't get why.
>>> Here is our use case:
>>> We index a database of about 4 million records; spread over a few hundred
>>> tables. The data consists of a mix of text, dates, numbers etc. We also add
>>> all these fields as facets.
>>> Each night we delete about 90% of the data, which in testing reduces the
>>> index size significantly.
>>> We store the data as StoredFields as well, to prevent having to access the
>>> database at all.
>>> We use FloatAssociationFacetField for the facets.
>>> In production, however, it seems the index is only growing, up to 71 GB for
>>> these records after a month of running.
>>> It seems that Lucene's index is just getting bigger there.
>>> We use Lucene 5.3 on CentOS, Java 8 64-bit.
>>> The taxonomy-index does not grow significantly.
>>> How should I go about checking what is wrong?
>>> Thanks!

Jürgen Albert

Data In Motion UG (haftungsbeschränkt)

