lucene-java-user mailing list archives

From Rob Audenaerde <rob.audenae...@gmail.com>
Subject Re: index size growing while deleting
Date Tue, 10 Nov 2015 12:47:57 GMT
Ah yes, that is the way to go.

It is a bit harder here, because we also use a per-user InMemoryIndex that
is combined in a multi-reader, so it will be a bit more work, but I think
it will be doable. Thanks for all the help.
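
For reference, the SearcherManager pattern that Jürgen describes in the quoted message below can be sketched roughly as follows. This is a minimal sketch that assumes the Lucene 5.x API (the three-argument SearcherManager constructor taking an IndexWriter); the class name NrtSearchSupport and its method names are hypothetical, and the per-user in-memory index combined through a multi-reader is left out entirely.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.SearcherManager;
import org.apache.lucene.search.TopDocs;

/** Hypothetical helper: wraps a SearcherManager around a long-lived IndexWriter. */
public final class NrtSearchSupport {

    private final SearcherManager manager;

    public NrtSearchSupport(IndexWriter writer) throws IOException {
        // applyAllDeletes = true: refreshed searchers see deletions.
        // A null SearcherFactory means no custom warming (assumed acceptable here).
        this.manager = new SearcherManager(writer, true, null);
    }

    /** Call periodically, or after a batch of updates, to pick up index changes. */
    public void refresh() throws IOException {
        manager.maybeRefresh();
    }

    /** Runs one query; the searcher is always released, even if the search throws. */
    public TopDocs search(Query query, int maxHits) throws IOException {
        IndexSearcher searcher = manager.acquire();
        try {
            // Anything that needs the reader (e.g. loading stored fields)
            // would also have to happen before the release below.
            return searcher.search(query, maxHits);
        } finally {
            manager.release(searcher);
        }
    }

    public void close() throws IOException {
        manager.close();
    }
}
```

The point of the acquire/release pairing is that no reader stays open longer than a single query, so files belonging to merged-away segments can be deleted once the last searcher using them has been released.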

That said, I found this issue not so easy to debug; are there methods
(on the IndexWriter / text in the infoStream?) that I could have used to
detect what was going on? That might be helpful for others as well.
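
One library-independent check, offered as a sketch rather than as an answer from the list, is to total the on-disk size of the index directory per segment and watch whether old segments linger after deletes and merges. It relies only on Lucene's file naming, where per-segment files start with an underscore followed by the segment name (for example `_a.cfs` or `_a_1.liv`) and everything else (`segments_N`, `write.lock`) is index-wide. The class name IndexSizeReport and the "(other)" bucket are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: sums index file sizes per segment name. */
public final class IndexSizeReport {

    /** Maps a file name to its segment name, e.g. "_a_1.liv" -> "_a". */
    public static String segmentOf(String fileName) {
        if (!fileName.startsWith("_")) {
            return "(other)"; // segments_N, write.lock, ...
        }
        int end = 1;
        while (end < fileName.length()
                && fileName.charAt(end) != '_'
                && fileName.charAt(end) != '.') {
            end++;
        }
        return fileName.substring(0, end);
    }

    /** Returns total bytes on disk per segment, sorted by segment name. */
    public static Map<String, Long> sizeBySegment(Path indexDir) throws IOException {
        Map<String, Long> sizes = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                try {
                    if (Files.isRegularFile(file)) {
                        String segment = segmentOf(file.getFileName().toString());
                        sizes.merge(segment, Files.size(file), Long::sum);
                    }
                } catch (NoSuchFileException e) {
                    // The writer may delete a file while we are listing; skip it.
                }
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        long total = 0;
        for (Map.Entry<String, Long> e : sizeBySegment(indexDir).entrySet()) {
            System.out.printf("%-10s %,d bytes%n", e.getKey(), e.getValue());
            total += e.getValue();
        }
        System.out.printf("%-10s %,d bytes%n", "total", total);
    }
}
```

Run against the index directory at intervals, a growing list of segments that never disappears would point at files still being referenced, for example by readers that were never closed.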

-Rob


On Tue, Nov 10, 2015 at 1:32 PM, Jürgen Albert <j.albert@data-in-motion.biz>
wrote:

> Hi Rob,
>
> We use a SearcherManager to obtain a fresh Searcher for every query. From
> the Searcher we get the Reader. After the query you call
> searcherManager.release(searcher). The SearcherManager takes care of the
> rest.
>
> Regards,
>
> Jürgen.
>
>
> On 10.11.2015 at 13:27, Rob Audenaerde wrote:
>
>> Hi Jürgen, Michael
>>
>> Thanks! I seem to be able to reduce the index size by closing and
>> restarting our application. This reduces the index size from 22G to 4G,
>> which is roughly the expected size. The infoStream also shows the
>> unreferenced files being removed (IFD 0 [2015-11-10T12:21:49.293Z; main]:
>> init: removing unreferenced file '...)
>>
>> Now I just need to figure out how to close the IndexReader while keeping
>> the application running. I guess I should/could do something with
>> openIfChanged. Will look further.
>>
>> -Rob
>>
>>
>>
>> On Tue, Nov 10, 2015 at 12:19 PM, Jürgen Albert <j.albert@data-in-motion.biz> wrote:
>>
>>> Hi Rob,
>>>
>>> We had a similar problem. In our case we had open index readers that
>>> blocked the index from merging its segments and thus from deleting the
>>> marked segments.
>>>
>>> Regards,
>>>
>>> Jürgen.
>>>
>>>
>>> On 06.11.2015 at 08:59, Rob Audenaerde wrote:
>>>
>>>> Hi will, others,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> As far as I understand it, deleting a document just sets the deleted
>>>> bit, and the documents are only removed when segments are merged. (I am
>>>> not really sure what this means exactly; I guess the document gets
>>>> removed from the store and the terms no longer refer to that document.
>>>> I am not sure whether terms get removed if they are no longer needed,
>>>> etc.) If there are resources to read to improve my understanding, I
>>>> have not found them (yet); if you could point me to some, that would be
>>>> great!
>>>>
>>>> I use the default IndexWriterConfig, which I see uses TieredMergePolicy.
>>>> I never close my IndexWriter; as I use NRT searching I just always keep
>>>> it open.
>>>>
>>>> My two guesses are that: a) old segments are not removed from disk or b)
>>>> deletes are not cleaned up as well as I thought they would be.
>>>>
>>>> I have made a test case which indexes 5 million rows (five iterations,
>>>> five indexing threads, indexing and then deleting all such documents
>>>> after each iteration with deleteByQuery), with the rows randomly
>>>> generated. I see the taxonomy ever growing (which is logical, because
>>>> facet ordinals are never removed as far as I understand); the index
>>>> grows, but also shrinks when deleting. So I cannot reproduce my problem
>>>> easily :(
>>>>
>>>> I will start diving into the Lucene source code, but I was hoping I just
>>>> did something wrong.
>>>>
>>>> Any hints are appreciated!
>>>>
>>>> -Rob
>>>>
>>>>
>>>> On Thu, Nov 5, 2015 at 2:52 PM, will <wmartinusa@gmail.com> wrote:
>>>>
>>>>> Hi Rob:
>>>>>
>>>>> Do you understand how deletes work and how an index is compacted?
>>>>>
>>>>> There are some configuration/runtime activities you don't mention. And
>>>>> you make the testing process sound like a mirror of production?
>>>>> (Including configuration?)
>>>>>
>>>>>
>>>>> -will
>>>>>
>>>>>
>>>>> On 11/5/15 7:33 AM, Rob Audenaerde wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm currently investigating an issue we have with our index. It keeps
>>>>>> getting bigger, and I don't get why.
>>>>>>
>>>>>> Here is our use case:
>>>>>>
>>>>>> We index a database of about 4 million records, spread over a few
>>>>>> hundred tables. The data consists of a mix of text, dates, numbers etc.
>>>>>> We also add all these fields as facets.
>>>>>> Each night we delete about 90% of the data, which in testing reduces
>>>>>> the index size significantly.
>>>>>> We store the data as StoredFields as well, to avoid having to access
>>>>>> the database at all.
>>>>>> We use FloatAssociationFacetField fields for the facets.
>>>>>>
>>>>>>
>>>>>> In production, however, the index seems to only grow, up to 71 GB for
>>>>>> these records after a month of running.
>>>>>>
>>>>>> It seems that Lucene's index is just getting bigger there.
>>>>>>
>>>>>> We use Lucene 5.3 on CentOS, Java 8 64-bit.
>>>>>>
>>>>>> The taxonomy-index does not grow significantly.
>>>>>>
>>>>>> How should I go about checking what is wrong?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>> Jürgen Albert
>>> Managing Director
>>>
>>> Data In Motion UG (haftungsbeschränkt)
>>>
>>> Kahlaische Str. 4
>>> 07745 Jena
>>>
>>> Mobile: 0157-72521634
>>> E-Mail: j.albert@datainmotion.de
>>> Web: www.datainmotion.de
>>>
>>> XING:   https://www.xing.com/profile/Juergen_Albert5
>>>
>>> Legal
>>>
>>> Jena HBR 507027
>>> USt-IdNr: DE274553639
>>> St.Nr.: 162/107/04586
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
