lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: debugging growing index size
Date Thu, 12 Nov 2015 10:34:11 GMT
Hmm, curious.

I looked at the [large] infoStream output and I see segment _3ou7
present on init of IW, a few getReader calls referencing it, then a
forceMerge that indeed merges it away, yet I do NOT see IW attempting
deletion of its files.

And indeed I see plenty (too many: many times per second?) of commits
after that, so the index itself is no longer referencing _3ou7.

If you are failing to close all NRT readers then I would expect _3ou7
to be in the lsof output, but it's not.
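
To make that comparison less error-prone, here is a minimal sketch in plain
Java (no Lucene dependency) that reduces file names to their segment prefix
and reports segments that are on disk but that no process has open. The class
and method names are hypothetical, and the inputs are a few names copied from
the ls and lsof listings quoted below. Note that absence from lsof only shows
that nothing has the file open; it does not show whether the current commit
still references the segment.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: compares a directory listing with lsof file names. */
final class UnopenedSegments {

    /** "_4gs5_Lucene50_0.dvd" -> "_4gs5"; returns null for non-segment files. */
    static String segmentName(String fileName) {
        if (!fileName.startsWith("_")) {
            return null; // e.g. write.lock
        }
        int end = 1;
        while (end < fileName.length()
                && Character.isLetterOrDigit(fileName.charAt(end))) {
            end++;
        }
        return fileName.substring(0, end);
    }

    /** Collects the distinct segment names found in a list of file names. */
    static Set<String> segmentsOf(List<String> fileNames) {
        Set<String> segments = new TreeSet<>();
        for (String name : fileNames) {
            String segment = segmentName(name);
            if (segment != null) {
                segments.add(segment);
            }
        }
        return segments;
    }

    /** Segments present on disk for which no file is open in any process. */
    static Set<String> onDiskButNotOpen(List<String> onDisk, List<String> open) {
        Set<String> result = segmentsOf(onDisk);
        result.removeAll(segmentsOf(open));
        return result;
    }

    public static void main(String[] args) {
        // Names taken from the "biggest files" listing in the quoted message.
        List<String> onDisk = Arrays.asList(
                "_4fjn.cfs", "_4ct1_Lucene50_0.dvd", "_4gs5_Lucene50_0.tim",
                "_4i5a_Lucene50_0.dvd", "_3ou7.cfs", "_4gs5.fdt",
                "_3k3a_Lucene50_0.dvd");
        // A subset of the NAME column from the quoted lsof output.
        List<String> open = Arrays.asList(
                "_4gs5_Lucene50_0.dvd", "_4gs5.fdt", "_4i5a_Lucene50_0.dvd",
                "_4ct1_Lucene50_0.dvd", "write.lock");
        System.out.println(onDiskButNotOpen(onDisk, open));
        // prints [_3k3a, _3ou7, _4fjn]
    }
}
```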

The NRT reader's close method has logic that notifies IndexWriter when
it's done "needing" the files, to emulate "delete on last close"
semantics for filesystems like HDFS that don't do that ... it's
possible something is wrong here.
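
As a rough illustration of what "delete on last close" means in terms of
reference counts, here is a self-contained Java sketch. It is not Lucene's
actual code: the class name, methods, and the in-memory "deleted" list are all
assumptions made up for this example. The idea is that each holder (a commit,
an open reader) increments the count for the files it needs, and a file is
removed only when the last holder releases it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of ref-counted deletion; not Lucene's implementation. */
final class RefCountSketch {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final List<String> deleted = new ArrayList<>();

    /** A commit point or an open reader starts needing these files. */
    void incRef(Collection<String> files) {
        for (String file : files) {
            refCounts.merge(file, 1, Integer::sum);
        }
    }

    /** A holder is done with these files; a file is deleted when its count hits zero. */
    void decRef(Collection<String> files) {
        for (String file : files) {
            Integer count = refCounts.get(file);
            if (count == null) {
                throw new IllegalStateException("untracked file: " + file);
            }
            if (count == 1) {
                refCounts.remove(file);
                deleted.add(file); // a real deleter would remove the file from disk here
            } else {
                refCounts.put(file, count - 1);
            }
        }
    }

    /** Files whose last reference has been released, in release order. */
    List<String> deletedFiles() {
        return deleted;
    }

    public static void main(String[] args) {
        RefCountSketch tracker = new RefCountSketch();
        List<String> segmentFiles = Arrays.asList("_3ou7.cfs");

        tracker.incRef(segmentFiles); // referenced by the current commit
        tracker.incRef(segmentFiles); // referenced by an open NRT reader

        tracker.decRef(segmentFiles); // merged away, and a newer commit replaces the old one
        System.out.println("after merge and commit: " + tracker.deletedFiles());
        // prints: after merge and commit: []

        tracker.decRef(segmentFiles); // the reader is closed: last reference released
        System.out.println("after reader close: " + tracker.deletedFiles());
        // prints: after reader close: [_3ou7.cfs]
    }
}
```

In this model, a reader that is never closed (or a release notification that
never arrives) leaves the count above zero forever, so the file stays on disk
even though no commit references it.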

Can you set the (public, static) boolean
IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-generate this
log?  This causes IW to log the ref count of each file it's tracking
...

I'll also add a bit more verbosity to IW when NRT readers are opened
and closed, for 5.4.0.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
<rob.audenaerde@gmail.com> wrote:
> Hi all,
>
> I'm still debugging the growing index size. I think closing index readers
> might help (work in progress), but I can't really see them holding on to
> files (at least, using lsof). Restarting the application sheds some light:
> I see logging on files that are no longer referenced.
>
> What I see is that there are files in the index directory that no longer
> seem to be referenced.
>
> I put the output of the infoStream online, because it is rather big (30MB
> gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
>
> Output of lsof (executed 'sudo lsof *' in the index directory). This is
> on a CentOS box (maybe that influences stuff as well?)
>
> COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
> java    30581 apache  mem    REG  253,0 3176094924 18880508 _4gs5_Lucene50_0.dvd
> java    30581 apache  mem    REG  253,0  505758610 18880546 _4gs5.fdt
> java    30581 apache  mem    REG  253,0  369563337 18880631 _4gs5_Lucene50_0.tim
> java    30581 apache  mem    REG  253,0  176344058 18880623 _4gs5_Lucene50_0.pos
> java    30581 apache  mem    REG  253,0  378055201 18880606 _4gs5_Lucene50_0.doc
> java    30581 apache  mem    REG  253,0  372579599 18880400 _4i5a_Lucene50_0.dvd
> java    30581 apache  mem    REG  253,0   82017447 18880748 _4g37.cfs
> java    30581 apache  mem    REG  253,0   85376507 18880721 _4fb3.cfs
> java    30581 apache  mem    REG  253,0  363493917 18880533 _4ct1_Lucene50_0.dvd
> java    30581 apache  mem    REG  253,0    9421892 18880806 _4gjc.cfs
> java    30581 apache  mem    REG  253,0   76877461 18880553 _4ct1.fdt
> java    30581 apache  mem    REG  253,0   46271330 18880661 _4ct1_Lucene50_0.tim
> java    30581 apache  mem    REG  253,0   26911387 18880653 _4ct1_Lucene50_0.pos
> java    30581 apache  mem    REG  253,0   54678249 18880568 _4ct1_Lucene50_0.doc
> java    30581 apache  mem    REG  253,0   76556587 18880328 _4i5a.fdt
> java    30581 apache  mem    REG  253,0   45032159 18880389 _4i5a_Lucene50_0.tim
> java    30581 apache  mem    REG  253,0   26486772 18880388 _4i5a_Lucene50_0.pos
> java    30581 apache  mem    REG  253,0   55411002 18880362 _4i5a_Lucene50_0.doc
> java    30581 apache  mem    REG  253,0   70484185 18880340 _4hkn.cfs
> java    30581 apache  mem    REG  253,0   10873921 18880324 _4gpz.cfs
> java    30581 apache  mem    REG  253,0   17230506 18880524 _4i11.cfs
> java    30581 apache  mem    REG  253,0    6706969 18880575 _4i0t.cfs
> java    30581 apache  mem    REG  253,0   15135578 18880624 _4i0i.cfs
> java    30581 apache  mem    REG  253,0   15368310 18880717 _4hzp.cfs
> java    30581 apache  mem    REG  253,0    5146140 18880583 _4hze.cfs
> java    30581 apache  mem    REG  253,0    2917380 18880411 _4gs5.nvd
> java    30581 apache  mem    REG  253,0    6871469 18880732 _4hod.cfs
> java    30581 apache  mem    REG  253,0    2860341 18880495 _4i84.cfs
> java    30581 apache  mem    REG  253,0     835726 18880660 _4i7z.cfs
> java    30581 apache  mem    REG  253,0    1005595 18880648 _4i7w.cfs
> java    30581 apache  mem    REG  253,0    5639672 18880401 _4i4o.cfs
> java    30581 apache  mem    REG  253,0    4388371 18880440 _4i4a.cfs
> java    30581 apache  mem    REG  253,0    1151845 18880512 _4i7v.cfs
> java    30581 apache  mem    REG  253,0     941773 18880613 _4i7x.cfs
> java    30581 apache  mem    REG  253,0     984023 18880588 _4i7o.cfs
> java    30581 apache  mem    REG  253,0    1790005 18880619 _4i7y.cfs
> java    30581 apache  mem    REG  253,0     466371 18880515 _4ct1.nvd
> java    30581 apache  mem    REG  253,0     723280 18880573 _4i7q.cfs
> java    30581 apache  mem    REG  253,0     806289 18880517 _4i7h.cfs
> java    30581 apache  mem    REG  253,0      17362 18880520 _4i9s.cfs
> java    30581 apache  mem    REG  253,0     698362 18880531 _4i9r.cfs
> java    30581 apache  mem    REG  253,0     483215 18880406 _4i5a.nvd
> java    30581 apache  mem    REG  253,0      14110 18880416 _4i9v.cfs
> java    30581 apache  mem    REG  253,0       6121 18880412 _4i9t.cfs
> java    30581 apache   30wW  REG  253,0          0 18877901 write.lock
>
> Output of some of the biggest files in the index directory:
>
> -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
> -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54 _4ct1_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06 _4gs5_Lucene50_0.tim
> -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09 _4i5a_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06 _4gs5_Lucene50_0.doc
> -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
> -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
> -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55 _3k3a_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10 _4gs5_Lucene50_0.dvd
>
> Note that the 3ou7 and 3k3a segments no longer appear to be in use?
