lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rob Audenaerde <rob.audenae...@gmail.com>
Subject RE: debugging growing index size
Date Sat, 14 Nov 2015 09:59:01 GMT
Thank you all,

I will further fix and investigate!
On Nov 14, 2015 10:00, "Uwe Schindler" <uwe@thetaphi.de> wrote:

> I agree. On Linux it is impossible that MMapDirectory is the reason! Only
> on windows you cannot delete still open/mapped files.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Michael McCandless [mailto:lucene@mikemccandless.com]
> > Sent: Friday, November 13, 2015 8:30 PM
> > To: Lucene Users <java-user@lucene.apache.org>
> > Subject: Re: debugging growing index size
> >
> > So with MMapDir at defaults (unmap is enabled) you see old files, with
> > no open file handles as reported by lsof, still existing in your index
> > directory, taking lots of space.
> >
> > But with NIOFSDirectory the issue doesn't happen?  Are you sure?
> >
> > I'll look at the 6.6 GB infoStream to see what it says about the ref
> counts.
> >
> > Did you fix the issue in your app where you're not closing all opened
> > NRT readers?
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Fri, Nov 13, 2015 at 12:22 PM, Rob Audenaerde
> > <rob.audenaerde@gmail.com> wrote:
> > > I haven't disabled unmapping, and I am running out-of-the-box
> > > FSDirectory.open(). As I can see it tries to pick MMap.  For the test I
> > > explicitly constructed a NIOFSDIrectoryReader
> > >
> > > OS is (from the top of my head)  CentOS 6.x, Java 1.8.0u33. I can check
> > > later for more details.
> > > On Nov 13, 2015 18:07, "Uwe Schindler" <uwe@thetaphi.de> wrote:
> > >
> > >> Hi,
> > >>
> > >> Lucene has the workaround, so it should not happen, UNLESS you
> > explicitly
> > >> disable the hack using MMapDirectory#setEnableUnmap(false).
> > >>
> > >> Uwe
> > >>
> > >> -----
> > >> Uwe Schindler
> > >> H.-H.-Meier-Allee 63, D-28213 Bremen
> > >> http://www.thetaphi.de
> > >> eMail: uwe@thetaphi.de
> > >>
> > >> > -----Original Message-----
> > >> > From: will martin [mailto:wmartinusa@gmail.com]
> > >> > Sent: Friday, November 13, 2015 6:04 PM
> > >> > To: java-user@lucene.apache.org
> > >> > Subject: Re: debugging growing index size
> > >> >
> > >> > Hi Rob:
> > >> >
> > >> >
> > >> > Doesn’t this look like known SE issue JDK-4724038 and discussed
by
> > Peter
> > >> > Levart and Uwe Schindler on a lucene-dev thread 9/9/2015?
> > >> >
> > >> > MappedByteBuffer …. what OS are you on Rob? What JVM?
> > >> >
> > >> > http://bugs.java.com/view_bug.do?bug_id=4724038
> > >> >
> > >> > http://mail-archives.apache.org/mod_mbox/lucene-
> > >> > dev/201509.mbox/%3C55F0461A.2070005@gmail.com%3E
> > >> >
> > >> > hth
> > >> > -will
> > >> >
> > >> >
> > >> >
> > >> > > On Nov 13, 2015, at 11:23 AM, Rob Audenaerde
> > >> > <rob.audenaerde@gmail.com> wrote:
> > >> > >
> > >> > > I'm currently running using NIOFS. It seems to prevent the issue
> from
> > >> > > appearing.
> > >> > >
> > >> > > This is a second run (with applied deletes etc)
> > >> > >
> > >> > > raudenaerd@:/<6>index/index$sudo ls -lSra *.dvd
> > >> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09
> _y_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12
> > _xod_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17
> > _110e_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19
> > _12r5_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13
> > _y0s_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17
> > _z20_Lucene50_0.dvd
> > >> > >
> > >> > > raudenaerde:/<6>index/index$sudo ls -lSaa *.dvd
> > >> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17
> > _z20_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13
> > _y0s_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19
> > _12r5_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17
> > _110e_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12
> > _xod_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09
> _y_Lucene50_0.dvd
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Thu, Nov 12, 2015 at 3:40 PM, Michael McCandless <
> > >> > > lucene@mikemccandless.com> wrote:
> > >> > >
> > >> > >> Hi Rob,
> > >> > >>
> > >> > >> A couple more things:
> > >> > >>
> > >> > >> Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
> > >> > >>
> > >> > >> Also, can you try your test using NIOFSDirectory instead?
> Curious if
> > >> > >> that changes things...
> > >> > >>
> > >> > >> Mike McCandless
> > >> > >>
> > >> > >> http://blog.mikemccandless.com
> > >> > >>
> > >> > >>
> > >> > >> On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
> > >> > >> <rob.audenaerde@gmail.com> wrote:
> > >> > >>> Curious indeed!
> > >> > >>>
> > >> > >>> I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS
and
> > recreate
> > >> > the
> > >> > >>> logs. Will get back with them in a day hopefully.
> > >> > >>>
> > >> > >>> Thanks for the extra logging!
> > >> > >>>
> > >> > >>> -Rob
> > >> > >>>
> > >> > >>> On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless
<
> > >> > >>> lucene@mikemccandless.com> wrote:
> > >> > >>>
> > >> > >>>> Hmm, curious.
> > >> > >>>>
> > >> > >>>> I looked at the [large] infoStream output and I see
segment
> _3ou7
> > >> > >>>> present on init of IW, a few getReader calls referencing
it,
> then a
> > >> > >>>> forceMerge that indeed merges it away, yet I do NOT
see IW
> > >> > attempting
> > >> > >>>> deletion of its files.
> > >> > >>>>
> > >> > >>>> And indeed I see plenty (too many: many times per
second?) of
> > >> > commits
> > >> > >>>> after that, so the index itself is no longer referencing
_3ou7.
> > >> > >>>>
> > >> > >>>> If you are failing to close all NRT readers then
I would expect
> > >> _3ou7
> > >> > >>>> to be in the lsof output, but it's not.
> > >> > >>>>
> > >> > >>>> The NRT readers close method has logic that notifies
> IndexWriter
> > >> when
> > >> > >>>> it's done "needing" the files, to emulate "delete
on last
> close"
> > >> > >>>> semantics for filesystems like HDFS that don't do
that ... it's
> > >> > >>>> possible something is wrong here.
> > >> > >>>>
> > >> > >>>> Can you set the (public, static) boolean
> > >> > >>>> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and
then re-
> > >> > generate this
> > >> > >>>> log?  This causes IW to log the ref count of each
file it's
> tracking
> > >> > >>>> ...
> > >> > >>>>
> > >> > >>>> I'll also add a bit more verbosity to IW when NRT
readers are
> > opened
> > >> > >>>> and close, for 5.4.0.
> > >> > >>>>
> > >> > >>>> Mike McCandless
> > >> > >>>>
> > >> > >>>> http://blog.mikemccandless.com
> > >> > >>>>
> > >> > >>>>
> > >> > >>>> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
> > >> > >>>> <rob.audenaerde@gmail.com> wrote:
> > >> > >>>>> Hi all,
> > >> > >>>>>
> > >> > >>>>> I'm still debugging the growing-index size. I
think closing
> index
> > >> > >> readers
> > >> > >>>>> might help (work in progress), but I can't really
see them
> holding
> > >> on
> > >> > >> to
> > >> > >>>>> files (at least, using lsof ). Restarting the
application
> sheds
> > >> some
> > >> > >>>> light,
> > >> > >>>>> I see logging on files that are no longer referenced.
> > >> > >>>>>
> > >> > >>>>> What I see is that there are files in the index-directory,
> that
> > >> seem
> > >> > >> to
> > >> > >>>>> longer referenced..
> > >> > >>>>>
> > >> > >>>>> I put the output of the infoStream online, because
is it
> rather big
> > >> > >> (30MB
> > >> > >>>>> gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
> > >> > >>>>>
> > >> > >>>>> Output of lsof:  (executed 'sudo lsof *' in the
index
> directory  ).
> > >> > >> This
> > >> > >>>> is
> > >> > >>>>> on an CentOS box (maybe that influences stuff
as well?)
> > >> > >>>>>
> > >> > >>>>> COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF
    NODE
> > NAME
> > >> > >>>>> java    30581 apache  mem    REG  253,0 3176094924
18880508
> > >> > >>>>> _4gs5_Lucene50_0.dvd
> > >> > >>>>> java    30581 apache  mem    REG  253,0  505758610
18880546
> > >> _4gs5.fdt
> > >> > >>>>> java    30581 apache  mem    REG  253,0  369563337
18880631
> > >> > >>>>> _4gs5_Lucene50_0.tim
> > >> > >>>>> java    30581 apache  mem    REG  253,0  176344058
18880623
> > >> > >>>>> _4gs5_Lucene50_0.pos
> > >> > >>>>> java    30581 apache  mem    REG  253,0  378055201
18880606
> > >> > >>>>> _4gs5_Lucene50_0.doc
> > >> > >>>>> java    30581 apache  mem    REG  253,0  372579599
18880400
> > >> > >>>>> _4i5a_Lucene50_0.dvd
> > >> > >>>>> java    30581 apache  mem    REG  253,0   82017447
18880748
> > >> _4g37.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0   85376507
18880721
> > >> _4fb3.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0  363493917
18880533
> > >> > >>>>> _4ct1_Lucene50_0.dvd
> > >> > >>>>> java    30581 apache  mem    REG  253,0    9421892
18880806
> > >> _4gjc.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0   76877461
18880553
> > >> _4ct1.fdt
> > >> > >>>>> java    30581 apache  mem    REG  253,0   46271330
18880661
> > >> > >>>>> _4ct1_Lucene50_0.tim
> > >> > >>>>> java    30581 apache  mem    REG  253,0   26911387
18880653
> > >> > >>>>> _4ct1_Lucene50_0.pos
> > >> > >>>>> java    30581 apache  mem    REG  253,0   54678249
18880568
> > >> > >>>>> _4ct1_Lucene50_0.doc
> > >> > >>>>> java    30581 apache  mem    REG  253,0   76556587
18880328
> > >> _4i5a.fdt
> > >> > >>>>> java    30581 apache  mem    REG  253,0   45032159
18880389
> > >> > >>>>> _4i5a_Lucene50_0.tim
> > >> > >>>>> java    30581 apache  mem    REG  253,0   26486772
18880388
> > >> > >>>>> _4i5a_Lucene50_0.pos
> > >> > >>>>> java    30581 apache  mem    REG  253,0   55411002
18880362
> > >> > >>>>> _4i5a_Lucene50_0.doc
> > >> > >>>>> java    30581 apache  mem    REG  253,0   70484185
18880340
> > >> _4hkn.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0   10873921
18880324
> > >> _4gpz.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0   17230506
18880524
> > >> _4i11.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    6706969
18880575
> > >> _4i0t.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0   15135578
18880624
> > >> _4i0i.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0   15368310
18880717
> > >> _4hzp.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    5146140
18880583
> > >> _4hze.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    2917380
18880411
> > >> _4gs5.nvd
> > >> > >>>>> java    30581 apache  mem    REG  253,0    6871469
18880732
> > >> _4hod.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    2860341
18880495
> > >> _4i84.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     835726
18880660
> > >> _4i7z.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    1005595
18880648
> > >> _4i7w.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    5639672
18880401
> > >> _4i4o.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    4388371
18880440
> > >> _4i4a.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    1151845
18880512
> > >> _4i7v.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     941773
18880613
> > >> _4i7x.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     984023
18880588
> > >> _4i7o.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    1790005
18880619
> > >> _4i7y.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     466371
18880515
> > >> _4ct1.nvd
> > >> > >>>>> java    30581 apache  mem    REG  253,0     723280
18880573
> > >> _4i7q.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     806289
18880517
> > >> _4i7h.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     
17362 18880520
> > >> _4i9s.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     698362
18880531
> > >> _4i9r.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     483215
18880406
> > >> _4i5a.nvd
> > >> > >>>>> java    30581 apache  mem    REG  253,0     
14110 18880416
> > >> _4i9v.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     
 6121 18880412
> > >> _4i9t.cfs
> > >> > >>>>> java    30581 apache   30wW  REG  253,0     
    0 18877901
> > >> write.lock
> > >> > >>>>>
> > >> > >>>>> Output of some of the biggest files in the index
directory:
> > >> > >>>>>
> > >> > >>>>> -rw-r--r--. 1 apache apache  358684577 Nov 11
08:04 _4fjn.cfs
> > >> > >>>>> -rw-r--r--. 1 apache apache  363493917 Nov 11
07:54
> > >> > >> _4ct1_Lucene50_0.dvd
> > >> > >>>>> -rw-r--r--. 1 apache apache  369563337 Nov 11
08:06
> > >> > >> _4gs5_Lucene50_0.tim
> > >> > >>>>> -rw-r--r--. 1 apache apache  372579599 Nov 11
08:09
> > >> > >> _4i5a_Lucene50_0.dvd
> > >> > >>>>> -rw-r--r--. 1 apache apache  378055201 Nov 11
08:06
> > >> > >> _4gs5_Lucene50_0.doc
> > >> > >>>>> -rw-r--r--. 1 apache apache  427401813 Nov 10
08:14 _3ou7.cfs
> > >> > >>>>> -rw-r--r--. 1 apache apache  505758610 Nov 11
08:04 _4gs5.fdt
> > >> > >>>>> -rw-r--r--. 1 apache apache 1107391579 Nov 10
07:55
> > >> > >> _3k3a_Lucene50_0.dvd
> > >> > >>>>> -rw-r--r--. 1 apache apache 3176094924 Nov 11
08:10
> > >> > >> _4gs5_Lucene50_0.dvd
> > >> > >>>>>
> > >> > >>>>> Note that the 3ou7 and 3k3a segments no longer
appear to be in
> > use?
> > >> > >>>>
> > >> > >>>>
> > >> ---------------------------------------------------------------------
> > >> > >>>> To unsubscribe, e-mail: java-user-
> > unsubscribe@lucene.apache.org
> > >> > >>>> For additional commands, e-mail: java-user-
> > help@lucene.apache.org
> > >> > >>>>
> > >> > >>>>
> > >> > >>
> > >> > >>
> ---------------------------------------------------------------------
> > >> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> > >> For additional commands, e-mail:
> java-user-help@lucene.apache.org
> > >> > >>
> > >> > >>
> > >> >
> > >> >
> > >> >
> ---------------------------------------------------------------------
> > >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>
> > >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message