From "wangzhijiang999" <>
Subject Re: Re: RE: RE: About lucene memory consumption
Date Tue, 01 Jul 2014 08:39:27 GMT
Hi Uwe, 
       I already created the issue in JIRA "".
Zhijiang Wang

------------------------------------------------------------------
From: Uwe Schindler <>
Sent: Tuesday, July 1, 2014 15:47
To: java-user <>; wangzhijiang999 <>
Subject: RE: Re: RE: RE: About lucene memory consumption

Hi Wang,

would it be possible to open a JIRA issue so we can track this?

In any case, I would recommend disabling compound files if you use NRTCachingDirectory (as a workaround).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail:

> -----Original Message-----
> From: wangzhijiang999 []
> Sent: Tuesday, July 01, 2014 9:17 AM
> To: java-user
> Subject: Re: RE: RE: About lucene memory consumption
>
> My application also ran into this problem last year. I researched the code and found the reason.
> The whole process is as follows:
>
> 1. When using NRTCachingDirectory, a RAMDirectory serves as the cache and an MMapDirectory as the delegate. New segments are created during flush or merge, and NRTCachingDirectory uses the parameters maxMergeSizeBytes and maxCachedBytes to decide whether to create a new segment in the cache (in memory) or in the delegate (on disk).
> 2. When a flush creates a new segment, context.flushInfo.estimatedSegmentSize of the new segment is compared with the above parameters. If the new segment is small, it is created in the RAMDirectory; otherwise it is created in the MMapDirectory.
> 3. When a merge creates a new segment, context.mergeInfo.estimatedMergeBytes of the new segment is compared with the above parameters. If the new segment is small, it is created in the cache; otherwise it is created in the delegate.
> 4. But when the new segment is a compound index file (cfs), whether during flush or merge, IOContext.DEFAULT is used for that file. IOContext.DEFAULT carries no size estimate (its mergeInfo and flushInfo are both null, so neither estimatedMergeBytes nor estimatedSegmentSize is available). As a result, the new compound segment file is always created in the cache, no matter how big it really is. This is the core issue.
>
> Next, I will explain how segments are released from the cache.
>
> 1. Normally, during a commit, the sync operation writes the newly created segment files to disk and deletes them from the cache. But if a merge is running during the sync, the segment created by that merge is not synced to disk in this commit, and the newly merged compound segment file is created in the cache, as described above.
> 2. If you use the NRT feature, the IndexSearcher gets its SegmentReaders from the IndexWriter through the getReader method, and the IndexWriter holds a ReaderPool. A new segment is first looked up in the cache of the NRTCachingDirectory; if it is not in the cache (because it was created directly on disk, or was committed to disk and released from the cache), it is fetched from the delegate. The fetched segment is then put into the IndexWriter's ReaderPool. As described above, the segment created by the merge is now in the cache, so when the IndexWriter fetches it, the IndexWriter's ReaderPool references it. In the next commit, this segment is synced to disk and released from the cache, but the ReaderPool still references it. So you will see the IndexSearcher referencing many RAMFiles whose data is already on disk.
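The retention described in item 2 can be illustrated with a small simulation. This is a minimal, hypothetical model, not Lucene code. The maps `cache`, `disk`, and `pool` stand in for the RAMDirectory cache, the on-disk delegate, and the IndexWriter's ReaderPool. Byte arrays stand in for RAMFiles. The model shows that evicting a file from the cache during sync does not make its RAM copy unreachable while a pooled reader still points at it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (not Lucene code): byte arrays stand in for RAMFiles.
class ReaderPoolRetentionSketch {
    // Stand-in for the RAMDirectory cache inside NRTCachingDirectory.
    static final Map<String, byte[]> cache = new HashMap<>();
    // Stand-in for the delegate directory on disk.
    static final Map<String, byte[]> disk = new HashMap<>();
    // Stand-in for the IndexWriter's ReaderPool: each pooled reader keeps the bytes it was opened on.
    static final Map<String, byte[]> pool = new HashMap<>();

    // Open an NRT reader: look in the cache first, then the delegate, and pool the result.
    static void openReader(String name) {
        byte[] data = cache.containsKey(name) ? cache.get(name) : disk.get(name);
        pool.put(name, data);
    }

    // Commit/sync: copy every cached file to disk, then drop it from the cache.
    static void sync() {
        for (Map.Entry<String, byte[]> e : cache.entrySet()) {
            disk.put(e.getKey(), e.getValue().clone());
        }
        cache.clear();
    }

    // Bytes of RAM copies still pinned by pooled readers although the cache no longer holds them.
    static long retainedOutsideCache() {
        long total = 0;
        for (Map.Entry<String, byte[]> e : pool.entrySet()) {
            byte[] data = e.getValue();
            // Identity check: the reader still points at the old RAM copy, not the disk copy.
            if (!cache.containsKey(e.getKey()) && data != disk.get(e.getKey())) {
                total += data.length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // A merge finished while a sync was running, so its compound file was written to the cache.
        cache.put("_5.cfs", new byte[1_000_000]);
        openReader("_5.cfs");                        // the pooled reader now references the cached bytes
        sync();                                      // next commit: file goes to disk and leaves the cache
        System.out.println(retainedOutsideCache()); // 1000000: the pool still pins the RAM copy
        pool.remove("_5.cfs");                       // segment merged away: reader released from the pool
        System.out.println(retainedOutsideCache()); // 0
    }
}
```

The file name `_5.cfs` and the 1,000,000-byte size are arbitrary example values.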
>
> When can these RAMFiles be dropped? Only when these segments take part in a new merge that creates a new segment; then the old segments are released from the IndexWriter's ReaderPool completely.
>
> I modified the Lucene source code in the CompoundFileWriter class to solve this problem:
>
>     out = new DirectCFSIndexOutput(getOutput(), entry, false);        // original
>     out = new DirectCFSIndexOutput(getOutput(context), entry, false); // modified
>
>     IndexOutput createOutput(String name, IOContext context) throws IOException {
>       ensureOpen();
>       boolean success = false;
>       boolean outputLocked = false;
>       try {
>         assert name != null : "name must not be null";
>         if (entries.containsKey(name)) {
>           throw new IllegalArgumentException("File " + name + " already exists");
>         }
>         final FileEntry entry = new FileEntry();
>         entry.file = name;
>         entries.put(name, entry);
>         final String id = IndexFileNames.stripSegmentName(name);
>         assert !seenIDs.contains(id) : "file=\"" + name + "\" maps to id=\"" + id + "\", which was already written";
>         seenIDs.add(id);
>         final DirectCFSIndexOutput out;
>         if ((outputLocked = outputTaken.compareAndSet(false, true))) {
>           // original: out = new DirectCFSIndexOutput(getOutput(), entry, false);
>           out = new DirectCFSIndexOutput(getOutput(context), entry, false); // modified: forward the caller's IOContext
>         } else {
>           entry.dir =;
>           if (directory.fileExists(name)) {
>             throw new IllegalArgumentException("File " + name + " already exists");
>           }
>           out = new DirectCFSIndexOutput(directory.createOutput(name, context), entry, true);
>         }
>         success = true;
>         return out;
>       } finally {
>         if (!success) {
>           entries.remove(name);
>           if (outputLocked) { // release the output lock if not successful
>             assert outputTaken.get();
>             releaseOutputLock();
>           }
>         }
>       }
>     }
>
>     // modified: takes the caller's IOContext instead of always using IOContext.DEFAULT
>     private synchronized IndexOutput getOutput(IOContext context) throws IOException {
>       if (dataOut == null) {
>         boolean success = false;
>         try {
>           dataOut = directory.createOutput(dataFileName, context);
>           CodecUtil.writeHeader(dataOut, DATA_CODEC, VERSION_CURRENT);
>           success = true;
>         } finally {
>           if (!success) {
>             IOUtils.closeWhileHandlingException(dataOut);
>           }
>         }
>       }
>       return dataOut;
>     }
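The cache-or-disk decision from steps 1 to 4 can be modeled in a few lines, together with the effect of forwarding the context in the patch above. This is a hypothetical sketch, not NRTCachingDirectory's actual code. `Ctx` stands in for IOContext, `cacheWrite` for the directory's size check, and `forwardContext` toggles between the original behavior and the patched one. The 5 MB and 60 MB limits are example values only. As the message describes, the sketch assumes that a context with no size hint is treated as small and therefore cached.

```java
// Hypothetical model of the cache-or-disk size check (not Lucene's actual classes).
final class CacheDecisionSketch {
    // Stand-in for IOContext: carries at most one size hint; both null means "no size information".
    static final class Ctx {
        final Long flushEstimatedSegmentSize; // like flushInfo.estimatedSegmentSize
        final Long mergeEstimatedBytes;       // like mergeInfo.estimatedMergeBytes

        Ctx(Long flushEstimatedSegmentSize, Long mergeEstimatedBytes) {
            this.flushEstimatedSegmentSize = flushEstimatedSegmentSize;
            this.mergeEstimatedBytes = mergeEstimatedBytes;
        }

        static final Ctx DEFAULT = new Ctx(null, null); // like IOContext.DEFAULT: no size hint
        static Ctx flush(long bytes) { return new Ctx(bytes, null); }
        static Ctx merge(long bytes) { return new Ctx(null, bytes); }
    }

    final long maxMergeSizeBytes; // largest single new file allowed in the cache
    final long maxCachedBytes;    // total RAM budget of the cache
    long cachedBytes;             // bytes currently held in the cache

    CacheDecisionSketch(long maxMergeSizeBytes, long maxCachedBytes) {
        this.maxMergeSizeBytes = maxMergeSizeBytes;
        this.maxCachedBytes = maxCachedBytes;
    }

    // True if a new file written under this context should go to the RAM cache.
    boolean cacheWrite(Ctx ctx) {
        long estimate = 0; // no hint: treated as small, which is the root of the problem
        if (ctx.mergeEstimatedBytes != null) {
            estimate = ctx.mergeEstimatedBytes;
        } else if (ctx.flushEstimatedSegmentSize != null) {
            estimate = ctx.flushEstimatedSegmentSize;
        }
        return estimate <= maxMergeSizeBytes && estimate + cachedBytes <= maxCachedBytes;
    }

    // Compound file data: the original code used DEFAULT, the patch forwards the caller's context.
    boolean cacheCompoundFile(Ctx callerCtx, boolean forwardContext) {
        return cacheWrite(forwardContext ? callerCtx : Ctx.DEFAULT);
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        CacheDecisionSketch dir = new CacheDecisionSketch(5 * mb, 60 * mb); // example limits
        Ctx bigMerge = Ctx.merge(500 * mb);
        System.out.println(dir.cacheWrite(Ctx.flush(1 * mb)));     // true: small flush stays in RAM
        System.out.println(dir.cacheWrite(bigMerge));               // false: large merge goes to disk
        System.out.println(dir.cacheCompoundFile(bigMerge, false)); // true: 500 MB .cfs cached (the bug)
        System.out.println(dir.cacheCompoundFile(bigMerge, true));  // false: context forwarded (the fix)
    }
}
```

Forwarding the caller's context keeps the size check in one place (the directory) rather than teaching the compound-file writer about caching. Disabling compound files, as Uwe suggests, avoids the DEFAULT-context path altogether.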
