lucene-java-user mailing list archives

From "wangzhijiang999" <wangzhijiang...@aliyun.com>
Subject Re: RE: RE: About lucene memory consumption
Date Tue, 01 Jul 2014 07:17:04 GMT
My application also met this problem last year, and I researched the code and found the reason.
The whole process is as follows:
1. When using NRTCachingDirectory, a RAMDirectory is used as the cache and an MMapDirectory as the delegate. New segments are created during flush or merge, and NRTCachingDirectory uses the parameters maxMergeSizeBytes and maxCachedBytes to decide whether to create a new segment in the cache (in memory) or in the delegate (on disk).
2. When a flush creates a new segment, the context.flushInfo.estimatedSegmentSize of the new segment is compared with the parameters above. If the new segment is small enough, it is created in the RAMDirectory; otherwise in the MMapDirectory.
3. When a merge creates a new segment, the context.mergeInfo.estimatedMergeBytes of the new segment is compared with the parameters above. If the new segment is small enough, it is created in the cache; otherwise in the delegate.
4. But when the new segment file is a compound index file (cfs), no matter whether it comes from a flush or a merge, IOContext.DEFAULT is used for that file. With IOContext.DEFAULT neither estimatedMergeBytes nor estimatedSegmentSize is available (its flushInfo and mergeInfo are null), so the new compound segment file is always created in the cache, no matter how big it really is. This is the core issue (a sketch of the size check follows this list).
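
To make the size check concrete, here is a minimal sketch of the cache-or-delegate decision, modelled on the doCacheWrite logic of NRTCachingDirectory in Lucene 4.x; the field names (maxMergeSizeBytes, maxCachedBytes, cache) are taken from that class, but treat the exact code as illustrative rather than a verbatim copy:

protected boolean doCacheWrite(String name, IOContext context) {
  long bytes = 0;
  if (context.mergeInfo != null) {
    bytes = context.mergeInfo.estimatedMergeBytes;    // merge: estimated size of the merged segment
  } else if (context.flushInfo != null) {
    bytes = context.flushInfo.estimatedSegmentSize;   // flush: estimated size of the flushed segment
  }
  // With IOContext.DEFAULT both mergeInfo and flushInfo are null, so bytes stays 0
  // and the file always looks "small enough" to be written into the RAMDirectory cache.
  return bytes <= maxMergeSizeBytes
      && bytes + cache.sizeInBytes() <= maxCachedBytes;
}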
 
Then I will explain the mechanism for releasing segments from the cache.
1. Normally, during a commit, the sync operation flushes the newly created segment files to disk and deletes them from the cache. But if a merge is running during the sync, the segment created by that merge will not be synced to disk in this commit, and the new merged compound segment file will be created in the cache as described above.
2. When using the NRT feature, the IndexSearcher gets SegmentReaders from the IndexWriter through the getReader method, and there is a ReaderPool inside the IndexWriter. For a new segment, the file is first looked up in the cache of NRTCachingDirectory; if it is not in the cache (because it was created directly on disk, or a commit already moved it to disk and released it from the cache), it is fetched from the delegate. The newly opened segment reader is then held in the ReaderPool of the IndexWriter. As described above, the segment created by the merge sits in the cache at this point, so when the IndexWriter opens it, it becomes referenced by the ReaderPool. During the next commit, this segment is synced to disk and released from the cache, but it is still referenced by the ReaderPool, so you will see the IndexSearcher referencing many RAMFiles whose contents are already on disk. When can these RAMFiles be dropped? Only when the segments take part in a later merge that produces a new segment; then the old segments are released from the ReaderPool of the IndexWriter completely (a minimal NRT usage pattern showing where these references come from is sketched after this list).
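
For reference, a minimal NRT usage pattern (assuming Lucene 4.x APIs, all classes from org.apache.lucene.*; the index path and the cache sizes are made up) showing where these reader references come from on the application side:

Directory dir = new NRTCachingDirectory(
    new MMapDirectory(new File("/path/to/index")),   // delegate on disk
    5.0, 60.0);                                      // maxMergeSizeMB, maxCachedMB
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_47,
    new StandardAnalyzer(Version.LUCENE_47));
IndexWriter writer = new IndexWriter(dir, iwc);

// The NRT reader is obtained from the writer; internally the per-segment readers come
// from the writer's ReaderPool, and segments that still live in the RAMDirectory cache
// are opened from there.
DirectoryReader reader = DirectoryReader.open(writer, true);
IndexSearcher searcher = new IndexSearcher(reader);

// ... add or update documents, then refresh; readers pooled this way keep referencing
// the cached RAMFiles until the underlying segments are merged away.
DirectoryReader newReader = DirectoryReader.openIfChanged(reader, writer, true);
if (newReader != null) {
  reader.close();
  reader = newReader;
  searcher = new IndexSearcher(reader);
}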
 
I modified the Lucene source code in the CompoundFileWriter class to solve this problem:
out = new DirectCFSIndexOutput(getOutput(), entry, false);  //original
out = new DirectCFSIndexOutput(getOutput(context), entry, false); //modified

IndexOutput createOutput(String name, IOContext context) throws IOException {
  ensureOpen();
  boolean success = false;
  boolean outputLocked = false;
  try {
    assert name != null : "name must not be null";
    if (entries.containsKey(name)) {
      throw new IllegalArgumentException("File " + name + " already exists");
    }
    final FileEntry entry = new FileEntry();
    entry.file = name;
    entries.put(name, entry);
    final String id = IndexFileNames.stripSegmentName(name);
    assert !seenIDs.contains(id) : "file=\"" + name + "\" maps to id=\"" + id + "\", which was already written";
    seenIDs.add(id);
    final DirectCFSIndexOutput out;
    if ((outputLocked = outputTaken.compareAndSet(false, true))) {
      //out = new DirectCFSIndexOutput(getOutput(), entry, false);
      out = new DirectCFSIndexOutput(getOutput(context), entry, false);
    } else {
      entry.dir = this.directory;
      if (directory.fileExists(name)) {
        throw new IllegalArgumentException("File " + name + " already exists");
      }
      out = new DirectCFSIndexOutput(directory.createOutput(name, context), entry, true);
    }
    success = true;
    return out;
  } finally {
    if (!success) {
      entries.remove(name);
      if (outputLocked) { // release the output lock if not successful
        assert outputTaken.get();
        releaseOutputLock();
      }
    }
  }
}

private synchronized IndexOutput getOutput(IOContext context) throws IOException {
  if (dataOut == null) {
    boolean success = false;
    try {
      dataOut = directory.createOutput(dataFileName, context);
      CodecUtil.writeHeader(dataOut, DATA_CODEC, VERSION_CURRENT);
      success = true;
    } finally {
      if (!success) {
        IOUtils.closeWhileHandlingException(dataOut);
      }
    }
  }
  return dataOut;
}
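
With the IOContext passed through to getOutput, the directory that actually creates the compound data file (NRTCachingDirectory in this setup) sees the real flushInfo/mergeInfo instead of IOContext.DEFAULT, so it can place the file in the cache or in the delegate according to its estimated size instead of always caching it.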
 