lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Re: RE: RE: About lucene memory consumption
Date Tue, 01 Jul 2014 07:47:26 GMT
Hi Wang,

would it be possible to open a JIRA issue so we can track this?
In any case, I would recommend disabling compound files when you use NRTCachingDirectory
(as a workaround).
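
A configuration sketch of that workaround (hypothetical example, assuming Lucene 4.x; the exact setter names vary between Lucene versions, so check the javadocs of your release):

```java
// Disable compound files so flushed and merged segments are written as
// individual files and never take the .cfs code path described below.
IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_9, analyzer);
iwc.setUseCompoundFile(false);      // no .cfs for newly flushed segments
TieredMergePolicy mp = new TieredMergePolicy();
mp.setNoCFSRatio(0.0);              // never build compound files on merge
iwc.setMergePolicy(mp);
NRTCachingDirectory dir = new NRTCachingDirectory(
    FSDirectory.open(new File("/path/to/index")),
    5.0 /* maxMergeSizeMB */, 60.0 /* maxCachedMB */);
IndexWriter writer = new IndexWriter(dir, iwc);
```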

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: wangzhijiang999 [mailto:wangzhijiang999@aliyun.com]
> Sent: Tuesday, July 01, 2014 9:17 AM
> To: java-user
> Subject: Re: RE: RE: About lucene memory consumption
> 
> My application also ran into this problem last year; I researched the code
> and found the cause.
> The whole process is as follows:
> 1. When using NRTCachingDirectory, a RAMDirectory serves as the cache and an
> MMapDirectory as the delegate. New segments are created during flush or
> merge, and NRTCachingDirectory uses the maxMergeSizeBytes and
> maxCachedBytes parameters to decide whether a new segment is created in
> the cache (in memory) or in the delegate (on disk).
> 2. When a flush creates a new segment, the segment's
> context.flushInfo.estimatedSegmentSize is compared against the above
> parameters. If the new segment is small, it is created in the
> RAMDirectory; otherwise in the MMapDirectory.
> 3. When a merge creates a new segment, the segment's
> context.mergeInfo.estimatedMergeBytes is compared against the above
> parameters. If the new segment is small, it is created in the cache;
> otherwise in the delegate.
> 4. But when the new segment is a compound index file (.cfs), no matter whether
> it comes from a flush or a merge, IOContext.DEFAULT is used for that segment,
> and estimatedMergeBytes and estimatedSegmentSize are both null for
> IOContext.DEFAULT. As a result, the new compound segment file is always
> created in the cache no matter how big it really is. This is the core issue.
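> The decision in steps 2-4 can be sketched with a small self-contained mimic
> (this is not the actual Lucene source; names and defaults are hypothetical,
> and IOContext.DEFAULT is modeled as "no size hint", following the description above):

```java
// Mimic of NRTCachingDirectory's cache-or-disk decision as described above.
// A segment goes to the RAM cache only if its estimated size is small enough;
// IOContext.DEFAULT carries no size hint, so the size check passes vacuously
// and the file is cached regardless of its real size.
public class CacheDecisionSketch {
    static final long MAX_MERGE_SIZE_BYTES = 5L * 1024 * 1024;   // hypothetical defaults
    static final long MAX_CACHED_BYTES = 60L * 1024 * 1024;

    /** estimatedSize == null models IOContext.DEFAULT (no flush/merge info). */
    static boolean writeToCache(Long estimatedSize, long bytesAlreadyCached) {
        long bytes = (estimatedSize == null) ? 0 : estimatedSize;
        return bytes <= MAX_MERGE_SIZE_BYTES
            && bytes + bytesAlreadyCached <= MAX_CACHED_BYTES;
    }

    public static void main(String[] args) {
        // Small flushed segment with a size hint: cached, as intended.
        System.out.println(writeToCache(1L * 1024 * 1024, 0));
        // Huge merged segment with a size hint: goes to disk, as intended.
        System.out.println(writeToCache(500L * 1024 * 1024, 0));
        // Huge compound (.cfs) file written with IOContext.DEFAULT: no size
        // hint, so it is cached anyway -- the bug described above.
        System.out.println(writeToCache(null, 0));
    }
}
```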
> 
> Next I will explain the mechanism for releasing segments from the cache.
> 1. Normally, during a commit, the sync operation flushes the newly created
> segment files to disk and deletes them from the cache. But if a merge is
> running during the sync, the segment newly created by that merge will not
> be synced to disk in this commit, and the new merged compound segment file
> will be created in the cache as described above.
> 2. When the NRT feature is used, the IndexSearcher gets segment readers
> from the IndexWriter via the getReader method, and there is a ReaderPool
> inside the IndexWriter. A new segment is first fetched from the cache of
> NRTCachingDirectory; if it is not in the cache (because it was created
> directly on disk, or a commit synced it to disk and released it from the
> cache), it is fetched from the delegate. The fetched segment reader is put
> into the IndexWriter's ReaderPool. As described above, the new segment
> created by the merge is now in the cache, and once it is fetched it is
> referenced by the IndexWriter's ReaderPool. During the next commit this
> segment is synced to disk and released from the cache, but it is still
> referenced by the ReaderPool. So you will see the IndexSearcher referencing
> a lot of RAMFiles that are already on disk. When can these RAMFiles be
> dropped? Only when these segments take part in a new merge that creates a
> new segment are the old segments completely released from the IndexWriter's
> ReaderPool.
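> The retention above can be sketched with a tiny self-contained model (the
> names are hypothetical, not Lucene's: "cache" stands for the RAMDirectory
> inside NRTCachingDirectory, and "readerPool" for the IndexWriter's
> ReaderPool holding a strong reference after eviction):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Model of the leak: the cache drops a file on commit/sync, but the reader
// pool still holds a strong reference to the same in-heap bytes, so the
// memory cannot be reclaimed until the pool releases the segment.
public class ReaderPoolRetentionSketch {
    static class RamFile { final byte[] bytes; RamFile(int n) { bytes = new byte[n]; } }

    static int retainedAfterCommit() {
        Map<String, RamFile> cache = new HashMap<>();  // models the RAM cache
        List<RamFile> readerPool = new ArrayList<>();  // models the ReaderPool

        cache.put("_1.cfs", new RamFile(1 << 20));     // merged segment lands in cache
        readerPool.add(cache.get("_1.cfs"));           // NRT reader opens it from cache
        cache.remove("_1.cfs");                        // commit: synced to disk, evicted

        int retained = 0;                              // cache is empty now, but...
        for (RamFile f : readerPool) retained += f.bytes.length;
        return retained;                               // ...the pool still pins the bytes
    }

    public static void main(String[] args) {
        System.out.println(retainedAfterCommit());
    }
}
```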
> 
> I modified the Lucene source code in the CompoundFileWriter class to solve
> this problem:
>
> // original:
> out = new DirectCFSIndexOutput(getOutput(), entry, false);
> // modified:
> out = new DirectCFSIndexOutput(getOutput(context), entry, false);
> 
> IndexOutput createOutput(String name, IOContext context) throws IOException {
>   ensureOpen();
>   boolean success = false;
>   boolean outputLocked = false;
>   try {
>     assert name != null : "name must not be null";
>     if (entries.containsKey(name)) {
>       throw new IllegalArgumentException("File " + name + " already exists");
>     }
>     final FileEntry entry = new FileEntry();
>     entry.file = name;
>     entries.put(name, entry);
>     final String id = IndexFileNames.stripSegmentName(name);
>     assert !seenIDs.contains(id) : "file=\"" + name + "\" maps to id=\"" + id
>         + "\", which was already written";
>     seenIDs.add(id);
>     final DirectCFSIndexOutput out;
>     if ((outputLocked = outputTaken.compareAndSet(false, true))) {
>       // out = new DirectCFSIndexOutput(getOutput(), entry, false); // original
>       out = new DirectCFSIndexOutput(getOutput(context), entry, false); // modified
>     } else {
>       entry.dir = this.directory;
>       if (directory.fileExists(name)) {
>         throw new IllegalArgumentException("File " + name + " already exists");
>       }
>       out = new DirectCFSIndexOutput(directory.createOutput(name, context), entry, true);
>     }
>     success = true;
>     return out;
>   } finally {
>     if (!success) {
>       entries.remove(name);
>       if (outputLocked) { // release the output lock if not successful
>         assert outputTaken.get();
>         releaseOutputLock();
>       }
>     }
>   }
> }
>
> private synchronized IndexOutput getOutput(IOContext context) throws IOException {
>   if (dataOut == null) {
>     boolean success = false;
>     try {
>       dataOut = directory.createOutput(dataFileName, context);
>       CodecUtil.writeHeader(dataOut, DATA_CODEC, VERSION_CURRENT);
>       success = true;
>     } finally {
>       if (!success) {
>         IOUtils.closeWhileHandlingException(dataOut);
>       }
>     }
>   }
>   return dataOut;
> }
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

