lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Need suggestion in replacing forceMerge(1) with alternative which consumes less space
Date Tue, 14 Apr 2020 07:11:14 GMT
Hi,

From what you are describing, it is not clear what you are seeing. Asking about
"forceMerge(1)" looks like an XY problem (https://en.wikipedia.org/wiki/XY_problem).

(1) forceMerge(1) should almost never be used; it is only appropriate in some very special
circumstances (e.g., indexes that are read-only and will never be updated again). Once you
forceMerge an index, its internal segment structure is thrown off and later natural merging
no longer works as it should. This forces you to forceMerge it over and over.

(2) forceMerge does not solve the problem you are asking about! What you see may just be a
side effect of something else!

(3) you say: 

> Lucene Document is getting corrupted. (data is not getting updated correctly.
> Merging of different row data).

This looks like an issue in your code. Be sure to create a new Document for each update and pass
it to IndexWriter. Documents may be indexed asynchronously (depending on how you set everything
up), so it looks like you are modifying already-created/existing Document instances while they
are still being indexed.
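A minimal sketch of that advice: build a brand-new Document on every call instead of reusing a shared instance across threads. The class name `RowIndexer` and the field names `"id"` and `"body"` are made up for illustration; only `updateDocument(Term, Document)` is the actual Lucene API.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public final class RowIndexer {
    private final IndexWriter writer; // one writer, shared by all indexing threads

    public RowIndexer(IndexWriter writer) {
        this.writer = writer;
    }

    // A fresh Document per row on every call; never cache or mutate a
    // Document instance that was already handed to the writer.
    public void updateRow(String rowId, String rowText) throws java.io.IOException {
        Document doc = new Document();
        doc.add(new StringField("id", rowId, Field.Store.YES));
        doc.add(new TextField("body", rowText, Field.Store.NO));
        // Atomically replaces any previous document matching the "id" term.
        writer.updateDocument(new Term("id", rowId), doc);
    }
}
```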

> 2. when we are trying to updateDocument method for single record. It is not
> reflecting in IndexReader until the count is 8.  Once the count exceeds, than
> records are visible for IndexReader. (creating 8 segment files.) is there any
> alternative for reducing these segment file creation.

Segments are perfectly fine and are required to make incremental updates work correctly. What
you describe with "up to 8" does not make sense: Lucene has no mechanism that makes visibility
dependent on the number of segments. The issue you are seeing is more likely wrong usage of
the near-real-time readers. IndexReaders are point-in-time snapshots: when you getReader on the
writer, you get a reader that does not change anymore (a point-in-time snapshot). To see the
updates, you have to open a new reader. There is SearcherManager to help with that. It manages
a pool of searchers/IndexReaders and takes care of reopening them when the underlying
index data changes.
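The SearcherManager pattern sketched, assuming a single shared IndexWriter; the wrapper class `NrtSearch` is hypothetical, while `maybeRefresh`/`acquire`/`release` are the real SearcherManager calls:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.SearcherFactory;
import org.apache.lucene.search.SearcherManager;

public final class NrtSearch {
    private final SearcherManager manager;

    public NrtSearch(IndexWriter writer) throws IOException {
        // Near-real-time readers straight from the writer, no commit needed
        // for visibility; deletes are applied by default.
        this.manager = new SearcherManager(writer, new SearcherFactory());
    }

    // Call after updates (or from a periodic background thread) so newly
    // indexed documents become visible to subsequent searches.
    public void refresh() throws IOException {
        manager.maybeRefresh();
    }

    public int countDocs() throws IOException {
        IndexSearcher searcher = manager.acquire(); // point-in-time snapshot
        try {
            return searcher.getIndexReader().numDocs();
        } finally {
            manager.release(searcher); // always release back to the pool
        }
    }
}
```

Each `acquire()` must be paired with a `release()` in a finally block, otherwise the old readers are never closed.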

> 3. above two issues are resolved by forcemerge(1). But it is not feasible for our
> use case , because it takes 3X memory. We are creating indexes for huge data.

Don't use forceMerge, especially not to work around an issue that comes from incorrect
multi-threading code and a basic misunderstanding of IndexReaders and their relationship
to IndexWriters.
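For the commit question at the end of the original mail, a common sketch (class name `CommitScheduler` and the period are assumptions, not a prescribed setup): IndexWriter is thread-safe, so all threads share one instance, visibility comes from refreshing NRT readers, and `commit()` runs on a timer purely for durability, never forceMerge(1).

```java
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.apache.lucene.index.IndexWriter;

public final class CommitScheduler implements AutoCloseable {
    private final IndexWriter writer; // the ONE writer shared by all indexing threads
    private final ScheduledExecutorService ses =
            Executors.newSingleThreadScheduledExecutor();

    public CommitScheduler(IndexWriter writer, long periodSeconds) {
        this.writer = writer;
        // Commit on a schedule for durability; searchers see changes via
        // NRT refresh, so there is no need to commit per update.
        ses.scheduleWithFixedDelay(() -> {
            try {
                writer.commit();
            } catch (IOException e) {
                // A failed commit leaves the last committed state intact;
                // log and let the next cycle retry.
                e.printStackTrace();
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    @Override
    public void close() throws IOException {
        ses.shutdown();
        writer.commit(); // final commit on shutdown
    }
}
```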

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Jyothsna Bavisetti <jyothsna.bavisetti@oracle.com>
> Sent: Tuesday, April 14, 2020 7:56 AM
> To: java-user@lucene.apache.org
> Subject: Need suggestion in replacing forcemerge(1) with alternative which
> consumes less space
> 
> Hi,
> 
> 
> 
> 1. We upgraded Lucene 4.6 to 8+. After upgrading, we are facing an issue with
> Lucene index creation.
> 
> We are indexing in a multi-threading environment. When we create bulk indexes,
> the Lucene Document is getting corrupted (data is not getting updated correctly;
> different rows' data is being merged).
> 
> 2. When we call the updateDocument method for a single record, it is not
> reflected in the IndexReader until the count is 8. Once the count exceeds that,
> records are visible to the IndexReader (creating 8 segment files). Is there any
> alternative for reducing this segment file creation?
> 
> 3. The above two issues are resolved by forcemerge(1), but it is not feasible for our
> use case because it takes 3X memory. We are creating indexes for huge data.
> 
> 
> 
> 4. IndexWriter Config:
> analyzer=com.datanomic.director.casemanagement.indexing.AnalyzerFactory$
> MA
> 
> ramBufferSizeMB=64.0
> 
> maxBufferedDocs=-1
> 
> mergedSegmentWarmer=null
> 
> delPolicy=com.datanomic.director.casemanagement.indexing.engines.TimedDel
> etionPolicy
> 
> commit=null
> 
> openMode=CREATE_OR_APPEND
> 
> similarity=org.apache.lucene.search.similarities.BM25Similarity
> 
> mergeScheduler=ConcurrentMergeScheduler: maxThreadCount=-1,
> maxMergeCount=-1, ioThrottle=true
> 
> codec=Lucene80
> 
> infoStream=org.apache.lucene.util.InfoStream$NoOutput
> 
> mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10,
> maxMergeAtOnceExplicit=30, maxMergedSegmentMB=5120.0,
> floorSegmentMB=2.0, forceMergeDeletesPctAllowed=10.0,
> segmentsPerTier=10.0, maxCFSSegmentSizeMB=8.796093022207999E12,
> noCFSRatio=0.1, deletesPctAllowed=33.0]
> 
> indexerThreadPool=org.apache.lucene.index.DocumentsWriterPerThreadPool@
> 24348e05
> 
> readerPooling=true
> 
> perThreadHardLimitMB=1945
> 
> useCompoundFile=false
> 
> commitOnClose=true
> 
> indexSort=null
> 
> checkPendingFlushOnUpdate=true
> 
> softDeletesField=null
> 
> readerAttributes={}
> 
> writer=org.apache.lucene.index.IndexWriter@23a84a99
> 
> 
> 
> Please suggest some alternatives to forceMerge, how to deal with
> indexWriter.commit under multithreading, and how to commit data while updating a
> single record.
> 
> 
> 
> 
> 
> Thanks,
> 
> Jyothsna
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

