lucene-java-user mailing list archives

From "Xie, Eileen" <eileen.y....@Clarivate.com.INVALID>
Subject Force-merge performance degrading after upgrade to Lucene 8.0
Date Tue, 02 Mar 2021 10:15:54 GMT
Hi!

After upgrading our ES cluster from 6.2 to 7.9, we find that the force merge operation takes
much longer, about double the previous latency.
Based on our investigation, we found that the following is the main cause of the force-merge
slowdown:
* Since Lucene 8.0, NormsProducer has been added as an input parameter to the function mergeTerms in org.apache.lucene.index.SegmentMerger.java.


<< Cause analysis

Since Lucene 8.0, NormsProducer has been added as an input parameter to the function mergeTerms
in org.apache.lucene.index.SegmentMerger.java.
The function mergeTerms writes the postings files (.tim, .tip, .doc, .pos, .pay) of the merged segment.
This change is related to how the norms of fields are handled during a merge.

< merge() function before Lucene 8.0
mergeTerms(segmentWriteState);

< merge() function in Lucene 8.0
try (NormsProducer norms = mergeState.mergeFieldInfos.hasNorms()
    ? codec.normsFormat().normsProducer(segmentReadState)
    : null) {
  NormsProducer normsMergeInstance = null;
  if (norms != null) {
    // Use the merge instance in order to reuse the same IndexInput for all terms
    normsMergeInstance = norms.getMergeInstance();
  }
  mergeTerms(segmentWriteState, normsMergeInstance);
}


<< Test cases and result

To validate that the change analyzed above is the main cause of the force-merge slowdown,
we designed some test cases.

< Test environment
  *   ES cluster: 3 master nodes / 1 client node / 3 data nodes with i3.2xlarge
  *   Data: 13216068 docs
  *   Index: 3 primary, 0 replica

< Test steps
  1.  Modify the merge policy setting and the norms setting in the ES mapping file.
  2.  Load data into the ES cluster and record the running duration.
  3.  Run index_name/_flush.
  4.  Run _cat/segments and save the output.
  5.  Run _forcemerge.
  6.  Run _cat/segments and save the output.
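
For anyone who wants to repeat the measurement, steps 3 to 6 can be sketched as a small Java program. This is only a sketch under assumptions, not the tooling used for the results below: the class name ForceMergeTimer, the base URL http://localhost:9200 and the index name index_name are placeholders, and _forcemerge is assumed to be called on a single index with no extra parameters.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: flush, list segments, force merge, list segments again.
class ForceMergeTimer {

    // Step 3: flush the index so all buffered documents are in segments.
    static URI flushUri(String base, String index) {
        return URI.create(base + "/" + index + "/_flush");
    }

    // Steps 4 and 6: list the segments of the index (with a header row).
    static URI segmentsUri(String base, String index) {
        return URI.create(base + "/_cat/segments/" + index + "?v");
    }

    // Step 5: force merge the index (assumption: no extra parameters).
    static URI forceMergeUri(String base, String index) {
        return URI.create(base + "/" + index + "/_forcemerge");
    }

    // Sends a body-less request and returns the response body as text.
    static String send(HttpClient client, String method, URI uri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .method(method, HttpRequest.BodyPublishers.noBody())
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        String base = args.length > 0 ? args[0] : "http://localhost:9200";
        String index = args.length > 1 ? args[1] : "index_name";
        HttpClient client = HttpClient.newHttpClient();

        send(client, "POST", flushUri(base, index));
        System.out.println(send(client, "GET", segmentsUri(base, index)));

        // The force merge request blocks until the merge finishes,
        // so wall-clock time around the call approximates the merge time.
        long start = System.nanoTime();
        send(client, "POST", forceMergeUri(base, index));
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);

        System.out.println(send(client, "GET", segmentsUri(base, index)));
        System.out.println("force merge took " + elapsed.toMinutes() + " min");
    }
}
```

Timing the blocking request on the client side is the simplest option; it includes a small amount of HTTP overhead, which is negligible next to merge times measured in minutes.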

< Test result

No. | ES version | Lucene version | omit norms | force merge time
-----------------------------------------------------------------
1.1 | 6.8.13 | 7.7.2 | no | 13 min
1.2 | 6.8.13 | 7.7.2 | omit norms for all text, keyword fields | 14 min
2.1 | 7.9.1 | 8.6.2 | no | 31 min
2.2 | 7.9.1 | 8.6.2 | omit norms for all text, keyword fields | 13 min
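
The slowdown implied by the table can be checked with a few lines of Java. This is a minimal sketch; the class name MergeSlowdown and the method ratio are made up for illustration, and the minute values are copied from the rows above.

```java
import java.util.Locale;

// Hypothetical helper: compares force merge times from the result table.
class MergeSlowdown {

    // Ratio of a measured time to a baseline time, both in minutes.
    static double ratio(int minutes, int baselineMinutes) {
        return (double) minutes / baselineMinutes;
    }

    public static void main(String[] args) {
        // Norms kept: Lucene 8.6.2 (case 2.1) vs Lucene 7.7.2 (case 1.1).
        System.out.printf(Locale.ROOT, "norms kept:    %.2fx%n", ratio(31, 13));
        // Norms omitted: Lucene 8.6.2 (case 2.2) vs Lucene 7.7.2 (case 1.2).
        System.out.printf(Locale.ROOT, "norms omitted: %.2fx%n", ratio(13, 14));
    }
}
```

With norms kept, the merge on Lucene 8.6.2 takes roughly 2.4 times as long as on 7.7.2; with norms omitted, the two versions are about equal.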


<< My questions are:

  1.  Why does this norms-related change cause such an obvious force-merge slowdown?
  2.  Is there any way to resolve it and improve force-merge performance on Lucene 8.0+?


Looking forward to your answer, and thanks a lot for your help.
Eileen Xie
