lucene-java-user mailing list archives

From Jerven Tjalling Bolleman <Jerven.Bolle...@sib.swiss>
Subject Re: Static index, fastest way to do forceMerge
Date Fri, 02 Nov 2018 20:36:20 GMT
On 2018-11-02 20:52, Dawid Weiss wrote:
>> int processors = Runtime.getRuntime().availableProcessors();
>> ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
>> cms.setMaxMergesAndThreads(processors,processors);
> 
> See, the number of threads in the CMS only matters if you have
> concurrent merges of independent segments. What you're doing
> effectively forces an eventual X -> 1 merge, which is done by a single
> thread (regardless of the max processors above).
> 
>>    38G _583u.fdt
>>    25M _583u.fdx
>>    13K _583u.fnm
>>    47G _583u_Lucene50_0.doc
>>    54G _583u_Lucene50_0.pos
>>    30G _583u_Lucene50_0.tim
>>   413M _583u_Lucene50_0.tip
>>   2.1G _583u_Lucene70_0.dvd
>>    213 _583u_Lucene70_0.dvm
> 
> Merging segments as large as this one requires not just CPU, but also
> serious I/O throughput efficiency. I assume you have fast NVMe drives
> on that machine, otherwise it'll be slow, no matter what. It's just a
> lot of bytes going back and forth.
Yup, it now runs in the cloud, so optimizing for quick indexing followed
by a merge to one segment has become financially interesting. Right now
too much CPU and RAM sits idle, and we are not even maxing out the disk
I/O (about 25% of the max rate).
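
To put a rough number on "a lot of bytes", here is a small sketch that
adds up the file sizes from the listing above and estimates how long one
read-plus-write pass over them would take. The sizes are the rounded
values from the listing, the class and method names are made up for
illustration, and the 2 GiB/s peak rate is a placeholder assumption, not
a measurement from this machine. Only the 25% utilisation figure comes
from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SegmentSize {
    // Binary units, as printed by "du -h" style listings.
    static final long K = 1024L, M = K * 1024, G = M * 1024;

    // Rounded sizes copied from the segment listing quoted above.
    static Map<String, Long> files() {
        Map<String, Long> f = new LinkedHashMap<>();
        f.put("_583u.fdt", 38 * G);
        f.put("_583u.fdx", 25 * M);
        f.put("_583u.fnm", 13 * K);
        f.put("_583u_Lucene50_0.doc", 47 * G);
        f.put("_583u_Lucene50_0.pos", 54 * G);
        f.put("_583u_Lucene50_0.tim", 30 * G);
        f.put("_583u_Lucene50_0.tip", 413 * M);
        f.put("_583u_Lucene70_0.dvd", (long) (2.1 * G));
        f.put("_583u_Lucene70_0.dvm", 213L);
        return f;
    }

    static long totalBytes() {
        long sum = 0;
        for (long size : files().values()) {
            sum += size;
        }
        return sum;
    }

    // Simplified model: one pass reads every byte once and writes it once.
    static double secondsForOnePass(long bytes, double bytesPerSecond) {
        return 2.0 * bytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long total = totalBytes();
        double assumedPeak = 2.0 * G; // HYPOTHETICAL peak disk rate in bytes/s
        double utilisation = 0.25;    // "about 25% of max rate" from this thread
        System.out.printf("segment size: %.1f GiB%n", total / (double) G);
        System.out.printf("one pass at peak rate: %.0f s%n",
                secondsForOnePass(total, assumedPeak));
        System.out.printf("one pass at 25%% of peak: %.0f s%n",
                secondsForOnePass(total, assumedPeak * utilisation));
    }
}
```

Whatever the real peak rate is, running at a quarter of it makes the
pass take four times as long, which is where the idle CPU and RAM come
from in this model.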
> 
>> If we did such a max resource merge code would there be interest to 
>> have this merged?
> 
> I think so. Try to experiment locally first though and see what you
> can find out. Hacking that code I pointed at shouldn't be too
> difficult. See what happens.
Yeah, before I left I started an experiment to run a merge without the
merge scheduler being involved at all.

Will try a few more experiments next week.
> 
>> Or should we maybe do something like this assuming 64 cpus
>> 
>> writer.forceMerge(64, true);
>> writer.forceMerge(32, true);
>> writer.forceMerge(16, true);
>> writer.forceMerge(8, true);
>> writer.forceMerge(4, true);
>> writer.forceMerge(2, true);
>> writer.forceMerge(1, true);
> 
> No, this doesn't make much sense. If your goal is 1 segment then you
> want to read from as many of them at once as possible and merge into a
> single segment. Doing what you did above would only bump I/O traffic a
> lot.
Thanks, I always thought so but wasn't sure anymore.
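
A back-of-the-envelope sketch of why the staged sequence costs more I/O.
It assumes a deliberately simplified model in which every staged
forceMerge call rewrites the whole index once (for example by pairing up
equal-sized segments); the real merge policy may leave some segments
untouched in a given stage, so treat the result as an upper-bound style
estimate. The class and method names are made up, and the 171 GiB figure
is just the rounded sum of the file sizes listed earlier in this thread.

```java
public class StagedMergeCost {
    // Number of forceMerge calls in the halving sequence start, start/2, ..., 1.
    static int stagedCalls(int start) {
        int calls = 0;
        for (int n = start; n >= 1; n /= 2) {
            calls++;
        }
        return calls;
    }

    // ASSUMPTION: each staged call rewrites every byte of the index once.
    static double stagedGiBWritten(double indexGiB, int start) {
        return indexGiB * stagedCalls(start);
    }

    // A single forceMerge(1) writes the merged index once.
    static double directGiBWritten(double indexGiB) {
        return indexGiB;
    }

    public static void main(String[] args) {
        double indexGiB = 171.0; // rounded total from the listing in this thread
        int start = 64;          // first target in the staged sequence
        System.out.println("staged calls:      " + stagedCalls(start));
        System.out.println("staged GiB written: " + stagedGiBWritten(indexGiB, start));
        System.out.println("direct GiB written: " + directGiBWritten(indexGiB));
    }
}
```

Under that model the 64, 32, ..., 1 sequence is seven full rewrites
against one for a direct forceMerge(1), and the same factor applies to
the bytes read.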

Have a nice weekend everyone!
> 
> Dawid
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

-- 
Jerven Tjalling Bolleman
SIB | Swiss Institute of Bioinformatics
CMU - 1, rue Michel Servet - 1211 Geneva 4
t: +41 22 379 58 85 - f: +41 22 379 58 58
Jerven.Bolleman@sib.swiss - http://www.sib.swiss

