lucene-dev mailing list archives

From "Atri Sharma (JIRA)" <>
Subject [jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
Date Wed, 08 May 2019 10:20:00 GMT


Atri Sharma commented on LUCENE-8757:

[~simonw] The sort was added so that the slicing algorithm gives a consistency guarantee:
two queries over the exact same distribution of segments should get the same number of
slices, irrespective of the order in which the method traverses the segments.
Consider a distribution of 8 segments where six segments have 10,000 documents each and two
segments have 130,000 documents each. Take the following traversal order (each value
is the maxDoc of a segment):

{10_000, 130_000, 10_000, 10_000, 10_000, 10_000, 10_000, 130_000}

The slicing algorithm will create a single slice containing all the segments, since it is only
the addition of the last segment that breaches the maxDocs limit.

If the segments were sorted, the order would be:

{130_000, 130_000, 10_000, 10_000, 10_000, 10_000, 10_000, 10_000}


This would lead to two slices being created.
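
To make the difference concrete, here is a minimal sketch of a greedy slicer run over both orders. It is not the code from the patch: the class name SliceSketch, the method slices, and the 250_000 limit are assumptions made for illustration, and the sketch assumes a slice is closed only after the segment that crosses the limit has been added to it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SliceSketch {
    // Assumed limit for illustration; the comment above does not state the real value.
    static final long MAX_DOCS_PER_SLICE = 250_000;

    /**
     * Greedy slicing: add each segment to the current slice and close the
     * slice once its cumulative maxDoc exceeds the limit. When sortFirst is
     * true, segments are ordered by maxDoc, largest first, before slicing.
     */
    static List<List<Integer>> slices(List<Integer> maxDocs, boolean sortFirst) {
        List<Integer> segments = new ArrayList<>(maxDocs);
        if (sortFirst) {
            segments.sort(Collections.reverseOrder());
        }
        List<List<Integer>> result = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long docSum = 0;
        for (int maxDoc : segments) {
            current.add(maxDoc);
            docSum += maxDoc;
            if (docSum > MAX_DOCS_PER_SLICE) {
                // Limit breached: close this slice and start a new one.
                result.add(current);
                current = new ArrayList<>();
                docSum = 0;
            }
        }
        if (!current.isEmpty()) {
            result.add(current); // leftover segments form the last slice
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> traversal = Arrays.asList(
            10_000, 130_000, 10_000, 10_000, 10_000, 10_000, 10_000, 130_000);
        System.out.println(slices(traversal, false).size()); // prints 1
        System.out.println(slices(traversal, true).size());  // prints 2
    }
}
```

Under these assumptions the unsorted traversal only crosses the limit at the eighth segment (320,000 documents in total), while the sorted order crosses it after the two large segments and leaves the six small ones for a second slice.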


bq. also want to suggest to beef up testing a bit

Thanks, added the test. I will post another iteration once the discussion above concludes.


> Better Segment To Thread Mapping Algorithm
> ------------------------------------------
>                 Key: LUCENE-8757
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Atri Sharma
>            Priority: Major
>         Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
> The current segments-to-threads allocation algorithm always allocates one thread per
> segment. This is detrimental to performance when segment sizes are skewed, since small
> segments also get their own dedicated thread. This can lead to performance degradation
> due to context switching.
> A better algorithm that is cognizant of size skew would perform better in
> realistic scenarios.

This message was sent by Atlassian JIRA
