From "Atri Sharma (JIRA)" <>
Subject [jira] [Commented] (LUCENE-8675) Divide Segment Search Amongst Multiple Threads
Date Mon, 22 Apr 2019 17:23:00 GMT


Atri Sharma commented on LUCENE-8675:

Repeating the earlier results in a more human-readable form:

||Task||P50 Base||P50 Cmp||P50 Pct||P90 Base||P90 Cmp||P90 Pct||
|Wildcard|9.993697|11.906981|19.1449070349|14.431318|13.953923|-3.3080485095|
|HighTermDayOfYearSort (DayOfYear)|39.556908|44.389095|12.2157854198|62.421873|49.214184|-21.1587515165|
|AndHighHigh|3.814074|2.459326|-35.5197093711|5.045984|7.932029|57.1948900353|
|OrHighHigh|9.586193|5.846643|-39.0097507947|14.978843|7.078967|-52.7402283341|
|MedPhrase|3.210464|2.276356|-29.0957319565|4.217049|3.852337|-8.64851226533|
|LowSpanNear|11.247447|4.986828|-55.6625783611|16.095342|6.121194|-61.9691585305|
|Fuzzy2|23.636902|20.959304|-11.3280412128|112.5086|105.188025|-6.50668037821|
|OrNotHighHigh|4.225917|2.62127|-37.9715692476|6.11225|8.525249|39.4780809031|
|OrHighNotLow|4.015982|2.250697|-43.956496817|10.636566|3.134868|-70.5274427856|
|BrowseMonthSSDVFacets|66.920633|66.986841|0.0989351072038|67.230757|76.011531|13.0606502021|
|Fuzzy1|14.779783|12.559705|-15.0210459788|46.329521|218.272906|371.131367838|
|HighSloppyPhrase|21.362967|10.563982|-50.5500242546|33.009649|15.74507|-52.3016133858|
|OrNotHighMed|2.032775|1.584332|-22.0606314029|2.529475|2.044107|-19.1884877297|
|LowPhrase|4.937747|2.8876|-41.5198875115|6.910574|5.159077|-25.345173932|
|AndHighLow|1.097696|1.416176|29.0134973617|3.426081|13.987273|308.258678064|
|LowTerm|0.787595|1.038949|31.9141182968|1.12006|39.639455|3439.04746174|
|BrowseDayOfYearSSDVFacets|80.006624|80.215023|0.260477182489|80.610476|81.187614|0.71595905227|
|Prefix3|3.347358|3.219213|-3.82824305019|6.716371|5.21174|-22.4024402464|
|HighTermMonthSort (Month)|20.684075|19.601521|-5.23375592092|21.341383|20.092673|-5.85112033274|
|HighTerm|2.991271|1.891199|-36.7760727798|4.058212|2.320309|-42.8243522024|
|Respell|17.33154|17.397468|0.38039320222|99.071728|66.75552|-32.6190010535|
|MedTerm|3.011125|1.793175|-40.4483374154|4.206761|2.392798|-43.1201820118|
|MedSloppyPhrase|5.896878|3.304889|-43.9552759952|8.044708|4.881775|-39.316939782|
|HighSpanNear|20.981466|9.533211|-54.5636563241|28.98951|11.087743|-61.7525684291|
|LowSloppyPhrase|12.841091|6.075233|-52.6891211969|18.539534|6.825001|-63.1867715769|
|OrHighNotHigh|11.822146|6.645646|-43.786466518|17.02398|7.935497|-53.3863585366|
|OrNotHighLow|0.782455|1.06583|36.2161402253|1.668578|13.200645|691.131430476|
|MedSpanNear|3.161032|2.154472|-31.8427652741|5.386012|5.665401|5.18730741781|
|BrowseDateTaxoFacets|444.971146|444.674024|-0.066773318376|447.81169|445.950713|-0.415571330887|
|HighPhrase|7.464241|4.644244|-37.7800904338|25.153245|7.548758|-69.9889298578|
|OrHighLow|6.344855|3.590218|-43.4152868742|8.425453|15.578677|84.9001709463|
|BrowseDayOfYearTaxoFacets|0.16655|0.184125|10.5523866707|0.207908|0.224575|8.01652654059|
|IntNRQ|24.844282|12.870238|-48.196377742|45.815197|57.190359|24.8283598999|
|BrowseMonthTaxoFacets|0.16488|0.170045|3.13258127123|0.203625|0.200508|-1.53075506446|
|AndHighMed|2.109471|1.773399|-15.9315771584|2.458244|3.943119|60.4038899312|
|OrHighNotMed|3.580582|3.088177|-13.7520939333|4.196391|4.16434|-0.763775348865|
|PKLookup|9.248977|9.76835|5.61546428324|47.86882|10.705417|-77.6359287737|
|OrHighMed|9.072955|5.552202|-38.8049207783|20.823925|7.961727|-61.7664441262|
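
For reading the Pct values: they appear consistent with the relative change of Cmp over Base, i.e. (Cmp - Base) / Base * 100. If the figures are latencies (an assumption, the units are not stated here), a negative Pct means the Cmp run was faster. A minimal Java sketch of that arithmetic, with a hypothetical class and method name:

```java
public class PctChange {

    // Relative change of cmp with respect to base, in percent.
    // Negative means cmp is smaller than base.
    public static double pctChange(double base, double cmp) {
        return (cmp - base) / base * 100.0;
    }

    public static void main(String[] args) {
        // Wildcard, P50: Base 9.993697, Cmp 11.906981 (reported Pct is about 19.14)
        System.out.println(pctChange(9.993697, 11.906981));
        // AndHighHigh, P50: Base 3.814074, Cmp 2.459326 (reported Pct is about -35.52)
        System.out.println(pctChange(3.814074, 2.459326));
    }
}
```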

> Divide Segment Search Amongst Multiple Threads
> ----------------------------------------------
>                 Key: LUCENE-8675
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Atri Sharma
>            Priority: Major
>         Attachments: PhraseHighFreqP50.png, PhraseHighFreqP90.png, TermHighFreqP50.png,
> Segment search is a single-threaded operation today, which can be a bottleneck for large
analytical queries that run over a lot of indexed data and are complex enough to touch multiple
segments (imagine a composite query with a range query and filters on top). This ticket is for
discussing the idea of splitting the search of a single segment across multiple threads based on
mutually exclusive document ID ranges.
> This will be a two-phase effort. The first phase targets queries that return all matching
documents (collectors that do not terminate early). The second-phase patch will introduce staged
execution and will build on top of the first patch.
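
To make the idea of mutually exclusive document ID ranges concrete, here is a rough, self-contained Java sketch. It is not the patch on this ticket and uses no Lucene APIs: `DocIdRangeSplit`, `split`, and `countMatches` are hypothetical names, and an `IntPredicate` stands in for whatever per-document matching the real scorer would do. It only shows the first-phase shape, where every range is scanned in full (no early termination) and the partial results are merged at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntPredicate;

public class DocIdRangeSplit {

    // Split [0, maxDoc) into at most numSlices contiguous, non-overlapping
    // half-open ranges {start, end}. Range sizes differ by at most one.
    public static List<int[]> split(int maxDoc, int numSlices) {
        if (maxDoc < 0 || numSlices <= 0) {
            throw new IllegalArgumentException("maxDoc must be >= 0 and numSlices > 0");
        }
        List<int[]> ranges = new ArrayList<>();
        int base = maxDoc / numSlices;
        int remainder = maxDoc % numSlices;
        int start = 0;
        for (int i = 0; i < numSlices; i++) {
            int size = base + (i < remainder ? 1 : 0);
            if (size == 0) {
                break; // fewer documents than slices: skip empty ranges
            }
            ranges.add(new int[] {start, start + size});
            start += size;
        }
        return ranges;
    }

    // Count matching doc IDs in one "segment" by scanning each range on its
    // own thread and summing the per-range counts.
    public static long countMatches(int maxDoc, int numThreads, IntPredicate matches)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (int[] range : split(maxDoc, numThreads)) {
                Callable<Long> task = () -> {
                    long hits = 0;
                    for (int doc = range[0]; doc < range[1]; doc++) {
                        if (matches.test(doc)) {
                            hits++;
                        }
                    }
                    return hits;
                };
                partials.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // merge step: ranges are disjoint, so counts add up
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical segment with 1,000,000 docs, searched by 4 threads.
        System.out.println(countMatches(1_000_000, 4, doc -> doc % 3 == 0));
    }
}
```

Because the ranges are disjoint and cover the whole segment, a simple sum is enough to merge counts; a top-N collector would instead need a merge of per-range priority queues, which this sketch does not attempt.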

