lucene-dev mailing list archives

From "Shalin Shekhar Mangar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-6554) Speed up overseer operations for collections with stateFormat > 1
Date Mon, 01 Dec 2014 16:03:13 GMT

    [ https://issues.apache.org/jira/browse/SOLR-6554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229939#comment-14229939 ]

Shalin Shekhar Mangar commented on SOLR-6554:
---------------------------------------------

Actually, the improvements in Overseer for stateFormat=1 (the default case) are much better
than I expected. After the refactorings, the amILeader calls are very infrequent (2 on trunk
versus 223 on branch_5x in the runs below) and the speedup in total processing time is about 40%:

{code}
Overseer queue size: 20000 state requests

stateFormat = 1, With refactoring (trunk)
=========================================

216071 T12 oasc.OverseerTest.testPerformance Overseer loop finished processing: 
216072 T12 oasc.OverseerTest.printTimingStats 	 totalTime: 201411.465265
216072 T12 oasc.OverseerTest.printTimingStats 	 avgRequestsPerMinute: 0.004964922311489345
216073 T12 oasc.OverseerTest.printTimingStats 	 5minRateRequestsPerMinute: 0.0
216073 T12 oasc.OverseerTest.printTimingStats 	 15minRateRequestsPerMinute: 0.0
216073 T12 oasc.OverseerTest.printTimingStats 	 avgTimePerRequest: 201411.465265
216073 T12 oasc.OverseerTest.printTimingStats 	 medianRequestTime: 201411.465265
216073 T12 oasc.OverseerTest.printTimingStats 	 75thPctlRequestTime: 201411.465265
216074 T12 oasc.OverseerTest.printTimingStats 	 95thPctlRequestTime: 201411.465265
216074 T12 oasc.OverseerTest.printTimingStats 	 99thPctlRequestTime: 201411.465265
216074 T12 oasc.OverseerTest.printTimingStats 	 999thPctlRequestTime: 201411.465265
216075 T12 oasc.OverseerTest.testPerformance op: am_i_leader, success: 2, failure: 0
216075 T12 oasc.OverseerTest.printTimingStats 	 totalTime: 9.377281
216075 T12 oasc.OverseerTest.printTimingStats 	 avgRequestsPerMinute: 0.5969575423185497
216075 T12 oasc.OverseerTest.printTimingStats 	 5minRateRequestsPerMinute: 12.529098642264385
216075 T12 oasc.OverseerTest.printTimingStats 	 15minRateRequestsPerMinute: 19.324759776433687
216075 T12 oasc.OverseerTest.printTimingStats 	 avgTimePerRequest: 4.6886405
216076 T12 oasc.OverseerTest.printTimingStats 	 medianRequestTime: 4.6886405
216076 T12 oasc.OverseerTest.printTimingStats 	 75thPctlRequestTime: 9.022041
216076 T12 oasc.OverseerTest.printTimingStats 	 95thPctlRequestTime: 9.022041
216076 T12 oasc.OverseerTest.printTimingStats 	 99thPctlRequestTime: 9.022041
216076 T12 oasc.OverseerTest.printTimingStats 	 999thPctlRequestTime: 9.022041
216077 T12 oasc.OverseerTest.testPerformance op: update_state, success: 135, failure: 0
216077 T12 oasc.OverseerTest.printTimingStats 	 totalTime: 61.333751
216077 T12 oasc.OverseerTest.printTimingStats 	 avgRequestsPerMinute: 40.31065112174398
216077 T12 oasc.OverseerTest.printTimingStats 	 5minRateRequestsPerMinute: 48.0
216078 T12 oasc.OverseerTest.printTimingStats 	 15minRateRequestsPerMinute: 48.0
216078 T12 oasc.OverseerTest.printTimingStats 	 avgTimePerRequest: 0.4543240814814815
216078 T12 oasc.OverseerTest.printTimingStats 	 medianRequestTime: 0.364217
216078 T12 oasc.OverseerTest.printTimingStats 	 75thPctlRequestTime: 0.409896
216078 T12 oasc.OverseerTest.printTimingStats 	 95thPctlRequestTime: 0.9332719999999994
216079 T12 oasc.OverseerTest.printTimingStats 	 99thPctlRequestTime: 3.576287319999995
216079 T12 oasc.OverseerTest.printTimingStats 	 999thPctlRequestTime: 3.700744
216079 T12 oasc.OverseerTest.testPerformance op: state, success: 20001, failure: 0
216081 T12 oasc.OverseerTest.printTimingStats 	 totalTime: 13344.072646
216081 T12 oasc.OverseerTest.printTimingStats 	 avgRequestsPerMinute: 5973.226142698651
216081 T12 oasc.OverseerTest.printTimingStats 	 5minRateRequestsPerMinute: 4437.949777291698
216082 T12 oasc.OverseerTest.printTimingStats 	 15minRateRequestsPerMinute: 3247.958438006491
216082 T12 oasc.OverseerTest.printTimingStats 	 avgTimePerRequest: 0.6671702737863107
216083 T12 oasc.OverseerTest.printTimingStats 	 medianRequestTime: 0.6112960000000001
216083 T12 oasc.OverseerTest.printTimingStats 	 75thPctlRequestTime: 0.65861125
216083 T12 oasc.OverseerTest.printTimingStats 	 95thPctlRequestTime: 0.9373918
216083 T12 oasc.OverseerTest.printTimingStats 	 99thPctlRequestTime: 1.179823900000002
216083 T12 oasc.OverseerTest.printTimingStats 	 999thPctlRequestTime: 6.713780613000015


stateFormat = 1, Without refactoring (branch_5x)
================================================

354435 T11 oasc.OverseerTest.testPerformance Overseer loop finished processing: 
354437 T11 oasc.OverseerTest.printTimingStats 	 totalTime: 336777.887
354438 T11 oasc.OverseerTest.printTimingStats 	 avgRequestsPerMinute: 0.0029692955509913457
354438 T11 oasc.OverseerTest.printTimingStats 	 5minRateRequestsPerMinute: 0.0
354438 T11 oasc.OverseerTest.printTimingStats 	 15minRateRequestsPerMinute: 0.0
354439 T11 oasc.OverseerTest.printTimingStats 	 avgTimePerRequest: 336777.887
354439 T11 oasc.OverseerTest.printTimingStats 	 medianRequestTime: 336777.887
354439 T11 oasc.OverseerTest.printTimingStats 	 75thPctlRequestTime: 336777.887
354440 T11 oasc.OverseerTest.printTimingStats 	 95thPctlRequestTime: 336777.887
354440 T11 oasc.OverseerTest.printTimingStats 	 99thPctlRequestTime: 336777.887
354440 T11 oasc.OverseerTest.printTimingStats 	 999thPctlRequestTime: 336777.887
354441 T11 oasc.OverseerTest.testPerformance op: state, success: 20001, failure: 0
354444 T11 oasc.OverseerTest.printTimingStats 	 totalTime: 13029.408
354444 T11 oasc.OverseerTest.printTimingStats 	 avgRequestsPerMinute: 3570.0750281584515
354444 T11 oasc.OverseerTest.printTimingStats 	 5minRateRequestsPerMinute: 3169.209724490217
354445 T11 oasc.OverseerTest.printTimingStats 	 15minRateRequestsPerMinute: 2124.6849108211077
354445 T11 oasc.OverseerTest.printTimingStats 	 avgTimePerRequest: 0.6514378281085945
354445 T11 oasc.OverseerTest.printTimingStats 	 medianRequestTime: 0.59
354446 T11 oasc.OverseerTest.printTimingStats 	 75thPctlRequestTime: 0.633
354446 T11 oasc.OverseerTest.printTimingStats 	 95thPctlRequestTime: 0.8480999999999999
354446 T11 oasc.OverseerTest.printTimingStats 	 99thPctlRequestTime: 0.9995200000000004
354447 T11 oasc.OverseerTest.printTimingStats 	 999thPctlRequestTime: 1.736079000000002
354447 T11 oasc.OverseerTest.testPerformance op: update_state, success: 222, failure: 0
354448 T11 oasc.OverseerTest.printTimingStats 	 totalTime: 98.244
354448 T11 oasc.OverseerTest.printTimingStats 	 avgRequestsPerMinute: 39.622607985461286
354448 T11 oasc.OverseerTest.printTimingStats 	 5minRateRequestsPerMinute: 48.0
354448 T11 oasc.OverseerTest.printTimingStats 	 15minRateRequestsPerMinute: 48.0
354449 T11 oasc.OverseerTest.printTimingStats 	 avgTimePerRequest: 0.44254054054054054
354449 T11 oasc.OverseerTest.printTimingStats 	 medianRequestTime: 0.3835
354450 T11 oasc.OverseerTest.printTimingStats 	 75thPctlRequestTime: 0.463
354450 T11 oasc.OverseerTest.printTimingStats 	 95thPctlRequestTime: 0.7994499999999999
354450 T11 oasc.OverseerTest.printTimingStats 	 99thPctlRequestTime: 1.2152900000000026
354451 T11 oasc.OverseerTest.printTimingStats 	 999thPctlRequestTime: 2.452
354451 T11 oasc.OverseerTest.testPerformance op: am_i_leader, success: 223, failure: 0
354452 T11 oasc.OverseerTest.printTimingStats 	 totalTime: 43.33
354453 T11 oasc.OverseerTest.printTimingStats 	 avgRequestsPerMinute: 39.777330428482294
354453 T11 oasc.OverseerTest.printTimingStats 	 5minRateRequestsPerMinute: 57.7576718337744
354453 T11 oasc.OverseerTest.printTimingStats 	 15minRateRequestsPerMinute: 65.77963729636123
354453 T11 oasc.OverseerTest.printTimingStats 	 avgTimePerRequest: 0.194304932735426
354454 T11 oasc.OverseerTest.printTimingStats 	 medianRequestTime: 0.149
354454 T11 oasc.OverseerTest.printTimingStats 	 75thPctlRequestTime: 0.188
354454 T11 oasc.OverseerTest.printTimingStats 	 95thPctlRequestTime: 0.25839999999999996
354454 T11 oasc.OverseerTest.printTimingStats 	 99thPctlRequestTime: 0.47591999999999895
354455 T11 oasc.OverseerTest.printTimingStats 	 999thPctlRequestTime: 5.712
{code}

Do not compare these numbers with the previous ones, because this test was run on a different box.
Note also that trunk was run on jdk1.8.0_25 while branch_5x was run on jdk1.7.0_25. I'm running
the other tests and will report back shortly.
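
As a rough check of the 40% figure, the sketch below recomputes it from the two "Overseer loop finished processing" totalTime values in the log above. The function and constant names are illustrative only and are not part of OverseerTest; the values are used in whatever unit the test prints them in.

```python
# totalTime values copied from the "Overseer loop finished processing" sections above.
TRUNK_TOTAL = 201411.465265    # stateFormat = 1, with refactoring (trunk)
BRANCH_5X_TOTAL = 336777.887   # stateFormat = 1, without refactoring (branch_5x)


def time_reduction(before: float, after: float) -> float:
    """Fraction of the original wall time that was saved."""
    return (before - after) / before


def throughput_gain(before: float, after: float) -> float:
    """Relative increase in requests processed per unit time for the same workload."""
    return before / after - 1.0


reduction = time_reduction(BRANCH_5X_TOTAL, TRUNK_TOTAL)
gain = throughput_gain(BRANCH_5X_TOTAL, TRUNK_TOTAL)

print(f"time reduction:  {reduction:.1%}")
print(f"throughput gain: {gain:.1%}")
```

The "about 40%" corresponds to the reduction in total processing time for the 20000 queued state requests. Expressed as throughput, the same two numbers give a larger percentage, so it helps to say which of the two is meant when quoting the result.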

> Speed up overseer operations for collections with stateFormat > 1
> -----------------------------------------------------------------
>
>                 Key: SOLR-6554
>                 URL: https://issues.apache.org/jira/browse/SOLR-6554
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 5.0, Trunk
>            Reporter: Shalin Shekhar Mangar
>         Attachments: SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch
>
>
> Right now (after SOLR-5473 was committed), a node watches a collection only if stateFormat=1
> or if that node hosts at least one core belonging to that collection.
> This means that a node which is the overseer operates on all collections but watches
> only a few. So any read of an unwatched collection goes directly to ZooKeeper, which slows
> down overseer operations.
> Let's have the overseer node watch all collections always and never remove those watches
> (except when the collection itself is deleted).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


