kudu-user mailing list archives

From: davidral...@gmail.com
Subject: Re: RPC and Service difference?
Date: Sat, 29 Apr 2017 17:50:42 GMT
This suggests either that the tablets are unbalanced or that tablet sizes are asymmetric.

A few more things to check:
Can you check tablet sizes in nodes 7 and 8? Look for tablets with sizes much bigger than
average. Also check the tablets in nodes 2 and 3.
How big is the data set, both in total and per tablet?
Just to confirm, how did you install Kudu? Did you build from source or did you use a pre-built
package?
What data types are d1 and m1? Why compression only, with no encoding?
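
For reference, a minimal sketch of how to compare per-tablet sizes, assuming
the tablet server web UI is enabled on its default port (8050) and using your
node names as placeholders:

  # Tablet list on a suspect node; the page lists each tablet with its on-disk size
  curl -s http://node7:8050/tablets

  # Or pull just the size metrics (metric name and filter parameter assumed for 1.3)
  curl -s 'http://node7:8050/metrics?metrics=on_disk_size'

Repeating this for nodes 2, 3, 7, 8 and one of the fast nodes should show
whether the slow nodes hold unusually large tablets.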

Sent from my iPhone

> On Apr 29, 2017, at 4:15 AM, 기준 <0ctopus13prime@gmail.com> wrote:
> 
> @David Alves
> 
> Thanks for your reply, David Alves!
> 
> This is my environment.
> 
> 1. 9 nodes (40 cores, 3.3 TB SSD).
> 2. The most common SQL is 'SELECT d1, SUM(m1) FROM table WHERE d1 in
> ('some1', 'some2', 'some3', 'some4' ...)'.
> 3. 'd1' uses LZ4 compression + plain encoding.
> 4. I already computed statistics in Impala, but it does not seem to
> help performance much, since I just filter and sum up rather than
> joining multiple tables.
> 5. 600 tablets in total. (We are still deciding on the best partition
> number through performance tests, so there are quite a few tables on
> the Kudu servers.)
> 6. The concurrent query load is 40~100 QPS to the Impala HS2 endpoint.
> 7. Kudu and Impala are co-located on the same nodes.
> 8. The maintenance thread count is 8. The ingestion job runs from an
> outside YARN cluster, and the I/O budget is 1 GB.
> Compaction jobs run during the performance tests. (This is by design;
> in production, the ingestion job and the scan job will ultimately run
> independently.)
> 9. The command 'top -d 1 -H -p $(pgrep impalad)' shows Impala is idle
> most of the time. CPU usage is below 3%.
> 10. The stress-test target table's tablet distribution is below.
> 
> Target table's tablets per node:
> 
> node1 : 12
> node2 : 11
> node3 : 12
> node4 : 13
> node5 : 16
> node6 : 14
> node7 : 14
> node8 : 13
> node9 : 15
> 
> 11. One acceptor thread
> 12. Service queue length of 50
> 13. 20 service threads
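> 
> (For reference, a sketch of how these values can be double-checked at
> runtime; 8050 is the default tablet server web UI port and is an
> assumption about this setup:)
> 
> curl -s http://node1:8050/varz | grep -E 'rpc_num_acceptors_per_address|rpc_num_service_threads|rpc_service_queue_length|maintenance_manager_num_threads'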
> 
> 
> 
> 
> I found a few things during testing.
> 
> 1. Impala is idle most of the time.
> 2. Query execution time on a few of the 9 nodes is the bottleneck.
> 
> SQL -> SELECT d1, SUM(m1) FROM table WHERE d1 in ('some1', 'some2', 'some3')
> 
> node1 -> 702ms
> node2 -> 4,777ms          *
> node3 -> 6,624ms          *
> node4 -> 731ms
> node5 -> 688ms
> node6 -> 910ms
> node7 -> 16,667ms        *
> node8 -> 17,960ms        *
> node9 -> 655ms
> 
> As you can see, some nodes are definitely the bottleneck, and the
> bottleneck server changes every time a single query is issued.
> 
> 3. Execution times for `in ('some')` and `in ('some1', 'some2')` are
> quite different.
> A single-value filter is about 3x faster than a filter containing more
> than one value.
> 
> 4. I also tried 'SET NUM_SCANNER_THREADS=10', but it is an upper bound,
> not a lower bound: it sets the maximum number of scan threads, not the
> minimum.
> Execution times with 'NUM_SCANNER_THREADS=1' and
> 'NUM_SCANNER_THREADS=5' are almost the same.
> Also, in this situation, some nodes still lag.
> So I think Impala's scan threads are not the bottleneck.
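> 
> (A minimal sketch of that comparison, with placeholder host and table
> names, a GROUP BY added so the aggregate is valid SQL, and assuming
> impala-shell accepts semicolon-separated statements via -q:)
> 
> impala-shell -i impalad-host -q "SET NUM_SCANNER_THREADS=1; SELECT d1, SUM(m1) FROM table WHERE d1 IN ('some1', 'some2', 'some3') GROUP BY d1"
> impala-shell -i impalad-host -q "SET NUM_SCANNER_THREADS=5; SELECT d1, SUM(m1) FROM table WHERE d1 IN ('some1', 'some2', 'some3') GROUP BY d1"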
> 
> 
> 
> So I concluded that the ingestion job and the concurrent query workload
> against Kudu are the bottleneck.
> A Spark job in the outside YARN cluster is sending insert RPC requests,
> and multiple impalads are sending scan RPC requests to Kudu.
> 
> 
> This is the reasoning that led me to decide to improve Kudu's RPC
> throughput and scan throughput.
> 
> 
> And why do you think the difference in execution time between `in ('some')`
> and `in ('some1', 'some2')` is so big?
> Is it better to run the query twice, with `in ('some1')` and `in ('some2')`,
> rather than using `in ('some1', 'some2')`?
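> 
> (For context, a sketch of the two approaches, with placeholder host and
> table names; a GROUP BY is added so the aggregate is valid SQL, and the
> UNION ALL form is just one way to combine the two single-value filters:)
> 
> impala-shell -i impalad-host -q "SELECT d1, SUM(m1) FROM table WHERE d1 IN ('some1', 'some2') GROUP BY d1"
> impala-shell -i impalad-host -q "SELECT d1, SUM(m1) FROM table WHERE d1 = 'some1' GROUP BY d1 UNION ALL SELECT d1, SUM(m1) FROM table WHERE d1 = 'some2' GROUP BY d1"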
> 
> 
> 
> 
> 
> 
> @Todd Lipcon
> 
> Thanks for your reply, Todd Lipcon!
> 
> Below is what I understood. Can you please check my understanding?
> 
> Several services are registered with the reactor (event callbacks).
> 
> RPC (Insert, Scan, etc.) -> Acceptor -> packets bound to reactor threads ->
> service callback signaled -> the service does its job (Insert, Scan)
> 
> And the top command shows that most CPU usage comes from the
> maintenance threads (8 threads).
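> 
> (To confirm which threads dominate on a slow node, the same thread-level
> view can be taken on the tablet server, as suggested below; the thread
> names should show whether maintenance/compaction or RPC/scan work is hot:)
> 
> top -d 1 -H -p $(pgrep kudu-tserver)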
> 
> 
> 
> 
> 2017-04-29 5:48 GMT+09:00 Todd Lipcon <todd@cloudera.com>:
>> To clarify one bit - the acceptor thread is the thread calling accept() on
>> the listening TCP socket. Once accepted, the RPC system uses libev
>> (event-based IO) to react to new packets on a "reactor thread". When a full
>> RPC request is received, it is distributed to the "service threads".
>> 
>> I'd also suggest running 'top -H -p $(pgrep kudu-tserver)' to see the thread
>> activity during the workload. You can see if one of the reactor threads is
>> hitting 100% CPU, for example, though I've never seen that be a
>> bottleneck. David's pointers are probably good places to start
>> investigating.
>> 
>> -Todd
>> 
>>> On Fri, Apr 28, 2017 at 1:41 PM, David Alves <davidralves@gmail.com> wrote:
>>> 
>>> Hi
>>> 
>>>  The acceptor thread only distributes work; it's very unlikely to be a
>>> bottleneck. The same goes for the number of workers, since the number of
>>> threads pulling data is defined by Impala.
>>>  What is "extremely" slow in this case?
>>> 
>>>  Some things to check:
>>>  It seems like this is scanning only 5 tablets? Are those all the tablets
>>> per tablet server? Do the tablets have roughly the same size?
>>>  Are you using encoding/compression?
>>>  How much data per tablet?
>>>  Have you run "compute stats" in Impala?
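>>> 
>>>  As a quick sketch (the table name is a placeholder):
>>> 
>>>  impala-shell -q "COMPUTE STATS table"
>>>  impala-shell -q "SHOW TABLE STATS table"
>>>  impala-shell -q "SHOW COLUMN STATS table"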
>>> 
>>> Best
>>> David
>>> 
>>> 
>>> 
>>>> On Fri, Apr 28, 2017 at 9:07 AM, 기준 <0ctopus13prime@gmail.com> wrote:
>>>> 
>>>> Hi!
>>>> 
>>>> I'm using Kudu 1.3 and Impala 2.7.
>>>> 
>>>> I'm investigating an extremely slow scan read seen in Impala's profiling.
>>>> 
>>>> So I dug into Impala's and Kudu's source code.
>>>> 
>>>> And I concluded that this is a connection throughput problem.
>>>> 
>>>> As far as I found out, Impala uses the steps below to send scan requests to Kudu.
>>>> 
>>>> 1. RunScannerThread -> Create new scan threads
>>>> 2. ProcessScanToken -> Open
>>>> 3. KuduScanner::GetNext
>>>> 4. Send Scan RPC -> Send scan RPCs continuously
>>>> 
>>>> So I checked Kudu's RPC configuration.
>>>> 
>>>> --rpc_num_acceptors_per_address=1
>>>> --rpc_num_service_threads=20
>>>> --rpc_service_queue_length=50
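>>>> 
>>>> (For reference, a sketch of how larger values could be passed when
>>>> starting the tablet server; the directories are placeholders and the
>>>> numbers are only examples, not recommendations:)
>>>> 
>>>> kudu-tserver --fs_wal_dir=/path/to/wal --fs_data_dirs=/path/to/data \
>>>>   --rpc_num_service_threads=40 --rpc_service_queue_length=100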
>>>> 
>>>> 
>>>> Here are my questions.
>>>> 
>>>> 1. Does the acceptor accept all RPC requests and hand them off to the
>>>> proper service?
>>>> So, Scan RPC -> Acceptor -> RpcService?
>>>> 
>>>> 2. If I want to increase input throughput, should I increase
>>>> '--rpc_num_service_threads', right?
>>>> 
>>>> 3. Why does '--rpc_num_acceptors_per_address' have such a small value
>>>> compared to '--rpc_num_service_threads'? I'm going to increase that
>>>> value too; do you think this is a bad idea? If so, can you please
>>>> describe the reason?
>>>> 
>>>> Thanks for your reply!
>>>> 
>>>> Have a nice day~ :)
>>> 
>>> 
>> 
>> 
>> 
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
