kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: RPC and Service difference?
Date Thu, 04 May 2017 17:58:27 GMT
On Sat, Apr 29, 2017 at 4:15 AM, 기준 <0ctopus13prime@gmail.com> wrote:

>
> I found a few things during testing.
>
> 1. Impala is idle most of the time.
> 2. Query execution time on a few of the 9 nodes is the bottleneck.
>
> SQL -> SELECT d1, SUM(m1) FROM table WHERE d1 in ('some1', 'some2',
> 'some3')
>
> node1 -> 702ms
> node2 -> 4,777ms          *
> node3 -> 6,624ms          *
> node4 -> 731ms
> node5 -> 688ms
> node6 -> 910ms
> node7 -> 16,667ms        *
> node8 -> 17,960ms        *
> node9 -> 655ms
>
> As you can see, some nodes are definitely bottlenecks, and the
> bottleneck server changes every time a single query is issued.
>

One thing I found with Impala is that its logging can be too noisy and
cause stalls. You might see slightly better performance if you make sure
the Impala vlog level is set to 0, not 1.
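If impalad is started by hand, the glog verbosity can usually be set via the `-v` startup flag or the `GLOG_v` environment variable (a sketch assuming the stock glog flags impalad inherits; check your deployment's init scripts for where flags are actually set):

```shell
# Reduce glog verbosity so vlog(1)+ messages are suppressed.
impalad -v=0

# Or via the environment, if your startup script honors it:
GLOG_v=0 impalad
```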

I don't think it would account for 10x difference, though, so I think David
is on the right track looking for skew.
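To make the skew concrete, here is a quick sketch (plain Python, using the per-node times reported above) that flags nodes running far above the median:

```python
from statistics import median

# Per-node query times (ms) as reported above.
times = {
    "node1": 702, "node2": 4777, "node3": 6624, "node4": 731,
    "node5": 688, "node6": 910, "node7": 16667, "node8": 17960,
    "node9": 655,
}

med = median(times.values())
# Flag nodes whose runtime is far above the median -- a rough skew signal.
slow = {n: round(t / med, 1) for n, t in times.items() if t > 3 * med}
print(f"median = {med} ms, outliers (x over median): {slow}")
```

Four of the nine nodes come out 5x-20x over the median, which is the shape you would expect from data skew rather than a uniformly slow cluster.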


>
> 3. Execution times for `in ('some')` and `in ('some1', 'some2')` are
> quite different.
> A single filter value gives about 3x the performance of filtering
> with more than one value.
>

Which version of Impala are you using? Pushdown of 'IN' was added in Impala
2.8 iirc.
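For intuition on why IN pushdown matters, here is a toy sketch (plain Python, not the Kudu or Impala API): with pushdown the predicate is evaluated server-side, so only matching rows cross the wire; without it, every row is shipped to the client and filtered there.

```python
# Toy data set standing in for a Kudu tablet's rows.
rows = [{"d1": f"key{i % 100}", "m1": i} for i in range(10_000)]
wanted = {"key1", "key2", "key3"}

# With pushdown: the server applies the IN predicate, so only matching
# rows are sent over the network.
pushed = [r for r in rows if r["d1"] in wanted]          # done "server-side"

# Without pushdown: every row is shipped, then the client filters.
shipped = list(rows)                                      # full transfer
filtered = [r for r in shipped if r["d1"] in wanted]      # done "client-side"

assert pushed == filtered
print(f"rows over the wire: {len(pushed)} vs {len(shipped)}")
```

Same answer either way, but the non-pushdown path moves the whole tablet across the network, which is where the extra latency comes from.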

>
>
> @ Todd Lipcon
>
> Thanks for replying, Todd Lipcon!
>
> Below is what I understood. Can you please check my understanding?
>
> Several services are registered with the reactor (event callbacks).
>
> RPC (Insert, Scan, etc.) -> Acceptor -> bind packet to reactor threads ->
> signal the service's callback -> do its job (Insert, Scan)
>

The acceptor is only responsible for accepting the initial TCP connection.
After a connection is established, it is kept alive, and the acceptor does
not process individual RPCs.
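The division of labor can be sketched with a toy server (Python's `selectors` standing in for libev; this illustrates the pattern, not Kudu's actual code): `accept()` runs exactly once per connection, and every subsequent "RPC" on that connection is handled by the event loop.

```python
import selectors
import socket
import threading

def server(listener: socket.socket, n_requests: int) -> None:
    conn, _ = listener.accept()           # "acceptor": runs once per connection
    sel = selectors.DefaultSelector()     # "reactor": reacts to new packets
    sel.register(conn, selectors.EVENT_READ)
    served = 0
    while served < n_requests:
        for key, _ in sel.select():
            data = key.fileobj.recv(1024)        # one "RPC" request
            key.fileobj.sendall(b"ack:" + data)  # hand result back
            served += 1
    sel.close()
    conn.close()

listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
t = threading.Thread(target=server, args=(listener, 3))
t.start()

# One client connection carries several RPCs; accept() is never called again.
client = socket.create_connection(("127.0.0.1", port))
replies = []
for i in range(3):
    client.sendall(b"rpc%d" % i)
    replies.append(client.recv(1024))
client.close()
t.join()
listener.close()
print(replies)
```

In Kudu the reactor only parses the request off the wire; the actual work (Insert, Scan) is then handed to the service threads, which is why `--rpc_num_service_threads` is the knob that governs concurrency for request processing.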


>
> And `top` shows that most of the CPU usage comes from the maintenance
> threads. (8 threads)
>

OK, I don't think the RPC system is related to this issue then.
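To confirm where the CPU is going, per-thread CPU ticks can be read from `/proc` on Linux, which is the same data `top -H` displays (a sketch; it inspects its own PID here, so substitute the kudu-tserver PID in practice):

```python
import os

def thread_cpu(pid: int) -> dict:
    """Return {thread_name:tid -> utime+stime ticks} for a Linux process."""
    usage = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        with open(f"{task_dir}/{tid}/stat") as f:
            stat = f.read()
        # The thread name sits in parentheses; utime and stime are
        # fields 14 and 15 of /proc/<pid>/task/<tid>/stat.
        name = stat[stat.index("(") + 1 : stat.rindex(")")]
        fields = stat[stat.rindex(")") + 2 :].split()
        usage[f"{name}:{tid}"] = int(fields[11]) + int(fields[12])
    return usage

usage = thread_cpu(os.getpid())
for thread, ticks in sorted(usage.items(), key=lambda kv: -kv[1]):
    print(thread, ticks)
```

If the maintenance threads dominate this output on the slow tablet servers, that points at compactions/flushes rather than the RPC path.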


>
>
>
>
> 2017-04-29 5:48 GMT+09:00 Todd Lipcon <todd@cloudera.com>:
> > To clarify one bit - the acceptor thread is the thread calling accept() on
> > the listening TCP socket. Once accepted, the RPC system uses libev
> > (event-based IO) to react to new packets on a "reactor thread". When a full
> > RPC request is received, it is distributed to the "service threads".
> >
> > I'd also suggest running 'top -H -p $(pgrep kudu-tserver)' to see the thread
> > activity during the workload. You can see if one of the reactor threads is
> > hitting 100% CPU, for example, though I've never seen that to be a
> > bottleneck. David's pointers are probably good places to start
> > investigating.
> >
> > -Todd
> >
> > On Fri, Apr 28, 2017 at 1:41 PM, David Alves <davidralves@gmail.com> wrote:
> >>
> >> Hi
> >>
> >>   The acceptor thread only distributes work; it's very unlikely that it is
> >> a bottleneck. Same goes for the number of workers, since the number of
> >> threads pulling data is defined by Impala.
> >>   What is "extremely" slow in this case?
> >>
> >>   Some things to check:
> >>   It seems like this is scanning only 5 tablets? Are those all the tablets
> >> per ts? Do the tablets have roughly the same size?
> >>   Are you using encoding/compression?
> >>   How much data per tablet?
> >>   Have you run "compute stats" on Impala?
> >>
> >> Best
> >> David
> >>
> >>
> >>
> >> On Fri, Apr 28, 2017 at 9:07 AM, 기준 <0ctopus13prime@gmail.com> wrote:
> >>>
> >>> Hi!
> >>>
> >>> I'm using kudu 1.3, impala 2.7.
> >>>
> >>> I'm investigating an extremely slow scan read in Impala's profiling.
> >>>
> >>> So I dug into Impala's and Kudu's source code.
> >>>
> >>> And I concluded this is a connection throughput problem.
> >>>
> >>> As far as I found out, Impala uses the steps below to send scan requests to Kudu.
> >>>
> >>> 1. RunScannerThread -> Create new scan threads
> >>> 2. ProcessScanToken -> Open
> >>> 3. KuduScanner:GetNext
> >>> 4. Send Scan RPC -> Send scan rpc continuously
> >>>
> >>> So i checked kudu's rpc configurations.
> >>>
> >>> --rpc_num_acceptors_per_address=1
> >>> --rpc_num_service_threads=20
> >>> --rpc_service_queue_length=50
> >>>
> >>>
> >>> Here are my questions.
> >>>
> >>> 1. Does the acceptor accept all RPC requests and pass them to the
> >>> proper service?
> >>> So, Scan RPC -> Acceptor -> RpcService?
> >>>
> >>> 2. If I want to increase input throughput, should I increase
> >>> '--rpc_num_service_threads'?
> >>>
> >>> 3. Why does '--rpc_num_acceptors_per_address' have such a small value
> >>> compared to --rpc_num_service_threads? I'm going to increase that value
> >>> too; do you think that is a bad idea? If so, can you please describe
> >>> why?
> >>>
> >>> Thanks for replying!
> >>>
> >>> Have a nice day~ :)
> >>
> >>
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
>



-- 
Todd Lipcon
Software Engineer, Cloudera
