kudu-user mailing list archives

From Franco VENTURI <fvent...@comcast.net>
Subject Re: Kudu table api
Date Sat, 23 Mar 2019 22:46:07 GMT
One way to speed up counting rows with the Kudu client API is to first retrieve the list
of KuduScanTokens for the scan described below, and then have several worker threads process
the tokens in parallel and add up the per-token row counts.
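
To make the idea concrete, here is a minimal sketch of the fan-out-and-sum step using only
the JDK. It is not the attached code. The names ParallelRowCount, countInParallel and
fakeTokens are made up for illustration, and the Kudu calls mentioned in the comments
(newScanTokenBuilder, intoScanner, hasMoreRows, nextRows, getNumRows) are how I would expect
to wire it to the Kudu Java client, so please check them against the client version you use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical helper class, for illustration only.
public class ParallelRowCount {

    /**
     * Submits one counting task per token to a fixed-size thread pool and
     * sums the per-token results. countOne must scan a single token to
     * completion and return the number of rows it saw.
     *
     * Assumed Kudu wiring (not exercised here):
     *   tokens   = client.newScanTokenBuilder(table).build()
     *   countOne = token -> open token.intoScanner(client), then while
     *              scanner.hasMoreRows() add scanner.nextRows().getNumRows()
     */
    public static <T> long countInParallel(List<T> tokens,
                                           ToLongFunction<T> countOne,
                                           int numThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (T token : tokens) {
                Callable<Long> task = () -> countOne.applyAsLong(token);
                partials.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // blocks until that token is done
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in "tokens": each integer plays the role of one scan token
        // and is also the row count that token would yield.
        List<Integer> fakeTokens = Arrays.asList(3, 5, 7, 11);
        long total = countInParallel(fakeTokens, Integer::longValue, 8);
        System.out.println("total rows: " + total);
    }
}
```

Since only the count is needed, the scan behind the tokens can use the smallest projection
possible, which keeps the amount of data sent back to each worker low.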


I implemented it in Java this afternoon and it took about 13s to go through a 10.5G row table
on our Kudu cluster using 8 worker threads.


The Java code is attached - it is not the most elegant code and definitely doesn't follow
the canonical "design patterns", but I just wanted to give you a better idea of what I mean.


Franco


> On March 22, 2019 at 3:11 PM Boris Tyukin <boris@boristyukin.com> wrote:
>
> Hi Dmitry, check Java Kudu API examples if you have not done it yet
> https://github.com/apache/kudu/tree/master/examples
>
> I remember it had a helper class that counts rows. Like Adar said, I do not think
> there is a better / faster way - you just create a Kudu scanner, get rows back and iterate
> over returned sets and increase the counter.
>
> It was much faster than Impala count in my benchmarks - Kudu API is extremely fast
> and easy to use.
>
> Boris
>
> On Fri, Mar 22, 2019 at 2:22 PM Adar Lieber-Dembo <adar@cloudera.com> wrote:
>
> > Probably a scan with no predicates and a minimal projection. Then you
> > can iterate over the results and increment a count of rows.
> >
> > Or, if you're using Impala, "SELECT COUNT(*) FROM FOO".
> >
> > On Fri, Mar 22, 2019 at 3:23 AM Дмитрий Павлов <dm.pavlov@inbox.ru> wrote:
> > >
> > > Hi guys
> > >
> > > What is the quickest way to get total number of rows in the table using
> > > kudu client?
> > >
> > > Regards, Dmitry Pavlov
