kudu-user mailing list archives

From Mauricio Aristizabal <mauri...@impactradius.com>
Subject Re: new Kudu benchmarks
Date Fri, 05 Jan 2018 22:50:59 GMT
Todd, since you bring it up in this thread... which CDH version do you
expect DECIMAL support to land in? I recently asked Icaro Vazquez
about it but still have no news. We're hoping it makes it into 5.14; otherwise,
according to the roadmap, there might not be another minor release and we'd
be waiting until summer for CDH 6.

And just in case we're forced to make do without DECIMAL initially, is the
recommendation really to store values as strings and convert them? I was
thinking of storing them as int/long and dividing by 10 or 1000 as needed in
an Impala view over the Kudu table. Wouldn't a division be much more
performant than a conversion from string, especially when aggregating over
thousands of records in a report query?
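To make the scaled-integer idea concrete, here is a minimal Python sketch of fixed-point encoding. It is not Kudu or Impala code. It assumes a hypothetical scale of 100 (two decimal places, e.g. amounts stored as integer cents), and `encode`/`decode` are illustrative helper names. Aggregation happens on plain integers, and the single scaling division is applied once to the result instead of parsing a string on every row.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical scale: 2 decimal places, so 12.34 is stored as the integer 1234.
SCALE = 100


def encode(amount: str) -> int:
    """Convert a decimal string such as '12.34' to a scaled integer (1234).

    Values with more precision than the scale are rounded half-up.
    """
    return int((Decimal(amount) * SCALE).to_integral_value(rounding=ROUND_HALF_UP))


def decode(scaled: int) -> Decimal:
    """Convert a scaled integer back to an exact Decimal value."""
    return Decimal(scaled) / SCALE


if __name__ == "__main__":
    amounts = ["19.99", "5.01", "100.00"]
    # Store and aggregate as integers (what a BIGINT column would hold) ...
    total_scaled = sum(encode(a) for a in amounts)
    # ... then apply the scaling division once, on the aggregate.
    print(total_scaled, decode(total_scaled))
```

Dividing once after summing, rather than dividing each row and then summing, keeps the arithmetic in exact integers until the final step. If the division produces a floating-point type, this also limits any rounding error to a single operation.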

thanks,

-Mauricio


On Fri, Jan 5, 2018 at 11:13 AM, Todd Lipcon <todd@cloudera.com> wrote:

> Oh, one other piece of feedback: maybe worth editing the title to say "vs
> Apache Parquet" instead of "vs Apache Impala" since in all cases you are
> using Impala as the query engine?
>
> -Todd
>
> On Fri, Jan 5, 2018 at 11:06 AM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> Hey Boris,
>>
>> Thanks for publishing this. It's a great look at how an end user
>> evaluates Kudu. I appreciate that you cover both the pros and cons of the
>> technology, and glad to see that your conclusion leaves you excited about
>> Kudu :)
>>
>> One quick note: I think you'll be even more pleased when you
>> upgrade to a later version (e.g., Kudu 1.5). We've improved performance in
>> several areas and also improved scalability compared to the version you're
>> testing. TIMESTAMP is also supported now, with DECIMAL soon to follow. It
>> might be worth noting this as an addendum to the blog post if you feel like
>> it.
>>
>> -Todd
>>
>> On Fri, Jan 5, 2018 at 10:51 AM, Boris Tyukin <boris@boristyukin.com>
>> wrote:
>>
>>> Hi guys,
>>>
>>> We just finished testing Kudu, mostly comparing Kudu to Impala on
>>> HDFS/Parquet. I wanted to share my blog post and results. We used typical
>>> (and real) healthcare data for the test, not synthetic data, which I think
>>> makes it a bit more interesting.
>>>
>>> I welcome any feedback!
>>>
>>> http://boristyukin.com/benchmarking-apache-kudu-vs-apache-impala/
>>>
>>> We are really impressed with Kudu and I wanted to take an opportunity to
>>> thank Kudu developers for such an amazing and much-needed product.
>>>
>>> Boris
>>>
>>>
>>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>



-- 
*MAURICIO ARISTIZABAL*
Architect - Business Intelligence + Data Science
mauricio@impactradius.com | (m) +1 323 309 4260
223 E. De La Guerra St. | Santa Barbara, CA 93101

Overview <http://www.impactradius.com/?src=slsap> | Twitter
<https://twitter.com/impactradius> | Facebook
<https://www.facebook.com/pages/Impact-Radius/153376411365183> | LinkedIn
<https://www.linkedin.com/company/impact-radius-inc->
