kudu-user mailing list archives

From Dan Burkert <...@cloudera.com>
Subject Re: Partition and Split rows
Date Mon, 16 May 2016 20:32:32 GMT
Hi Amit, responses inline


> Q: Can I fetch Kudu TIMESTAMP data from Tableau/Pentaho for reporting
> and analytics purposes, or do I need to use the INT64 datatype only?
>

Is Tableau/Pentaho using Impala to query?  If so, see the answers below.


> Q: Are you going to provide the Kimpala merge or Kudu timestamp support in
> Impala/Tableau in the near future?
>

TIMESTAMP typed columns should work in Impala/Kudu as of the 0.8 release
(the latest); however, I know there were a few bugs in previous
releases.  If you're using the latest and are still seeing errors, please
send the error and/or file an issue on JIRA.


> Q: At the moment we are using the Int64 type in Kudu instead of Timestamp.
> Will the partitioning and split-row approach explained for the timestamp
> datatype help performance equally? E.g., can we get better performance
> for a composite key on a Kudu table defined as
> 'metric(s), Int64 (holding timestamp data)', with a partition on the Int64
> column and a split by clause?
>

Internally, the TIMESTAMP type is just an alias to INT64, so it has the
exact same performance characteristics.  The only difference is how the
values are displayed in log messages and on the web UI.
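
To make the equivalence concrete, here is a minimal Python sketch (not Kudu
client code). It assumes the INT64 column holds microseconds since the Unix
epoch, and that each split row acts as the inclusive lower bound of the next
range partition. The function names are made up for illustration; they are
not part of any Kudu API.

```python
from bisect import bisect_right
from datetime import datetime, timedelta, timezone
from typing import Sequence

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(ts: datetime) -> int:
    """Encode a timezone-aware datetime as integer microseconds since the epoch."""
    return (ts - EPOCH) // timedelta(microseconds=1)


def from_micros(micros: int) -> datetime:
    """Decode integer microseconds since the epoch back into a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


def range_partition_index(key: int, split_rows: Sequence[int]) -> int:
    """Return the index of the range partition that would hold `key`.

    Assumption: `split_rows` is sorted, and each split row is the inclusive
    lower bound of the partition that follows it. N split rows therefore give
    N + 1 partitions, and an empty list gives exactly one partition.
    """
    return bisect_right(split_rows, key)


# Hypothetical split rows at the start of May and June 2016.
splits = [
    to_micros(datetime(2016, 5, 1, tzinfo=timezone.utc)),
    to_micros(datetime(2016, 6, 1, tzinfo=timezone.utc)),
]
sample = to_micros(datetime(2016, 5, 16, 20, 32, 32, tzinfo=timezone.utc))

print(sample, from_micros(sample).isoformat())
print("partition with split rows:", range_partition_index(sample, splits))
print("partition without split rows:", range_partition_index(sample, []))
```

Since the stored value is the same 64-bit integer either way, the split rows
and the resulting partition lookup are identical whether the column is
declared TIMESTAMP or INT64.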

- Dan


> On Sat, May 7, 2016 at 9:20 PM, Dan Burkert <dan@cloudera.com> wrote:
>
>> Hi Sand,
>>
>> I've been working on some diagrams to help explain some of the more
>> advanced partitioning types; they're attached.  They're still pretty rough
>> at this point, but the goal is to clean them up and move them into the Kudu
>> documentation proper.  I'm interested to hear what kind of time series you
>> are interested in Kudu for.  I'm tasked with improving Kudu for time
>> series; you can follow progress here
>> <https://issues.apache.org/jira/browse/KUDU-1306>.  If you have any
>> additional ideas I'd love to hear them.  You may also be interested in a
>> small project that J-D and I have been working on in the past week to
>> build an OpenTSDB-style store on top of Kudu; you can find it here
>> <https://github.com/danburkert/kudu-ts>.  It's still quite feature-limited
>> at this point.
>>
>> - Dan
>>
>> On Fri, May 6, 2016 at 4:51 PM, Sand Stone <sand.m.stone@gmail.com>
>> wrote:
>>
>>> Thanks. Will read.
>>>
>>> Given that I am researching time series data, row locality is crucial
>>> :-)
>>>
>>> On Fri, May 6, 2016 at 3:57 PM, Jean-Daniel Cryans <jdcryans@apache.org>
>>> wrote:
>>>
>>>> We do have non-covering range partitions coming in the next few months,
>>>> here's the design (in review):
>>>> http://gerrit.cloudera.org:8080/#/c/2772/9/docs/design-docs/non-covering-range-partitions.md
>>>>
>>>> The "Background & Motivation" section should give you a good idea of
>>>> why I'm mentioning this.
>>>>
>>>> Meanwhile, if you don't need row locality, using hash partitioning
>>>> could be good enough.
>>>>
>>>> J-D
>>>>
>>>> On Fri, May 6, 2016 at 3:53 PM, Sand Stone <sand.m.stone@gmail.com>
>>>> wrote:
>>>>
>>>>> Makes sense.
>>>>>
>>>>> Yeah it would be cool if users could specify/control the split rows
>>>>> after the table is created. Now, I have to "think ahead" to pre-create
>>>>> the range buckets.
>>>>>
>>>>> On Fri, May 6, 2016 at 3:49 PM, Jean-Daniel Cryans <
>>>>> jdcryans@apache.org> wrote:
>>>>>
>>>>>> You will only get 1 tablet and no data distribution, which is bad.
>>>>>>
>>>>>> That's also how HBase works, but it will split regions as you insert
>>>>>> data and eventually you'll get some data distribution even if it
>>>>>> doesn't start in an ideal situation. Tablet splitting will come later
>>>>>> for Kudu.
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Fri, May 6, 2016 at 3:42 PM, Sand Stone <sand.m.stone@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> One more question: how does the range partition work if I don't
>>>>>>> specify the split rows?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> On Fri, May 6, 2016 at 3:37 PM, Sand Stone <sand.m.stone@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks, Misty. The "advanced" impala example helped.
>>>>>>>>
>>>>>>>> I was just reading the Java API, CreateTableOptions.java; it's
>>>>>>>> unclear how the range partition column names are associated with
>>>>>>>> the partial row params in the addSplitRow API.
>>>>>>>>
>>>>>>>> On Fri, May 6, 2016 at 3:08 PM, Misty Stanley-Jones <
>>>>>>>> mstanleyjones@cloudera.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Sand,
>>>>>>>>>
>>>>>>>>> Please have a look at
>>>>>>>>> http://getkudu.io/docs/kudu_impala_integration.html#partitioning_tables
>>>>>>>>> and see if it is helpful to you.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Misty
>>>>>>>>>
>>>>>>>>> On Fri, May 6, 2016 at 2:00 PM, Sand Stone <sand.m.stone@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi, I am new to Kudu. I wonder how the split rows work. I know
>>>>>>>>>> from some docs that this is currently for pre-creating the
>>>>>>>>>> table. I am researching how to partition (hash+range) some time
>>>>>>>>>> series test data.
>>>>>>>>>>
>>>>>>>>>> Is there an example, or notes somewhere I could read up on?
>>>>>>>>>>
>>>>>>>>>> Thanks much.
>>>>>>>>>>
>
> --
> Thanks & Regards,
>
> *Amit Adhau* | Data Architect
>
> *GLOBANT* | IND:+91 9821518132
