kudu-user mailing list archives

From Dan Burkert <...@cloudera.com>
Subject Re: Partition and Split rows
Date Thu, 12 May 2016 20:45:25 GMT
On Thu, May 12, 2016 at 11:39 AM, Sand Stone <sand.m.stone@gmail.com> wrote:

I don't know how Kudu load balance the data across the tablet servers.
>

Individual tablets are replicated and balanced across all available tablet
servers; for more on that, see
http://getkudu.io/docs/schema_design.html#data-distribution.



> For example, do I need to pre-calculate every day, a list of 5 minutes
> apart timestamps at table creation? [assume I have to create a new table
> every day].
>

If you wish to range partition on the time column, then yes, currently you
must specify the splits upfront during table creation (but this will change
with the non-covering range partitions work).
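
To make that concrete, here is a small sketch (plain Python, illustrative
only, not the Kudu client API) of pre-computing a day's worth of 5-minute
split points, which is what "specifying the splits upfront" amounts to:

```python
from datetime import datetime, timedelta

def five_minute_splits(day: datetime):
    """Interior split points for one day of 5-minute range partitions."""
    start = datetime(day.year, day.month, day.day)
    # 288 five-minute buckets per day -> 287 interior split rows
    return [start + timedelta(minutes=5 * i) for i in range(1, 24 * 60 // 5)]

splits = five_minute_splits(datetime(2016, 5, 12))
```

Each of these values would then be handed to the client as a split row at
table-creation time.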


>
> My hope, with the additional 5-min column, and use it as the range
> partition column, is that so I could spread the data evenly across the
> tablet servers.
>

I don't think this is meaningfully different from range partitioning on the
full time column with splits every 5 minutes.
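
One way to see the equivalence: a derived 5-minute column is just the raw
timestamp rounded down to a 300-second boundary, so two rows land in the same
range partition exactly when they round to the same bucket. A minimal sketch
(plain Python, illustrative only):

```python
def five_min_bucket(epoch_seconds: int) -> int:
    """Round a Unix timestamp down to the start of its 5-minute bucket."""
    return epoch_seconds - (epoch_seconds % 300)

# Two timestamps 40 seconds apart fall in the same bucket...
a = five_min_bucket(1_463_083_225)
b = five_min_bucket(1_463_083_265)
# ...while one five minutes later does not.
c = five_min_bucket(1_463_083_525)
```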


> Also, since 5-min interval data are always colocated together, the read
> query could be efficient too.
>

Data colocation is a function of the partitioning and indexing.  As I
mentioned before, if you have the timestamp as part of your primary key, then
you can guarantee that scans specifying a time range are efficient. Overall
it sounds like you are attempting to get fast scans by creating many fine-
grained partitions, as you might with Parquet.  This won't be an efficient
strategy in Kudu, since each tablet server should only have on the order of
10-20 tablets.  Instead, take advantage of the index capability of primary
keys.
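
A toy illustration (plain Python, not Kudu internals) of why a sorted
primary-key index makes such scans cheap: with rows ordered by (metric, time),
a time-range scan with an equality predicate on metric is two binary searches
plus one contiguous read, rather than a filter over every row.

```python
import bisect

# Rows ordered by compound primary key (metric, time), one point per minute.
rows = sorted([("cpu", t) for t in range(0, 3600, 60)] +
              [("mem", t) for t in range(0, 3600, 60)])

def scan(metric: str, t_lo: int, t_hi: int):
    """Locate the key range by binary search, then read it contiguously."""
    lo = bisect.bisect_left(rows, (metric, t_lo))
    hi = bisect.bisect_right(rows, (metric, t_hi))
    return rows[lo:hi]

five_minutes = scan("cpu", 300, 599)  # rows at t = 300, 360, 420, 480, 540
```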

- Dan


> On Thu, May 12, 2016 at 11:13 AM, Dan Burkert <dan@cloudera.com> wrote:
>
>> Forgot to add the PK specification to the CREATE TABLE; it should have
>> read as follows:
>>
>> CREATE TABLE metrics (metric STRING, time TIMESTAMP, value DOUBLE)
>> PRIMARY KEY (metric, time);
>>
>> - Dan
>>
>>
>> On Thu, May 12, 2016 at 11:12 AM, Dan Burkert <dan@cloudera.com> wrote:
>>
>>>
>>> On Thu, May 12, 2016 at 11:05 AM, Sand Stone <sand.m.stone@gmail.com>
>>> wrote:
>>>
>>>> > Is the requirement to pre-aggregate by time window?
>>>> No, I am thinking to create a column say, "minute". It's basically the
>>>> minute field of the timestamp column(even round to 5-min bucket depending
>>>> on the needs). So it's a computed column being filled in on data ingestion.
>>>> My goal is that this field would help with data filtering at read/query
>>>> time, say select certain projection at minute 10-15, to speed up the read
>>>> queries.
>>>>
>>>
>>> In many cases, Kudu can do this for you without having to add special
>>> columns.  The requirements are that the timestamp is part of the primary
>>> key, and that any columns that come before the timestamp in the primary
>>> key (if it's a compound PK) have equality predicates.  So for instance,
>>> if you create a table such as:
>>>
>>> CREATE TABLE metrics (metric STRING, time TIMESTAMP, value DOUBLE);
>>>
>>> then a query such as
>>>
>>> SELECT time, value FROM metrics WHERE metric = "my-metric" AND time >
>>> 2016-05-01T00:00 AND time < 2016-05-01T00:05
>>>
>>> will read only the data for that 5-minute time window from
>>> disk.  If the query didn't have the equality predicate on the 'metric'
>>> column, then it would do a much bigger scan + filter operation.  If you
>>> want more background on how this is achieved, check out the partition
>>> pruning design doc:
>>> https://github.com/apache/incubator-kudu/blob/master/docs/design-docs/scan-optimization-partition-pruning.md
>>> .
>>>
>>> - Dan
>>>
>>>
>>>
>>>> Thanks for the info; I will follow them.
>>>>
>>>> On Thu, May 12, 2016 at 10:50 AM, Dan Burkert <dan@cloudera.com> wrote:
>>>>
>>>>> Hey Sand,
>>>>>
>>>>> Sorry for the delayed response.  I'm not quite following your use
>>>>> case.  Is the requirement to pre-aggregate by time window? I don't think
>>>>> Kudu can help you directly with that (nothing built in), but you could
>>>>> always create a separate table to store the pre-aggregated values.  As
>>>>> far as applying functions to do row splits, that is an interesting idea,
>>>>> but I think once Kudu has support for range bounds (the non-covering
>>>>> range partition design doc linked above), you can simply create the
>>>>> bounds where the function would have put them.  For example, if you want
>>>>> a partition for every five minutes, you can create the bounds accordingly.
>>>>>
>>>>> Earlier this week I gave a talk on timeseries in Kudu, I've included
>>>>> some slides that may be interesting to you.  Additionally, you may want
>>>>> to check out https://github.com/danburkert/kudu-ts, it's a very young
>>>>> (not feature complete) metrics layer on top of Kudu, it may give you
>>>>> some ideas.
>>>>>
>>>>> - Dan
>>>>>
>>>>> On Sat, May 7, 2016 at 1:28 PM, Sand Stone <sand.m.stone@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for sharing, Dan. The diagrams explained clearly how the
>>>>>> current system works.
>>>>>>
>>>>>> As for what's in my mind: take the schema of <host,metric,time,...>;
>>>>>> say I am interested in data for the past 5 mins, 10 mins, etc., or
>>>>>> aggregated at a 5-min interval for the past 3 days, 7 days, ... Looks
>>>>>> like I need to introduce a special 5-min bar column and use that column
>>>>>> to do range partitioning to spread data across the tablet servers so
>>>>>> that I could leverage parallel filtering.
>>>>>>
>>>>>> The cost of this extra column (INT8) is not ideal but not too bad
>>>>>> either (storage cost wise, compression should do wonders). So I am
>>>>>> thinking whether it would be better to take "functions" as row splits
>>>>>> instead of only constants. Of course if the business requires dropping
>>>>>> down to a 1-min bar, the data has to be re-sharded again. So a more
>>>>>> cost-effective way of doing this on a production cluster would be good.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sat, May 7, 2016 at 8:50 AM, Dan Burkert <dan@cloudera.com> wrote:
>>>>>>
>>>>>>> Hi Sand,
>>>>>>>
>>>>>>> I've been working on some diagrams to help explain some of the more
>>>>>>> advanced partitioning types, it's attached.  Still pretty rough at this
>>>>>>> point, but the goal is to clean it up and move it into the Kudu
>>>>>>> documentation proper.  I'm interested to hear what kind of time series
>>>>>>> you are interested in Kudu for.  I'm tasked with improving Kudu for
>>>>>>> time series; you can follow progress here
>>>>>>> <https://issues.apache.org/jira/browse/KUDU-1306>. If you have any
>>>>>>> additional ideas I'd love to hear them.  You may also be interested in
>>>>>>> a small project that JD and I have been working on in the past week to
>>>>>>> build an OpenTSDB style store on top of Kudu; you can find it here
>>>>>>> <https://github.com/danburkert/kudu-ts>.  Still quite feature
>>>>>>> limited at this point.
>>>>>>>
>>>>>>> - Dan
>>>>>>>
>>>>>>> On Fri, May 6, 2016 at 4:51 PM, Sand Stone <sand.m.stone@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks. Will read.
>>>>>>>>
>>>>>>>> Given that I am researching time series data, row locality is
>>>>>>>> crucial :-)
>>>>>>>>
>>>>>>>> On Fri, May 6, 2016 at 3:57 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>>>
>>>>>>>>> We do have non-covering range partitions coming in the next few
>>>>>>>>> months, here's the design (in review):
>>>>>>>>> http://gerrit.cloudera.org:8080/#/c/2772/9/docs/design-docs/non-covering-range-partitions.md
>>>>>>>>>
>>>>>>>>> The "Background & Motivation" section should give you a good idea
>>>>>>>>> of why I'm mentioning this.
>>>>>>>>>
>>>>>>>>> Meanwhile, if you don't need row locality, using hash partitioning
>>>>>>>>> could be good enough.
>>>>>>>>>
>>>>>>>>> J-D
>>>>>>>>>
>>>>>>>>> On Fri, May 6, 2016 at 3:53 PM, Sand Stone <sand.m.stone@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Makes sense.
>>>>>>>>>>
>>>>>>>>>> Yeah it would be cool if users could specify/control the split
>>>>>>>>>> rows after the table is created. Now, I have to "think ahead" to
>>>>>>>>>> pre-create the range buckets.
>>>>>>>>>>
>>>>>>>>>> On Fri, May 6, 2016 at 3:49 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> You will only get 1 tablet and no data distribution, which is bad.
>>>>>>>>>>>
>>>>>>>>>>> That's also how HBase works, but it will split regions as you
>>>>>>>>>>> insert data and eventually you'll get some data distribution even
>>>>>>>>>>> if it doesn't start in an ideal situation. Tablet splitting will
>>>>>>>>>>> come later for Kudu.
>>>>>>>>>>>
>>>>>>>>>>> J-D
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 6, 2016 at 3:42 PM, Sand Stone <sand.m.stone@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> One more question: how does range partitioning work if I don't
>>>>>>>>>>>> specify the split rows?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 6, 2016 at 3:37 PM, Sand Stone <sand.m.stone@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks, Misty. The "advanced" Impala example helped.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I was just reading the Java API, CreateTableOptions.java; it's
>>>>>>>>>>>>> unclear how the range partition column names are associated with
>>>>>>>>>>>>> the partial row params in the addSplitRow API.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 6, 2016 at 3:08 PM, Misty Stanley-Jones <mstanleyjones@cloudera.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Sand,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Please have a look at
>>>>>>>>>>>>>> http://getkudu.io/docs/kudu_impala_integration.html#partitioning_tables
>>>>>>>>>>>>>> and see if it is helpful to you.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Misty
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, May 6, 2016 at 2:00 PM, Sand Stone <sand.m.stone@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi, I am new to Kudu. I wonder how the split rows work. I
>>>>>>>>>>>>>>> know from some docs that this currently happens at table
>>>>>>>>>>>>>>> creation. I am researching how to partition (hash+range) some
>>>>>>>>>>>>>>> time series test data.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there an example or notes somewhere I could read up on?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks much.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
