kudu-user mailing list archives

From Dan Burkert <...@cloudera.com>
Subject Re: Partition and Split rows
Date Thu, 12 May 2016 21:17:08 GMT
On Thu, May 12, 2016 at 2:04 PM, Sand Stone <sand.m.stone@gmail.com> wrote:

> > Instead, take advantage of the index capability of Primary Keys.
> I currently made the "5-min" field a part of the primary key as well.
> I am most likely overdoing it. I will play around with the schema and use
> cases around it.
>

Definitely take a look at the data model in Kudu TS: it has extremely
efficient scan semantics (all scans retrieve only the necessary data by
using the primary key, with no client *or* server side filtering), and it
works with arbitrarily large time range partitions.


> > since each tablet server should only have on the order of 10-20 tablets.
> Where does this 10-20 heuristic come from? Is it based on a certain machine
> profile, or some default parameters in the code/config?
>

10-20 isn't a hard and fast rule, and it is very dependent on the dataset
and the machine size.  We routinely run tablet servers with 300+ tablets,
so it's definitely not a hard limitation.  My intention was to stress that,
instead of pursuing fine-grained partitions as a method of limiting the
size of scans, you should take advantage of the primary key indexing that
Kudu provides, just as you might in a traditional relational database.
Partitions are better suited for ensuring that inserts and large scans can
be parallelized across multiple machines.
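
To make that concrete, here is a rough sketch with the Java client of the
kind of layout I mean: hash partitioning spreads inserts and large scans
across tablet servers, while the (metric, time) primary key keeps
time-range scans for a single metric cheap.  The table name, bucket count,
master address, and the use of INT64 microseconds for the time column are
placeholders, and the package names here assume org.apache.kudu.* (adjust
them to whatever your client version uses), so treat this as illustrative
rather than copy-paste ready:

import java.util.Arrays;

import org.apache.kudu.ColumnSchema;
import org.apache.kudu.Schema;
import org.apache.kudu.Type;
import org.apache.kudu.client.CreateTableOptions;
import org.apache.kudu.client.KuduClient;

public class CreateMetricsTable {
  public static void main(String[] args) throws Exception {
    KuduClient client = new KuduClient.KuduClientBuilder("kudu-master:7051").build();

    // (metric, time) is the primary key; time is stored as microseconds
    // since the epoch in an INT64 column.
    Schema schema = new Schema(Arrays.asList(
        new ColumnSchema.ColumnSchemaBuilder("metric", Type.STRING).key(true).build(),
        new ColumnSchema.ColumnSchemaBuilder("time", Type.INT64).key(true).build(),
        new ColumnSchema.ColumnSchemaBuilder("value", Type.DOUBLE).build()));

    // Hash partitioning on 'metric' parallelizes inserts and large scans
    // across tablet servers; the primary key index still prunes time-range
    // scans within each tablet, so no fine-grained time partitions are
    // needed for that.
    CreateTableOptions options = new CreateTableOptions()
        .addHashPartitions(Arrays.asList("metric"), 4);

    client.createTable("metrics", schema, options);
    client.shutdown();
  }
}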

- Dan


>
> On Thu, May 12, 2016 at 1:45 PM, Dan Burkert <dan@cloudera.com> wrote:
>
>>
>> On Thu, May 12, 2016 at 11:39 AM, Sand Stone <sand.m.stone@gmail.com>
>> wrote:
>>
>> I don't know how Kudu load balances the data across the tablet servers.
>>>
>>
>> Individual tablets are replicated and balanced across all available
>> tablet servers; for more on that, see
>> http://getkudu.io/docs/schema_design.html#data-distribution.
>>
>>
>>
>>> For example, do I need to pre-calculate, for every day, a list of
>>> timestamps 5 minutes apart at table creation? [assume I have to create a
>>> new table every day]
>>>
>>
>> If you wish to range partition on the time column, then yes, currently
>> you must specify the splits upfront during table creation (but this will
>> change with the non-covering range partitions work).
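>>
>> For what it's worth, here is a rough, unverified sketch of what generating
>> those splits could look like with the Java client (imports omitted; it
>> assumes 'client' is an open KuduClient and 'schema' is the table's Schema,
>> and, as discussed elsewhere in the thread, this many fine-grained range
>> partitions is probably more than you want):
>>
>> // Range partition on 'time' and pre-compute one split row per 5-minute
>> // boundary for a single day.  Each PartialRow only needs the range
>> // column(s) named in setRangePartitionColumns filled in.
>> CreateTableOptions options = new CreateTableOptions()
>>     .setRangePartitionColumns(Arrays.asList("time"));
>>
>> long dayStart = 1462060800000000L;       // 2016-05-01T00:00Z in epoch micros
>> long fiveMin  = 5L * 60 * 1000 * 1000;   // 5 minutes in micros
>> long dayEnd   = dayStart + 24L * 60 * 60 * 1000 * 1000;
>> for (long ts = dayStart + fiveMin; ts < dayEnd; ts += fiveMin) {
>>   PartialRow split = schema.newPartialRow();
>>   split.addLong("time", ts);
>>   options.addSplitRow(split);
>> }
>> client.createTable("metrics_2016_05_01", schema, options);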
>>
>>
>>>
>>> My hope, with the additional 5-min column used as the range partition
>>> column, is that I could spread the data evenly across the tablet
>>> servers.
>>>
>>
>> I don't think this is meaningfully different than range partitioning on
>> the full time column with splits every 5 minutes.
>>
>>
>>> Also, since data for each 5-min interval is always colocated, the read
>>> query could be efficient too.
>>>
>>
>> Data colocation is a function of the partitioning and indexing.  As I
>> mentioned before, if you have timestamp as part of your primary key then
>> you can guarantee that scans specifying a time range are efficient. Overall
>> it sounds like you are attempting to get fast scans by creating many fine
>> grained partitions, as you might with Parquet.  This won't be an efficient
>> strategy in Kudu, since each tablet server should only have on the order of
>> 10-20 tablets.  Instead, take advantage of the index capability of Primary
>> Keys.
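>>
>> As a rough sketch of the read side (Java client, same hypothetical
>> 'metrics' table and 'client' as in the sketch above, imports omitted;
>> the predicate values are placeholders), a scan that pins the leading key
>> column and bounds 'time' only has to touch the matching primary key range:
>>
>> KuduTable table = client.openTable("metrics");
>> Schema schema = table.getSchema();
>>
>> // metric = "my-metric" AND time in [2016-05-01T00:00Z, 2016-05-01T00:05Z)
>> KuduScanner scanner = client.newScannerBuilder(table)
>>     .addPredicate(KuduPredicate.newComparisonPredicate(
>>         schema.getColumn("metric"), KuduPredicate.ComparisonOp.EQUAL, "my-metric"))
>>     .addPredicate(KuduPredicate.newComparisonPredicate(
>>         schema.getColumn("time"), KuduPredicate.ComparisonOp.GREATER_EQUAL, 1462060800000000L))
>>     .addPredicate(KuduPredicate.newComparisonPredicate(
>>         schema.getColumn("time"), KuduPredicate.ComparisonOp.LESS, 1462061100000000L))
>>     .setProjectedColumnNames(Arrays.asList("time", "value"))
>>     .build();
>>
>> while (scanner.hasMoreRows()) {
>>   for (RowResult row : scanner.nextRows()) {
>>     System.out.println(row.getLong("time") + "\t" + row.getDouble("value"));
>>   }
>> }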
>>
>> - Dan
>>
>>
>>> On Thu, May 12, 2016 at 11:13 AM, Dan Burkert <dan@cloudera.com> wrote:
>>>
>>>> Forgot to add the PK specification to the CREATE TABLE; it should have
>>>> read as follows:
>>>>
>>>> CREATE TABLE metrics (metric STRING, time TIMESTAMP, value DOUBLE)
>>>> PRIMARY KEY (metric, time);
>>>>
>>>> - Dan
>>>>
>>>>
>>>> On Thu, May 12, 2016 at 11:12 AM, Dan Burkert <dan@cloudera.com> wrote:
>>>>
>>>>>
>>>>> On Thu, May 12, 2016 at 11:05 AM, Sand Stone <sand.m.stone@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> > Is the requirement to pre-aggregate by time window?
>>>>>> No, I am thinking of creating a column, say "minute". It's basically
>>>>>> the minute field of the timestamp column (or even rounded to a 5-min
>>>>>> bucket depending on the needs). So it's a computed column filled in on
>>>>>> data ingestion. My goal is that this field would help with data
>>>>>> filtering at read/query time, say selecting a certain projection at
>>>>>> minutes 10-15, to speed up the read queries.
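>>>>>>
>>>>>> Roughly, the ingest-time fill-in I have in mind would look something
>>>>>> like this with the Java client (column names, types, and the bucket
>>>>>> math are just placeholders I haven't verified; it assumes an open
>>>>>> KuduClient 'client' and runs in a method that may throw KuduException):
>>>>>>
>>>>>> // Table assumed to have a non-key INT32 column "five_min_bucket"
>>>>>> // alongside (metric, time, value).
>>>>>> KuduTable table = client.openTable("metrics_with_bucket");
>>>>>> KuduSession session = client.newSession();
>>>>>>
>>>>>> long nowMicros = System.currentTimeMillis() * 1000L;
>>>>>> long fiveMinMicros = 5L * 60 * 1000 * 1000;
>>>>>>
>>>>>> Insert insert = table.newInsert();
>>>>>> PartialRow row = insert.getRow();
>>>>>> row.addString("metric", "my-metric");
>>>>>> row.addLong("time", nowMicros);
>>>>>> row.addDouble("value", 42.0);
>>>>>> // Derived column computed at ingestion: which 5-minute bucket of the
>>>>>> // day (0-287) this timestamp falls into.
>>>>>> row.addInt("five_min_bucket", (int) ((nowMicros / fiveMinMicros) % 288));
>>>>>> session.apply(insert);
>>>>>> session.close();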
>>>>>>
>>>>>
>>>>> In many cases, Kudu can do this for you without having to add special
>>>>> columns.  The requirements are that the timestamp is part of the primary
>>>>> key, and any columns that come before the timestamp in the primary key
>>>>> (if it's a compound PK) have equality predicates.  So for instance, if
>>>>> you create a table such as:
>>>>>
>>>>> CREATE TABLE metrics (metric STRING, time TIMESTAMP, value DOUBLE);
>>>>>
>>>>> then for a query such as
>>>>>
>>>>> SELECT time, value FROM metrics WHERE metric = "my-metric" AND time >
>>>>> 2016-05-01T00:00 AND time < 2016-05-01T00:05
>>>>>
>>>>> only the data for that 5 minute time window will be read from disk.
>>>>> If the query didn't have the equality predicate on the 'metric' column,
>>>>> it would do a much bigger scan + filter operation.  If you want more
>>>>> background on how this is achieved, check out the partition pruning
>>>>> design doc:
>>>>> https://github.com/apache/incubator-kudu/blob/master/docs/design-docs/scan-optimization-partition-pruning.md
>>>>>
>>>>> - Dan
>>>>>
>>>>>
>>>>>
>>>>>> Thanks for the info., I will follow them.
>>>>>>
>>>>>> On Thu, May 12, 2016 at 10:50 AM, Dan Burkert <dan@cloudera.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hey Sand,
>>>>>>>
>>>>>>> Sorry for the delayed response.  I'm not quite following your use
>>>>>>> case.  Is the requirement to pre-aggregate by time window? I don't
>>>>>>> think Kudu can help you directly with that (nothing built in), but
>>>>>>> you could always create a separate table to store the pre-aggregated
>>>>>>> values.  As far as applying functions to do row splits, that is an
>>>>>>> interesting idea, but I think once Kudu has support for range bounds
>>>>>>> (the non-covering range partition design doc linked above), you can
>>>>>>> simply create the bounds where the function would have put them.  For
>>>>>>> example, if you want a partition for every five minutes, you can
>>>>>>> create the bounds accordingly.
>>>>>>>
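>>>>>>> To sketch the idea (hypothetical API: non-covering range bounds
>>>>>>> haven't landed yet, so the addRangePartition(lower, upper) call below
>>>>>>> is a guess at what the eventual client support could look like, not
>>>>>>> something you can run today; 'client' and 'schema' are reused from
>>>>>>> earlier sketches in the thread):
>>>>>>>
>>>>>>> // One range partition per 5-minute window of the day, combined with
>>>>>>> // hash partitioning on 'metric'; new windows could be added later
>>>>>>> // instead of pre-creating them all.
>>>>>>> CreateTableOptions options = new CreateTableOptions()
>>>>>>>     .addHashPartitions(Arrays.asList("metric"), 4)
>>>>>>>     .setRangePartitionColumns(Arrays.asList("time"));
>>>>>>>
>>>>>>> long dayStart = 1462060800000000L;     // 2016-05-01T00:00Z in micros
>>>>>>> long fiveMin  = 5L * 60 * 1000 * 1000;
>>>>>>> for (long lo = dayStart; lo < dayStart + 86400000000L; lo += fiveMin) {
>>>>>>>   PartialRow lower = schema.newPartialRow();
>>>>>>>   lower.addLong("time", lo);
>>>>>>>   PartialRow upper = schema.newPartialRow();
>>>>>>>   upper.addLong("time", lo + fiveMin);
>>>>>>>   options.addRangePartition(lower, upper);  // [lower, upper)
>>>>>>> }
>>>>>>> client.createTable("metrics", schema, options);
>>>>>>>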
>>>>>>> Earlier this week I gave a talk on timeseries in Kudu, I've included
>>>>>>> some slides that may be interesting to you.  Additionally, you may
>>>>>>> want to check out https://github.com/danburkert/kudu-ts, it's a very
>>>>>>> young (not feature complete) metrics layer on top of Kudu, it may
>>>>>>> give you some ideas.
>>>>>>>
>>>>>>> - Dan
>>>>>>>
>>>>>>> On Sat, May 7, 2016 at 1:28 PM, Sand Stone <sand.m.stone@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for sharing, Dan. The diagrams explained clearly how the
>>>>>>>> current system works.
>>>>>>>>
>>>>>>>> As for what I have in mind: take a schema of <host, metric, time,
>>>>>>>> ...>; say I am interested in data for the past 5 mins, 10 mins,
>>>>>>>> etc., or aggregating at 5-min intervals for the past 3 days, 7 days,
>>>>>>>> ... It looks like I need to introduce a special 5-min bar column and
>>>>>>>> use that column for range partitioning to spread data across the
>>>>>>>> tablet servers, so that I could leverage parallel filtering.
>>>>>>>>
>>>>>>>> The cost of this extra column (INT8) is not ideal but not too bad
>>>>>>>> either (storage cost wise, compression should do wonders). So I am
>>>>>>>> wondering whether it would be better to take "functions" as row
>>>>>>>> splits instead of only constants. Of course, if the business
>>>>>>>> requires dropping down to a 1-min bar, the data has to be re-sharded
>>>>>>>> again. So a more cost-effective way of doing this on a production
>>>>>>>> cluster would be good.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sat, May 7, 2016 at 8:50 AM, Dan Burkert <dan@cloudera.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Sand,
>>>>>>>>>
>>>>>>>>> I've been working on some diagrams to help explain some of the
>>>>>>>>> more advanced partitioning types, it's attached.  Still pretty
>>>>>>>>> rough at this point, but the goal is to clean it up and move it
>>>>>>>>> into the Kudu documentation proper.  I'm interested to hear what
>>>>>>>>> kind of time series you are interested in Kudu for.  I'm tasked
>>>>>>>>> with improving Kudu for time series, you can follow progress here
>>>>>>>>> <https://issues.apache.org/jira/browse/KUDU-1306>.  If you have any
>>>>>>>>> additional ideas I'd love to hear them.  You may also be interested
>>>>>>>>> in a small project that J-D and I have been working on in the past
>>>>>>>>> week to build an OpenTSDB style store on top of Kudu, you can find
>>>>>>>>> it here <https://github.com/danburkert/kudu-ts>.  Still quite
>>>>>>>>> feature limited at this point.
>>>>>>>>>
>>>>>>>>> - Dan
>>>>>>>>>
>>>>>>>>> On Fri, May 6, 2016 at 4:51 PM, Sand Stone <sand.m.stone@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks. Will read.
>>>>>>>>>>
>>>>>>>>>> Given that I am researching time series data, row locality is
>>>>>>>>>> crucial :-)
>>>>>>>>>>
>>>>>>>>>> On Fri, May 6, 2016 at 3:57 PM, Jean-Daniel Cryans <
>>>>>>>>>> jdcryans@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> We do have non-covering range partitions coming in the next few
>>>>>>>>>>> months, here's the design (in review):
>>>>>>>>>>> http://gerrit.cloudera.org:8080/#/c/2772/9/docs/design-docs/non-covering-range-partitions.md
>>>>>>>>>>>
>>>>>>>>>>> The "Background & Motivation" section should give you a good idea
>>>>>>>>>>> of why I'm mentioning this.
>>>>>>>>>>>
>>>>>>>>>>> Meanwhile, if you don't need row locality, using hash
>>>>>>>>>>> partitioning could be good enough.
>>>>>>>>>>>
>>>>>>>>>>> J-D
>>>>>>>>>>>
>>>>>>>>>>> On Fri, May 6, 2016 at 3:53 PM, Sand Stone <
>>>>>>>>>>> sand.m.stone@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Makes sense.
>>>>>>>>>>>>
>>>>>>>>>>>> Yeah it would be cool if users could specify/control the split
>>>>>>>>>>>> rows after the table is created. Now, I have to "think ahead" to
>>>>>>>>>>>> pre-create the range buckets.
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, May 6, 2016 at 3:49 PM, Jean-Daniel Cryans <
>>>>>>>>>>>> jdcryans@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> You will only get 1 tablet and no data distribution, which is
>>>>>>>>>>>>> bad.
>>>>>>>>>>>>>
>>>>>>>>>>>>> That's also how HBase works, but it will split regions as you
>>>>>>>>>>>>> insert data and eventually you'll get some data distribution
>>>>>>>>>>>>> even if it doesn't start in an ideal situation.  Tablet
>>>>>>>>>>>>> splitting will come later for Kudu.
>>>>>>>>>>>>>
>>>>>>>>>>>>> J-D
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, May 6, 2016 at 3:42 PM, Sand Stone <
>>>>>>>>>>>>> sand.m.stone@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> One more question: how does the range partition work if I
>>>>>>>>>>>>>> don't specify the split rows?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, May 6, 2016 at 3:37 PM, Sand Stone <
>>>>>>>>>>>>>> sand.m.stone@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks, Misty. The "advanced" Impala example helped.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I was just reading the Java API, CreateTableOptions.java;
>>>>>>>>>>>>>>> it's unclear how the range partition column names are
>>>>>>>>>>>>>>> associated with the partial row params in the addSplitRow
>>>>>>>>>>>>>>> API.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, May 6, 2016 at 3:08 PM, Misty Stanley-Jones <
>>>>>>>>>>>>>>> mstanleyjones@cloudera.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Sand,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please have a look at
>>>>>>>>>>>>>>>> http://getkudu.io/docs/kudu_impala_integration.html#partitioning_tables
>>>>>>>>>>>>>>>> and see if it is helpful
to you.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Misty
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, May 6, 2016 at 2:00 PM, Sand Stone <
>>>>>>>>>>>>>>>> sand.m.stone@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi, I am new to Kudu. I wonder how the split rows work. I
>>>>>>>>>>>>>>>>> know from some docs that this is currently for pre-creating
>>>>>>>>>>>>>>>>> the table. I am researching how to partition (hash+range)
>>>>>>>>>>>>>>>>> some time series test data.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is there an example, or notes somewhere I could read up on?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks much.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
