kudu-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jordan Birdsell <jordan.birdsell.k...@statefarm.com>
Subject RE: Proposal: remove default partitioning for new tables
Date Fri, 20 May 2016 13:41:41 GMT
+1 ...this is a great recommendation

-----Original Message-----
From: Sand Stone [mailto:sand.m.stone@gmail.com] 
Sent: Thursday, May 19, 2016 10:39 PM
To: user@kudu.incubator.apache.org
Cc: dev@kudu.incubator.apache.org
Subject: Re: Proposal: remove default partitioning for new tables

Agreed that this is a sensible API change.

On Thu, May 19, 2016 at 4:07 PM, Abhi Basu <9000revs@gmail.com> wrote:

> I think this a very reasonable feature request. I have recently started
> working with Kudu and the "default" behavior has already tripped me up a
> couple times.
>
> Thanks,
>
> Abhi
>
> On Thu, May 19, 2016 at 4:03 PM, Dan Burkert <danburkert@apache.org>
> wrote:
>
>> Hi all,
>>
>> One of the issues that trips up new Kudu users is the uncertainty about
>> how partitioning works, and how to use partitioning effectively.  Much of
>> this can be addressed with better documentation and explanatory materials,
>> and that should be an area of focus leading up to our 1.0 release. However,
>> the default partitioning behavior is suboptimal, and changing the default
>> could lead to significantly less user confusion and frustration. Currently,
>> when creating a new table, Kudu defaults to using only a single tablet,
>> which is a known anti-pattern.  This can be painful for users who create a
>> table assuming Kudu will have good defaults, and begin loading data only to
>> find out later that they will need to recreate the table with partitioning
>> to achieve good results.
>>
>> A better default partitioning strategy might be hash partitioning over
>> the primary key columns, with a number of hash buckets based on the number
>> of tablet servers (perhaps something like 3x the number of tablet
>> servers).  This would alleviate the worst scalability issues with the
>> current default, however it has a few downsides of its own. Hash
>> partitioning is not appropriate for every use case, and any rule-of-thumb
>> number of tablets we could come up with will not always be optimal.
>>
>> Given that there is no bullet-proof default, and that changing
>> partitioning strategy after table creation is impossible, and changing the
>> default partitioning strategy is a backwards incompatible change, I propose
>> we remove the default altogether.  Users would be required to explicitly
>> specify the table partitioning during creation, and failing to do so would
>> result in an illegal argument error.  Users who really do want only a
>> single tablet will still be able to do so by explicitly configuring range
>> partitioning with no split rows.
>>
>> I'd like to get community feedback on whether this seems like a good
>> direction to take.  I have put together a patch, you can check out the
>> changes to test files to see what it looks like to add partitioning
>> explicitly in cases where the default was being relied on.
>> http://gerrit.cloudera.org:8080/#/c/3131/
>>
>> - Dan
>>
>
>
>
> --
> Abhi Basu
>
Mime
View raw message