kudu-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dan Burkert <...@cloudera.com>
Subject Re: Proposal: remove default partitioning for new tables
Date Thu, 26 May 2016 22:53:31 GMT
Hi all,

Thanks for the feedback!  We've made this change, and it will be part of
the upcoming 0.9 release.  Going forward, all create table calls must have
partitioning specified.  Existing tables will not be affected.

- Dan

On Fri, May 20, 2016 at 6:41 AM, Jordan Birdsell <
jordan.birdsell.kdvm@statefarm.com> wrote:

> +1 ...this is a great recommendation
>
> -----Original Message-----
> From: Sand Stone [mailto:sand.m.stone@gmail.com]
> Sent: Thursday, May 19, 2016 10:39 PM
> To: user@kudu.incubator.apache.org
> Cc: dev@kudu.incubator.apache.org
> Subject: Re: Proposal: remove default partitioning for new tables
>
> Agreed that this is a sensible API change.
>
> On Thu, May 19, 2016 at 4:07 PM, Abhi Basu <9000revs@gmail.com> wrote:
>
> > I think this a very reasonable feature request. I have recently started
> > working with Kudu and the "default" behavior has already tripped me up a
> > couple times.
> >
> > Thanks,
> >
> > Abhi
> >
> > On Thu, May 19, 2016 at 4:03 PM, Dan Burkert <danburkert@apache.org>
> > wrote:
> >
> >> Hi all,
> >>
> >> One of the issues that trips up new Kudu users is the uncertainty about
> >> how partitioning works, and how to use partitioning effectively.  Much
> of
> >> this can be addressed with better documentation and explanatory
> materials,
> >> and that should be an area of focus leading up to our 1.0 release.
> However,
> >> the default partitioning behavior is suboptimal, and changing the
> default
> >> could lead to significantly less user confusion and frustration.
> Currently,
> >> when creating a new table, Kudu defaults to using only a single tablet,
> >> which is a known anti-pattern.  This can be painful for users who
> create a
> >> table assuming Kudu will have good defaults, and begin loading data
> only to
> >> find out later that they will need to recreate the table with
> partitioning
> >> to achieve good results.
> >>
> >> A better default partitioning strategy might be hash partitioning over
> >> the primary key columns, with a number of hash buckets based on the
> number
> >> of tablet servers (perhaps something like 3x the number of tablet
> >> servers).  This would alleviate the worst scalability issues with the
> >> current default, however it has a few downsides of its own. Hash
> >> partitioning is not appropriate for every use case, and any
> rule-of-thumb
> >> number of tablets we could come up with will not always be optimal.
> >>
> >> Given that there is no bullet-proof default, and that changing
> >> partitioning strategy after table creation is impossible, and changing
> the
> >> default partitioning strategy is a backwards incompatible change, I
> propose
> >> we remove the default altogether.  Users would be required to explicitly
> >> specify the table partitioning during creation, and failing to do so
> would
> >> result in an illegal argument error.  Users who really do want only a
> >> single tablet will still be able to do so by explicitly configuring
> range
> >> partitioning with no split rows.
> >>
> >> I'd like to get community feedback on whether this seems like a good
> >> direction to take.  I have put together a patch, you can check out the
> >> changes to test files to see what it looks like to add partitioning
> >> explicitly in cases where the default was being relied on.
> >> http://gerrit.cloudera.org:8080/#/c/3131/
> >>
> >> - Dan
> >>
> >
> >
> >
> > --
> > Abhi Basu
> >
>

Mime
View raw message