kudu-user mailing list archives

From Dan Burkert <danburk...@apache.org>
Subject Proposal: remove default partitioning for new tables
Date Thu, 19 May 2016 23:03:33 GMT
Hi all,

One of the issues that trips up new Kudu users is the uncertainty about how
partitioning works, and how to use partitioning effectively.  Much of this
can be addressed with better documentation and explanatory materials, and
that should be an area of focus leading up to our 1.0 release. However, the
default partitioning behavior is suboptimal, and changing the default could
lead to significantly less user confusion and frustration. Currently, when
creating a new table, Kudu defaults to using only a single tablet, which is
a known anti-pattern.  This can be painful for users who create a table
assuming Kudu will have good defaults, and begin loading data only to find
out later that they will need to recreate the table with partitioning to
achieve good results.

A better default partitioning strategy might be hash partitioning over the
primary key columns, with a number of hash buckets based on the number of
tablet servers (perhaps something like 3x the number of tablet servers).
This would alleviate the worst scalability issues with the current default,
but it has a few downsides of its own. Hash partitioning is not
appropriate for every use case, and any rule-of-thumb number of tablets we
could come up with will not always be optimal.
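
To make the rule of thumb above concrete, here is a rough Python sketch.
It is an illustration only: the function names are made up for this
example, the factor of 3 is just the "something like 3x" guess from above,
and CRC32 is a stand-in hash, not the function Kudu actually uses.

```python
import zlib


def default_bucket_count(num_tablet_servers, factor=3):
    """Hypothetical rule of thumb: a few hash buckets per tablet server."""
    if num_tablet_servers < 1:
        raise ValueError("need at least one tablet server")
    return factor * num_tablet_servers


def bucket_for_key(primary_key, num_buckets):
    """Map a primary key tuple to a bucket index.

    CRC32 is only a stand-in here; it is NOT Kudu's real hash function.
    """
    encoded = "\x00".join(str(part) for part in primary_key).encode("utf-8")
    return zlib.crc32(encoded) % num_buckets


# Example: a cluster with 4 tablet servers and a (host, timestamp) key.
buckets = default_bucket_count(num_tablet_servers=4)
rows = [("host-%d" % i, 1463699013 + i) for i in range(1000)]

counts = [0] * buckets
for key in rows:
    counts[bucket_for_key(key, buckets)] += 1

print("buckets:", buckets)
print("rows per bucket:", counts)
```

Every row lands in exactly one bucket, so writes spread across tablets
instead of piling onto a single one.  The sketch also shows the downside:
rows with adjacent keys are scattered, which is not what every workload
wants.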

Given that there is no bullet-proof default, that changing the partitioning
strategy after table creation is impossible, and that changing the default
partitioning strategy is a backwards-incompatible change, I propose we
remove the default altogether.  Users would be required to explicitly
specify the table partitioning during creation, and failing to do so would
result in an illegal argument error.  Users who really do want only a
single tablet will still be able to get one by explicitly configuring range
partitioning with no split rows.
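
The proposed behavior can be modeled in a few lines of Python.  This is a
sketch of the rules described above, not the Kudu client API: the
Partitioning class and tablet_count function are hypothetical names, and
ValueError stands in for the illegal argument error.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Partitioning:
    """Hypothetical stand-in for a table's partitioning options."""
    hash_buckets: Optional[int] = None          # None: no hash partitioning
    range_columns: Optional[List[str]] = None   # None: no range partitioning
    split_rows: List[Tuple] = field(default_factory=list)


def tablet_count(partitioning: Optional[Partitioning]) -> int:
    """Return how many tablets a new table would get, or reject the request."""
    if partitioning is None or (
        partitioning.hash_buckets is None and partitioning.range_columns is None
    ):
        # Proposed behavior: there is no default to fall back on.
        raise ValueError("table partitioning must be specified explicitly")
    if partitioning.split_rows and partitioning.range_columns is None:
        raise ValueError("split rows require range partitioning columns")
    if partitioning.hash_buckets is not None and partitioning.hash_buckets < 2:
        raise ValueError("hash partitioning needs at least two buckets")

    hash_parts = partitioning.hash_buckets or 1
    # N split rows divide the range key space into N + 1 ranges.
    range_parts = len(partitioning.split_rows) + 1
    return hash_parts * range_parts


# A deliberate single-tablet table: range partitioning with no split rows.
single = Partitioning(range_columns=["id"])
print(tablet_count(single))

# Hash partitioning into 12 buckets.
hashed = Partitioning(hash_buckets=12)
print(tablet_count(hashed))
```

The point of the model is that a single tablet remains possible, but only
as an explicit choice; omitting partitioning entirely is an error rather
than a silent fallback.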

I'd like to get community feedback on whether this seems like a good
direction to take.  I have put together a patch; you can check out the
changes to test files to see what it looks like to add partitioning
explicitly in cases where the default was being relied on:
http://gerrit.cloudera.org:8080/#/c/3131/

- Dan
