kudu-user mailing list archives

From Dan Burkert <danburk...@apache.org>
Subject Re: mixing range and hash partitioning
Date Tue, 28 Feb 2017 18:05:05 GMT
Hey Paul,

Thanks for checking that out and following up.  I'm going to try to
root-cause this today so that we have plenty of time to get a fix into 1.3
if it requires one.  Thanks again for the report.  In the meantime, let me
know if the alter table workaround is not enough for you to make progress
with Kudu.

-Dan


On Mon, Feb 27, 2017 at 3:02 PM Paul Brannan <paul.brannan@thesystech.com>
wrote:

One side-effect of neglecting to drop the unbounded range partition: I get
a stack trace when I try to scan:

F0227 15:00:12.696625 76369 map-util.h:112] Check failed: it !=
collection.end() Map key not found: ▒3
*** Check failure stack trace: ***
    @     0x7fca2a5506ad  (unknown)
    @     0x7fca2a55271c  (unknown)
    @     0x7fca2a550209  (unknown)
    @     0x7fca2a5530af  (unknown)
    @     0x7fca2a3de482  (unknown)
    @     0x7fca2a3dae70  (unknown)
    @     0x7fca2a3dc100  (unknown)
    @     0x7fca2a429a44  (unknown)
    @     0x7fca2a42ab47  (unknown)
    @     0x7fca2a42e94c  (unknown)
    @     0x7fca2a43081c  (unknown)
    @     0x7fca2a5a9a56  (unknown)
    @     0x7fca2a5aa948  (unknown)
    @     0x7fca2a41ac8b  (unknown)
    @     0x7fca2a4dcfc8  (unknown)
    @     0x7fca290d6182  start_thread
    @     0x7fca2980947d  clone
    @              (nil)  (unknown)


On Sun, Feb 26, 2017 at 6:53 PM, Paul Brannan <paul.brannan@thesystech.com>
wrote:

Is that 4TB per tablet server, regardless of how many tablets it has?

If I have 128GB of data per day, then each tablet server hits the
recommended limit after about a month.  To store 10 years of data, I would
need 120 tablet servers to avoid going over the limit.  Is that the best
solution or is there another alternative?
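
The estimate above can be checked with a few lines of arithmetic. This is a
rough sketch under stated assumptions: 1 TB is taken as 1024 GB, a year as
365 days, and replication and compression are ignored. DaysToFill and
ServersNeeded are hypothetical helper names, not part of Kudu.

```cpp
#include <cassert>
#include <cstdint>

// Days until one tablet server reaches the per-server limit (rounded down).
// Assumption: the whole daily volume lands on that one server.
inline std::int64_t DaysToFill(std::int64_t limit_gb, std::int64_t daily_gb) {
  assert(limit_gb > 0 && daily_gb > 0);
  return limit_gb / daily_gb;
}

// Servers needed to hold `days` of data without any server exceeding
// the limit (rounded up). Ignores replication and compression.
inline std::int64_t ServersNeeded(std::int64_t limit_gb,
                                  std::int64_t daily_gb,
                                  std::int64_t days) {
  assert(limit_gb > 0 && daily_gb > 0 && days >= 0);
  const std::int64_t total_gb = daily_gb * days;
  return (total_gb + limit_gb - 1) / limit_gb;
}
```

Under these assumptions, DaysToFill(4096, 128) gives 32 days and
ServersNeeded(4096, 128, 3650) gives 115, close to the figure of 120
obtained by counting one server per month.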

How many cores are recommended per tablet server?  If I typically only scan
one day of data at a time, could a single core service multiple tablet
servers?


On Fri, Feb 24, 2017 at 11:22 PM, Paul Brannan <paul.brannan@thesystech.com>
wrote:

The test doesn't exactly reproduce what I did in my sample program.

I'm able to successfully drop the unbounded partition in both cases
(calling set_range_partition_columns only vs calling
set_range_partition_columns+add_hash_partitions).  However, if I omit the
call to DropRangePartition, then AddRangePartition succeeds in the first
case and fails in the second case.  I expect it to succeed in both cases or
fail in both cases.

I've attached a simple program which demonstrates.


On Fri, Feb 24, 2017 at 7:09 PM, Dan Burkert <danburkert@apache.org> wrote:

Hi Paul,

I can't reproduce the behavior you are describing; I always get a single
unbounded range partition when creating the table without specifying range
bounds or splits (regardless of hash partitioning). I searched and couldn't
find a unit test for this behavior, so I wrote one - you might compare your
code against my test. https://gerrit.cloudera.org/#/c/6153/

Thanks,
Dan

On Fri, Feb 24, 2017 at 2:41 PM, Paul Brannan <paul.brannan@thesystech.com>
wrote:

I can verify that dropping the unbounded range partition allows me to later
add bounded partitions.

If I only have range partitioning (by commenting out the call to
add_hash_partitions), adding a bounded partition succeeds, regardless of
whether I first drop the unbounded partition.  This seems surprising; why
the difference?

On Fri, Feb 24, 2017 at 4:20 PM, Dan Burkert <danburkert@apache.org> wrote:

Hi Paul,

I think the issue you are running into is that if you don't add a range
partition explicitly during table creation (by calling add_range_partition
or inserting a split with add_range_partition_split), Kudu will default to
creating 1 unbounded range partition.  So your two options are to add the
range partition at table creation time, or if you only know which
partition you want at a later time, you can drop the existing partition
(alterer->DropRangePartition with two empty rows), then add the range
partition.  Note that dropping the range partition will effectively
truncate the table.  This can be done with the same alterer in a single
transaction.  If you want to see a bunch of examples, you can check out
this unit test:
https://github.com/apache/kudu/blob/master/src/kudu/integration-tests/alter_table-test.cc#L1106
.

- Dan
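
The default described above can be illustrated with a small stand-alone
model. This is a sketch, not Kudu code: Range, Overlaps, and ToyRangeTable
are hypothetical names, ranges are assumed to be half-open on a single INT64
column, and hash partitioning is left out. It only shows why a bounded range
conflicts with the default unbounded one until that one is dropped.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model only: a half-open range [lower, upper); nullopt means unbounded.
struct Range {
  std::optional<std::int64_t> lower;
  std::optional<std::int64_t> upper;
};

// Two half-open ranges overlap unless one ends at or before the other starts.
inline bool Overlaps(const Range& a, const Range& b) {
  const bool a_before_b = a.upper && b.lower && *a.upper <= *b.lower;
  const bool b_before_a = b.upper && a.lower && *b.upper <= *a.lower;
  return !(a_before_b || b_before_a);
}

class ToyRangeTable {
 public:
  // Mirrors the default described in the thread: one unbounded range.
  ToyRangeTable() : ranges_{Range{std::nullopt, std::nullopt}} {}

  // Returns false if the new range conflicts with an existing one.
  bool AddRange(const Range& r) {
    // A fully bounded range must be non-empty.
    assert(!r.lower || !r.upper || *r.lower < *r.upper);
    for (const Range& existing : ranges_) {
      if (Overlaps(existing, r)) return false;
    }
    ranges_.push_back(r);
    return true;
  }

  // Drops the range with exactly these bounds; returns false if absent.
  bool DropRange(const Range& r) {
    for (auto it = ranges_.begin(); it != ranges_.end(); ++it) {
      if (it->lower == r.lower && it->upper == r.upper) {
        ranges_.erase(it);
        return true;
      }
    }
    return false;
  }

 private:
  std::vector<Range> ranges_;
};
```

In this model, adding [0, 86400) to a fresh table is rejected, and the same
call succeeds once the unbounded range has been dropped. That corresponds to
the DropRangePartition-then-AddRangePartition sequence suggested above.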

On Fri, Feb 24, 2017 at 10:53 AM, Paul Brannan <paul.brannan@thesystech.com>
wrote:

I'm trying to create a table with one column range-partitioned and another
column hash-partitioned.  Documentation for add_hash_partitions and
set_range_partition_columns suggests this should be possible ("Tables must
be created with either range, hash, or range and hash partitioning").

I have a schema with three INT64 columns ("time", "key", and "value").
When I create the table, I set up the partitioning:

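// Hash-partition on "key" (2 buckets) and range-partition on "time";
// no range bounds or splits are specified here.
(*table_creator)
  .table_name("test_table")
  .schema(&schema)
  .add_hash_partitions({"key"}, 2)
  .set_range_partition_columns({"time"})
  .num_replicas(1)
  .Create();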

I later try to add a partition:

auto timesplit(KuduSchema & schema, std::int64_t t) {
  auto split = schema.NewRow();
  check_ok(split->SetInt64("time", t));
  return split;
}

alterer->AddRangePartition(
  timesplit(schema, date_start),
  timesplit(schema, next_date_start));

check_ok(alterer->Alter());

But I get an error "Invalid argument: New range partition conflicts with
existing range partition".

How are hash and range partitioning intended to be mixed?
