Yep: https://issues.apache.org/jira/browse/KUDU-1903

- Dan

On Tue, Feb 28, 2017 at 12:51 PM, Todd Lipcon <todd@cloudera.com> wrote:
Hey Dan,

Mind filing a critical or blocker JIRA against 1.3 so we can track remaining things that should go into the branch before release?

-Todd

On Tue, Feb 28, 2017 at 10:05 AM, Dan Burkert <danburkert@apache.org> wrote:
Hey Paul,

Thanks for checking that out and following up.  I'm going to try to root-cause this today so that we have plenty of time to get a fix into 1.3 if one is required.  Thanks again for the report.  In the meantime, let me know if the alter table workaround is not enough for you to make progress with Kudu.

-Dan


On Mon, Feb 27, 2017 at 3:02 PM Paul Brannan <paul.brannan@thesystech.com> wrote:
One side-effect of neglecting to drop the unbounded range partition: I get a stack trace when I try to scan:

F0227 15:00:12.696625 76369 map-util.h:112] Check failed: it != collection.end() Map key not found: ▒3
*** Check failure stack trace: ***
    @     0x7fca2a5506ad  (unknown)
    @     0x7fca2a55271c  (unknown)
    @     0x7fca2a550209  (unknown)
    @     0x7fca2a5530af  (unknown)
    @     0x7fca2a3de482  (unknown)
    @     0x7fca2a3dae70  (unknown)
    @     0x7fca2a3dc100  (unknown)
    @     0x7fca2a429a44  (unknown)
    @     0x7fca2a42ab47  (unknown)
    @     0x7fca2a42e94c  (unknown)
    @     0x7fca2a43081c  (unknown)
    @     0x7fca2a5a9a56  (unknown)
    @     0x7fca2a5aa948  (unknown)
    @     0x7fca2a41ac8b  (unknown)
    @     0x7fca2a4dcfc8  (unknown)
    @     0x7fca290d6182  start_thread
    @     0x7fca2980947d  clone
    @              (nil)  (unknown)


On Sun, Feb 26, 2017 at 6:53 PM, Paul Brannan <paul.brannan@thesystech.com> wrote:
Is that 4TB per tablet server, regardless of how many tablets it has?

If I have 128GB of data per day, then each tablet server hits the recommended limit after about a month.  To store 10 years of data, I would need 120 tablet servers to avoid going over the limit.  Is that the best solution or is there another alternative?
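A rough check of the arithmetic above, as a C++ sketch. It assumes binary units (1 TB = 1024 GB), no replication and no compression, and treats the 4 TB figure as a hard per-server cap. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Days of ingest one tablet server can hold before reaching the cap.
// Assumption: no replication and no compression (illustrative only).
inline std::int64_t days_until_limit(std::int64_t limit_gb, std::int64_t gb_per_day) {
  assert(gb_per_day > 0);
  return limit_gb / gb_per_day;
}

// Tablet servers needed to retain `days` of data, rounded up.
inline std::int64_t servers_needed(std::int64_t days, std::int64_t gb_per_day,
                                   std::int64_t limit_gb) {
  assert(limit_gb > 0);
  const std::int64_t total_gb = days * gb_per_day;
  return (total_gb + limit_gb - 1) / limit_gb;  // ceiling division
}
```

With 128 GB per day and a 4096 GB cap this gives 32 days per server and 115 servers for 3650 days, close to the 120 estimated above by rounding each server to one month.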

How many cores are recommended per tablet server?  If I typically only scan one day of data at a time, could a single core service multiple tablet servers?


On Fri, Feb 24, 2017 at 11:22 PM, Paul Brannan <paul.brannan@thesystech.com> wrote:
The test doesn't exactly reproduce what I did in my sample program.

I'm able to successfully drop the unbounded partition in both cases (calling set_range_partition_columns only vs calling set_range_partition_columns+add_hash_partitions).  However, if I omit the call to DropRangePartition, then AddRangePartition succeeds in the first case and fails in the second case.  I expect it to succeed in both cases or fail in both cases.

I've attached a simple program which demonstrates.


On Fri, Feb 24, 2017 at 7:09 PM, Dan Burkert <danburkert@apache.org> wrote:
Hi Paul,

I can't reproduce the behavior you are describing; I always get a single unbounded range partition when creating the table without specifying range bounds or splits (regardless of hash partitioning). I searched and couldn't find a unit test for this behavior, so I wrote one; you might compare your code against my test: https://gerrit.cloudera.org/#/c/6153/

Thanks,
Dan

On Fri, Feb 24, 2017 at 2:41 PM, Paul Brannan <paul.brannan@thesystech.com> wrote:
I can verify that dropping the unbounded range partition allows me to later add bounded partitions.

If I only have range partitioning (by commenting out the call to add_hash_partitions), adding a bounded partition succeeds, regardless of whether I first drop the unbounded partition.  This seems surprising; why the difference?

On Fri, Feb 24, 2017 at 4:20 PM, Dan Burkert <danburkert@apache.org> wrote:
Hi Paul,

I think the issue you are running into is that if you don't add a range partition explicitly during table creation (by calling add_range_partition or inserting a split with add_range_partition_split), Kudu defaults to creating a single unbounded range partition.  So your two options are to add the range partition at table creation time or, if you only know which partition you want later, to drop the existing partition (alterer->DropRangePartition with two empty rows) and then add the range partition.  Note that dropping the range partition effectively truncates the table.  Both steps can be done with the same alterer in a single transaction.  If you want to see a bunch of examples, check out this unit test: https://github.com/apache/kudu/blob/master/src/kudu/integration-tests/alter_table-test.cc#L1106.
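To illustrate why the default unbounded partition gets in the way, here is a toy model of range partitions over a single INT64 column. It is not Kudu's implementation and does not use the Kudu client API; the names Range, ToyTable, add_range and drop_range are hypothetical, and "unbounded" is approximated with the minimum and maximum INT64 values.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Sentinels standing in for "no bound" (an assumption of this toy model).
constexpr std::int64_t kNoLower = std::numeric_limits<std::int64_t>::min();
constexpr std::int64_t kNoUpper = std::numeric_limits<std::int64_t>::max();

// Half-open range [lower, upper) on the partition column.
struct Range {
  std::int64_t lower;
  std::int64_t upper;
};

inline Range unbounded() { return Range{kNoLower, kNoUpper}; }

// Two half-open ranges overlap when each starts before the other ends.
inline bool overlaps(const Range& a, const Range& b) {
  return a.lower < b.upper && b.lower < a.upper;
}

class ToyTable {
 public:
  // Mirrors the default described above: one unbounded range partition.
  ToyTable() { ranges_.push_back(unbounded()); }

  // Returns false if the new range conflicts with an existing one.
  bool add_range(const Range& r) {
    for (const Range& existing : ranges_) {
      if (overlaps(existing, r)) return false;
    }
    ranges_.push_back(r);
    return true;
  }

  // Drops the partition with exactly these bounds; returns false if absent.
  bool drop_range(const Range& r) {
    for (auto it = ranges_.begin(); it != ranges_.end(); ++it) {
      if (it->lower == r.lower && it->upper == r.upper) {
        ranges_.erase(it);
        return true;
      }
    }
    return false;
  }

 private:
  std::vector<Range> ranges_;
};
```

In this model, adding [10, 20) to a freshly created table fails because the unbounded range overlaps everything, while dropping the unbounded range first and then adding [10, 20) succeeds.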

- Dan

On Fri, Feb 24, 2017 at 10:53 AM, Paul Brannan <paul.brannan@thesystech.com> wrote:
I'm trying to create a table with one column range-partitioned and another column hash-partitioned.  The documentation for add_hash_partitions and set_range_partition_columns suggests this should be possible ("Tables must be created with either range, hash, or range and hash partitioning").

I have a schema with three INT64 columns ("time", "key", and "value").  When I create the table, I set up the partitioning:

(*table_creator)
  .table_name("test_table")
  .schema(&schema)
  .add_hash_partitions({"key"}, 2)
  .set_range_partition_columns({"time"})
  .num_replicas(1)
  .Create();

I later try to add a partition:

auto timesplit(KuduSchema & schema, std::int64_t t) {
  auto split = schema.NewRow();
  check_ok(split->SetInt64("time", t));
  return split;
}

alterer->AddRangePartition(
  timesplit(schema, date_start),
  timesplit(schema, next_date_start));

check_ok(alterer->Alter());

But I get an error "Invalid argument: New range partition conflicts with existing range partition".

How are hash and range partitioning intended to be mixed?

--
Todd Lipcon
Software Engineer, Cloudera