kudu-user mailing list archives

From Dan Burkert <danburk...@apache.org>
Subject Re: mixing range and hash partitioning
Date Tue, 28 Feb 2017 21:03:29 GMT
Yep: https://issues.apache.org/jira/browse/KUDU-1903

- Dan

On Tue, Feb 28, 2017 at 12:51 PM, Todd Lipcon <todd@cloudera.com> wrote:

> Hey Dan,
>
> Mind filing a critical or blocker JIRA against 1.3 so we can track
> remaining things that should go into the branch before release?
>
> -Todd
>
> On Tue, Feb 28, 2017 at 10:05 AM, Dan Burkert <danburkert@apache.org>
> wrote:
>
>> Hey Paul,
>>
>> Thanks for checking that out and following up. I'm going to try to
>> root-cause this today so that we have plenty of time to get a fix into 1.3
>> if it requires one. Thanks again for the report. In the meantime, let me
>> know if the alter table workaround is not enough for you to make progress
>> with Kudu.
>>
>> -Dan
>>
>>
>> On Mon, Feb 27, 2017 at 3:02 PM Paul Brannan <paul.brannan@thesystech.com>
>> wrote:
>>
>> One side-effect of neglecting to drop the unbounded range partition: I
>> get a stack trace when I try to scan:
>>
>> F0227 15:00:12.696625 76369 map-util.h:112] Check failed: it !=
>> collection.end() Map key not found: ▒3
>> *** Check failure stack trace: ***
>>     @     0x7fca2a5506ad  (unknown)
>>     @     0x7fca2a55271c  (unknown)
>>     @     0x7fca2a550209  (unknown)
>>     @     0x7fca2a5530af  (unknown)
>>     @     0x7fca2a3de482  (unknown)
>>     @     0x7fca2a3dae70  (unknown)
>>     @     0x7fca2a3dc100  (unknown)
>>     @     0x7fca2a429a44  (unknown)
>>     @     0x7fca2a42ab47  (unknown)
>>     @     0x7fca2a42e94c  (unknown)
>>     @     0x7fca2a43081c  (unknown)
>>     @     0x7fca2a5a9a56  (unknown)
>>     @     0x7fca2a5aa948  (unknown)
>>     @     0x7fca2a41ac8b  (unknown)
>>     @     0x7fca2a4dcfc8  (unknown)
>>     @     0x7fca290d6182  start_thread
>>     @     0x7fca2980947d  clone
>>     @              (nil)  (unknown)
>>
>>
>> On Sun, Feb 26, 2017 at 6:53 PM, Paul Brannan <
>> paul.brannan@thesystech.com> wrote:
>>
>> Is that 4TB per tablet server, regardless of how many tablets it has?
>>
>> If I have 128GB of data per day, then each tablet server hits the
>> recommended limit after about a month.  To store 10 years of data, I would
>> need 120 tablet servers to avoid going over the limit.  Is that the best
>> solution or is there another alternative?
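
The sizing estimate above can be reproduced with a short calculation. This is only a sketch of the arithmetic in the question: the function names are hypothetical, and it assumes 1 TB = 1024 GB, 365-day years, no compression, and that the 4 TB figure really is a per-tablet-server limit (the thread does not confirm this).

```cpp
#include <cassert>
#include <cstdint>

// Whole days until one tablet server reaches the assumed per-server limit.
std::int64_t days_until_limit(std::int64_t limit_gb, std::int64_t daily_gb) {
  assert(limit_gb > 0 && daily_gb > 0);
  return limit_gb / daily_gb;
}

// Tablet servers needed to hold `days` of data without any server exceeding
// `limit_gb`. `replicas` multiplies the stored volume (the table in this
// thread is created with num_replicas(1)).
std::int64_t servers_needed(std::int64_t daily_gb, std::int64_t days,
                            std::int64_t replicas, std::int64_t limit_gb) {
  assert(daily_gb > 0 && days > 0 && replicas > 0 && limit_gb > 0);
  const std::int64_t total_gb = daily_gb * days * replicas;
  return (total_gb + limit_gb - 1) / limit_gb;  // ceiling division
}
```

Under these assumptions, `days_until_limit(4096, 128)` gives 32 days ("about a month"), and `servers_needed(128, 3650, 1, 4096)` gives 115, close to the rough figure of 120 obtained by counting one server per month.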
>>
>> How many cores are recommended per tablet server?  If I typically only
>> scan one day of data at a time, could a single core service multiple tablet
>> servers?
>>
>>
>> On Fri, Feb 24, 2017 at 11:22 PM, Paul Brannan <
>> paul.brannan@thesystech.com> wrote:
>>
>> The test doesn't exactly reproduce what I did in my sample program.
>>
>> I'm able to successfully drop the unbounded partition in both cases
>> (calling set_range_partition_columns only vs calling
>> set_range_partition_columns+add_hash_partitions).  However, if I omit
>> the call to DropRangePartition, then AddRangePartition succeeds in the
>> first case and fails in the second case.  I expect it to succeed in both
>> cases or fail in both cases.
>>
>> I've attached a simple program which demonstrates.
>>
>>
>> On Fri, Feb 24, 2017 at 7:09 PM, Dan Burkert <danburkert@apache.org>
>> wrote:
>>
>> Hi Paul,
>>
>> I can't reproduce the behavior you are describing; I always get a single
>> unbounded range partition when creating the table without specifying range
>> bounds or splits (regardless of hash partitioning). I searched and couldn't
>> find a unit test for this behavior, so I wrote one. You might compare your
>> code against my test: https://gerrit.cloudera.org/#/c/6153/
>>
>> Thanks,
>> Dan
>>
>> On Fri, Feb 24, 2017 at 2:41 PM, Paul Brannan <
>> paul.brannan@thesystech.com> wrote:
>>
>> I can verify that dropping the unbounded range partition allows me to
>> later add bounded partitions.
>>
>> If I only have range partitioning (by commenting out the call to
>> add_hash_partitions), adding a bounded partition succeeds, regardless of
>> whether I first drop the unbounded partition.  This seems surprising; why
>> the difference?
>>
>> On Fri, Feb 24, 2017 at 4:20 PM, Dan Burkert <danburkert@apache.org>
>> wrote:
>>
>> Hi Paul,
>>
>> I think the issue you are running into is that if you don't add a range
>> partition explicitly during table creation (by calling add_range_partition
>> or inserting a split with add_range_partition_split), Kudu will default to
>> creating one unbounded range partition.  So your two options are to add the
>> range partition at table creation time, or, if you only know which
>> partition you want at a later time, to drop the existing partition
>> (alterer->DropRangePartition with two empty rows) and then add the range
>> partition.  Note that dropping the range partition will effectively
>> truncate the table.  Both steps can be done with the same alterer in a
>> single transaction.  If you want to see a bunch of examples, you can check
>> out this unit test:
>> https://github.com/apache/kudu/blob/master/src/kudu/integration-tests/alter_table-test.cc#L1106
>>
>> - Dan
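
The conflict described above can be illustrated with a small stand-alone model. This is a sketch only, not Kudu's implementation and not the Kudu client API: `Range`, `unbounded_range`, `overlaps`, and `try_add_range` are hypothetical names, ranges are modeled as half-open intervals `[lower, upper)` over an INT64 column, and the extreme int64 values stand in for "no bound".

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Half-open range [lower, upper) over a single INT64 range column.
struct Range {
  std::int64_t lower;
  std::int64_t upper;
};

// The default partition created when no bounds or splits are given:
// it covers every possible value of the range column.
Range unbounded_range() {
  return Range{std::numeric_limits<std::int64_t>::min(),
               std::numeric_limits<std::int64_t>::max()};
}

// Two half-open ranges overlap when each starts before the other ends.
bool overlaps(const Range& a, const Range& b) {
  return a.lower < b.upper && b.lower < a.upper;
}

// Adds `r` only if it conflicts with no existing partition; a false return
// models the "New range partition conflicts with existing range partition"
// error.
bool try_add_range(std::vector<Range>& partitions, const Range& r) {
  assert(r.lower < r.upper);
  for (const Range& p : partitions) {
    if (overlaps(p, r)) return false;
  }
  partitions.push_back(r);
  return true;
}
```

In this model, a table that starts with `unbounded_range()` rejects every bounded range until the unbounded one is removed, after which adjacent ranges such as `{10, 20}` and `{20, 30}` can both be added. That matches the drop-then-add sequence described in the message above.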
>>
>> On Fri, Feb 24, 2017 at 10:53 AM, Paul Brannan <
>> paul.brannan@thesystech.com> wrote:
>>
>> I'm trying to create a table with one column range-partitioned and
>> another column hash-partitioned.  Documentation for add_hash_partitions and
>> set_range_partition_columns suggests this should be possible ("Tables must
>> be created with either range, hash, or range and hash partitioning").
>>
>> I have a schema with three INT64 columns ("time", "key", and "value").
>> When I create the table, I set up the partitioning:
>>
>> (*table_creator)
>>   .table_name("test_table")
>>   .schema(&schema)
>>   .add_hash_partitions({"key"}, 2)
>>   .set_range_partition_columns({"time"})
>>   .num_replicas(1)
>>   .Create();
>>
>> I later try to add a partition:
>>
>> auto timesplit(KuduSchema & schema, std::int64_t t) {
>>   auto split = schema.NewRow();
>>   check_ok(split->SetInt64("time", t));
>>   return split;
>> }
>>
>> alterer->AddRangePartition(
>>   timesplit(schema, date_start),
>>   timesplit(schema, next_date_start));
>>
>> check_ok(alterer->Alter());
>>
>> But I get an error "Invalid argument: New range partition conflicts with
>> existing range partition".
>>
>> How are hash and range partitioning intended to be mixed?
>>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
