kudu-user mailing list archives

From Mauricio Aristizabal <mauri...@impact.com>
Subject Best way to merge range partitions
Date Tue, 30 Apr 2019 20:22:52 GMT
I'm doing the delicate dance of maximizing ingest by having enough current
hash partitions (say 25), minimizing query runtime by having range
partitions that roughly match most report runs (say 2 weeks), while keeping
tablet count not far above the 600 recommended, and supporting at least 18
months of data.

I'm thinking of a strategy of routinely merging older, cold range
partitions into bigger ones (say 2 months instead of 2 weeks), and using
the reduced overall tablet count to increase the number of hash buckets.
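
To make the arithmetic concrete, here is a rough sketch. It assumes the
tablet count is simply hash buckets times range partitions, approximates
18 months as 548 days, uses a hypothetical 60-day "hot" window, ignores
replication, and treats 600 as the budget for this one table. The function
names are made up for illustration.

```python
import math


def range_partitions_needed(total_days, hot_days, hot_width_days, cold_width_days):
    """Count range partitions when recent data uses narrow ranges and
    older data uses wide (merged) ranges."""
    hot = math.ceil(hot_days / hot_width_days)
    cold = math.ceil(max(total_days - hot_days, 0) / cold_width_days)
    return hot + cold


def tablet_count(hash_buckets, range_partitions):
    """Assumption: one tablet per (hash bucket, range partition) pair."""
    return hash_buckets * range_partitions


TOTAL_DAYS = 548      # assumption: roughly 18 months
TABLET_BUDGET = 600   # assumption: budget applies to this table alone

# Every range is 2 weeks wide, 25 hash buckets.
uniform_ranges = range_partitions_needed(TOTAL_DAYS, TOTAL_DAYS, 14, 60)
uniform_tablets = tablet_count(25, uniform_ranges)

# Last 60 days in 2-week ranges, everything older merged into 60-day ranges.
merged_ranges = range_partitions_needed(TOTAL_DAYS, 60, 14, 60)
merged_tablets = tablet_count(25, merged_ranges)

# Hash buckets that would fit in the budget after merging.
max_buckets_after_merge = TABLET_BUDGET // merged_ranges
```

Under these assumptions the uniform layout needs 40 ranges (1000 tablets),
while the merged layout needs 14 ranges (350 tablets), which would leave
room for up to 42 hash buckets within the same budget.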

It would be really nice if there were a Kudu CLI 'merge_range_partition'
command (the ranges would need to be contiguous).  It would greatly simplify
optimizing the layout of time-series tables.

So instead I'm planning on copying the range partitions' data to a Parquet
side table, dropping the partitions, creating a single one, and copying the
data back in.
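
The steps above can be sketched as a small helper that only builds the
Impala statements, in the order they would run. The table, side-table, and
column names are hypothetical, the bounds are passed in as ready-made SQL
literals, and the statements have not been checked against Impala 2.13 /
Kudu 1.7 specifically, so treat this as an outline rather than a tested
procedure.

```python
def merge_range_statements(table, side_table, key_col, bounds):
    """Build the statements that merge contiguous range partitions.

    bounds: ordered list of (lower, upper) SQL literals, one pair per
    existing range partition, lower inclusive and upper exclusive.
    """
    if len(bounds) < 2:
        raise ValueError("need at least two range partitions to merge")
    # Each partition must start exactly where the previous one ends.
    for (_, prev_upper), (next_lower, _) in zip(bounds, bounds[1:]):
        if prev_upper != next_lower:
            raise ValueError("range partitions must be contiguous")

    lower, upper = bounds[0][0], bounds[-1][1]
    stmts = [
        # 1. Copy the cold rows to a Parquet side table.
        f"CREATE TABLE {side_table} STORED AS PARQUET AS "
        f"SELECT * FROM {table} "
        f"WHERE {key_col} >= {lower} AND {key_col} < {upper}"
    ]
    # 2. Drop each old range partition (this deletes its rows in Kudu).
    stmts += [
        f"ALTER TABLE {table} DROP RANGE PARTITION {lo} <= VALUES < {hi}"
        for lo, hi in bounds
    ]
    # 3. Create the single merged range.
    stmts.append(
        f"ALTER TABLE {table} ADD RANGE PARTITION {lower} <= VALUES < {upper}"
    )
    # 4. Copy the rows back in.
    stmts.append(f"INSERT INTO {table} SELECT * FROM {side_table}")
    return stmts


# Hypothetical example: merge two 2-week ranges into one.
example = merge_range_statements(
    "events", "events_merge_tmp", "event_day",
    [("'2018-01-01'", "'2018-01-15'"), ("'2018-01-15'", "'2018-01-29'")],
)
```

Between steps 2 and 4 the rows in that range are not in the Kudu table, so
reports over that period would come back incomplete, and it seems prudent
to compare row counts in the side table against the source before dropping
anything.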

Any better approach I can use currently?

Using CDH 5.15 Impala 2.13 Kudu 1.7

Thanks in advance,

-m

-- 
Mauricio Aristizabal
Architect - Data Pipeline
mauricio@impact.com | 323 309 4260
https://impact.com
<https://www.linkedin.com/company/impact-martech/>
<https://www.facebook.com/ImpactMarTech/>
<https://twitter.com/impactmartech>
<https://www.youtube.com/c/impactmartech>
