kudu-user mailing list archives

From William Berkeley <wdberke...@cloudera.com>
Subject Re: Best way to merge range partitions
Date Wed, 01 May 2019 19:55:18 GMT
Where's the 600 tablet count recommendation sourced from? Is that
pre-replication and per-tserver, so that there are 1800 replicas per tablet
server? We recommend 1000-2000 replicas per server.
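
To make the arithmetic concrete, here is a rough sketch in Python. It
assumes a replication factor of 3 (implied by 600 becoming 1800), and it
treats "18 months of 2-week ranges" as roughly 39 range partitions. The
function names and the tablet server count in the usage note are
hypothetical, not taken from the thread.

```python
import math

# Per-server replica range quoted in this message.
RECOMMENDED_MIN = 1000
RECOMMENDED_MAX = 2000


def replicas_per_tserver(tablets_per_tserver, replication_factor=3):
    """Replicas hosted per server if the tablet count is pre-replication."""
    return tablets_per_tserver * replication_factor


def total_tablets(hash_buckets, range_partitions):
    """Each range partition is split into hash_buckets tablets."""
    return hash_buckets * range_partitions


def avg_replicas_per_tserver(tablets, replication_factor, num_tservers):
    """Average replica load, assuming replicas are spread evenly."""
    return math.ceil(tablets * replication_factor / num_tservers)


def within_recommendation(replicas):
    return RECOMMENDED_MIN <= replicas <= RECOMMENDED_MAX
```

For example, replicas_per_tserver(600) gives 1800, which falls inside the
1000-2000 range. With 25 hash buckets and 39 two-week ranges,
total_tablets(25, 39) gives 975 tablets before replication; how that lands
per server depends on the number of tablet servers, which the thread does
not state.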

As for your strategy for merging range partitions, I think it's the best
available at this point.
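
The copy-out, drop, re-add, copy-back sequence could be scripted along
these lines. This is only a sketch: it generates Impala statements as
strings and does not run them. The table, side table, and column names are
hypothetical, the bounds are passed as ready-made SQL literals, and the
sequence is not atomic, so the affected rows are missing from the Kudu
table between the drops and the final insert.

```python
def merge_range_partitions_sql(table, side_table, range_col, bounds):
    """Build Impala statements that merge contiguous range partitions.

    bounds is a sorted list of SQL literals [b0, b1, ..., bn] describing
    n contiguous partitions [b0, b1), [b1, b2), ... to merge into [b0, bn).
    """
    if len(bounds) < 3:
        raise ValueError("need at least two contiguous partitions to merge")
    lo, hi = bounds[0], bounds[-1]
    stmts = [
        # 1. Copy the cold rows to a Parquet side table.
        f"CREATE TABLE {side_table} STORED AS PARQUET AS "
        f"SELECT * FROM {table} "
        f"WHERE {range_col} >= {lo} AND {range_col} < {hi}",
    ]
    # 2. Drop each small range partition (this deletes its rows in Kudu).
    #    Verify the side table's row count before running these.
    for start, end in zip(bounds, bounds[1:]):
        stmts.append(
            f"ALTER TABLE {table} DROP RANGE PARTITION "
            f"{start} <= VALUES < {end}"
        )
    # 3. Add one partition covering the whole merged range.
    stmts.append(
        f"ALTER TABLE {table} ADD RANGE PARTITION {lo} <= VALUES < {hi}"
    )
    # 4. Copy the rows back from the side table.
    stmts.append(f"INSERT INTO {table} SELECT * FROM {side_table}")
    return stmts
```

Calling merge_range_partitions_sql("events", "events_cold", "day_id",
["20180101", "20180115", "20180129"]) returns five statements: one CREATE
TABLE, two DROP RANGE PARTITION, one ADD RANGE PARTITION, and one INSERT.
Printing them for review before running anything by hand is probably the
safer way to use it.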

-Will

On Tue, Apr 30, 2019 at 1:23 PM Mauricio Aristizabal <mauricio@impact.com>
wrote:

> I'm doing the delicate dance of maximizing ingest by having enough current
> hash partitions (say 25), minimizing query runtime by having range
> partitions that roughly match most report runs (say 2 weeks), while keeping
> tablet count not far above the 600 recommended, and supporting at least 18
> months of data.
>
> I'm thinking of a strategy of routinely merging older, cold-data range
> partitions into bigger ones (say 2 months instead of 2 weeks), and using
> the reduced overall tablet count to increase the hash buckets.
>
> It would be really nice if there was a Kudu CLI 'merge_range_partition'
> command (ranges would need to be contiguous).  It would greatly simplify
> optimization of time-series data structures.
>
> So instead I'm planning on copying the range partitions' data to a parquet
> side table, dropping the partitions, creating a single one, and copying the
> data back in.
>
> Any better approach I can use currently?
>
> Using CDH 5.15 Impala 2.13 Kudu 1.7
>
> Thanks in advance,
>
> -m
>
> --
> Mauricio Aristizabal
> Architect - Data Pipeline
> mauricio@impact.com | 323 309 4260
> https://impact.com
> <https://www.linkedin.com/company/impact-martech/>
> <https://www.facebook.com/ImpactMarTech/>
> <https://twitter.com/impactmartech>
> <https://www.youtube.com/c/impactmartech>
>
