kudu-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: best practices to remove/retire data
Date Thu, 12 May 2016 15:24:29 GMT
Hi,

Right now this use case is more difficult than it needs to be. In your
previous thread, "Partition and Split rows", we talked about non-covering
range partitions, and they would help your use case a lot. Basically, you
could create partitions that each cover a full day, and every day you
could delete the oldest partition while creating the next day's. Deleting
a partition is really quick and efficient compared to manually deleting
individual rows.
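
A minimal sketch of that daily rotation, in Python. It only computes which
day-long range to drop and which to add; the function names, the UTC
epoch-second bounds and the retention_days parameter are assumptions for
illustration, and the actual add/drop calls against Kudu are left out
because non-covering range partitions are not available yet.

```python
from datetime import date, datetime, timedelta, timezone


def day_bounds(day):
    """Return the [lower, upper) bounds of one day as UTC epoch seconds."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


def plan_rotation(today, retention_days):
    """Return (range_to_drop, range_to_add) for one daily maintenance run.

    Assumes one range partition per day and that `retention_days` full
    days, counting today, must stay queryable.
    """
    expired = today - timedelta(days=retention_days)
    tomorrow = today + timedelta(days=1)
    return day_bounds(expired), day_bounds(tomorrow)


if __name__ == "__main__":
    drop, add = plan_rotation(date(2016, 5, 12), retention_days=3)
    print("drop partition covering", drop)
    print("add partition covering", add)
```

Creating tomorrow's partition a day ahead means ingestion never hits a
missing range at midnight.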

Until this is available I'd do this with multiple tables, but it's a mess
to handle, as you described.
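
One way to contain that mess is a single helper that owns the table naming
scheme, so readers never build table names themselves. This is only a
sketch: the "events_" prefix, the YYYYMMDD suffix and the function names
are hypothetical, and creating or dropping the tables is not shown.

```python
from datetime import date, timedelta

TABLE_PREFIX = "events_"  # hypothetical naming scheme: events_YYYYMMDD


def table_for(day):
    """Name of the table holding one day's data."""
    return TABLE_PREFIX + day.strftime("%Y%m%d")


def live_tables(today, retention_days):
    """Tables a reader should query, oldest first, including today's."""
    offsets = range(retention_days - 1, -1, -1)
    return [table_for(today - timedelta(days=n)) for n in offsets]


def expired_table(today, retention_days):
    """Table that falls out of retention today and can be dropped."""
    return table_for(today - timedelta(days=retention_days))


if __name__ == "__main__":
    today = date(2016, 5, 12)
    print(live_tables(today, 3))
    print(expired_table(today, 3))
```

If the retention policy changes, only the retention_days value passed to
these helpers changes; client code that calls live_tables() stays the same.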

Hope this helps,

J-D

On Thu, May 12, 2016 at 8:16 AM, Sand Stone <sand.m.stone@gmail.com> wrote:

> Hi. Presumably I need to write a program to delete the unwanted rows, say,
> remove all data older than 3 days, while the table is still ingesting new
> data.
>
> How well will this perform for large tables? Both deletion and ingestion
> wise.
>
> Or for this specific case that I retire data by day, I should create a new
> table per day. However then the users have to be aware of the table naming
> scheme somehow. If a retention policy is changed, all the client side code
> might have to change (sure we can have one level of indirection to minimize
> the pain).
>
> Thanks.
>
