kudu-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Limitations on total amount of data stored in one kudu table
Date Wed, 21 Mar 2018 17:34:39 GMT
On Tue, Mar 20, 2018 at 2:15 AM, Кравец Владимир Александрович <
kravets@kamatech.ru> wrote:

> Hi, I'm new to Kudu and I'm trying to understand its applicability for our
> purposes. I came across the following article about Kudu limitations:
> https://www.cloudera.com/documentation/enterprise/latest/topics/kudu_limitations.html#concept_cws_n4n_5z.
> Do I understand correctly that this means that the maximum total amount of
> useful compressed stored data in one Kudu table is 8TB? Here are my calcs:
>

I think there are a few mistakes below. Comments inline.


> 1. Amount of stored data per tablet = Recommended maximum amount of stored
> data / Recommended maximum number of tablets per tablet server =
> 8 000 / 2 000 = 4 GB per tablet
>

That assumes that every tablet is equally sized and that you have hit the
limit on number of tablets. Even though you _can_ have 2000 tablets per
server, you might want fewer. In addition, you don't need to have every
tablet be the same size -- some might be 10GB while others might be 1GB or
smaller.
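
As a rough sketch of that arithmetic: the 4 GB figure is only the average you get when a server holds the full recommended 8 TB spread evenly over the full 2000 tablets. The function name below is just for illustration, and the 800-tablet case is a made-up example of running fewer tablets.

```python
# Illustrative only: 8000 GB and 2000 tablets are the recommended limits
# quoted above, not values that Kudu enforces per tablet.
def avg_tablet_size_gb(data_per_server_gb, tablets_per_server):
    """Average tablet size if a server's data were spread evenly."""
    return data_per_server_gb / tablets_per_server

print(avg_tablet_size_gb(8000, 2000))  # the quoted case: 4.0 GB average
print(avg_tablet_size_gb(8000, 800))   # fewer tablets: 10.0 GB average
```

An average says nothing about any single tablet, so individual tablets can sit well above or below it.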


> 2. Maximum number of tablets per table for each tablet server,
> pre-replication = Maximum number of tablets per table for each tablet
> server, post-replication (60) / number of replicas = 60 / 3 = 20 tablets
> per table per tablet server
>

The key phrase that you didn't copy here is "at table-creation time". This
limitation has to do with avoiding some issues we have seen when trying to
create too many tablets at the same time on the cluster. With range
partitioning, you can always add more partitions later. For example, it's
very common to add a new partition for each day. So, a single table can,
after some days, have more than 20 tablets on a given server.
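
To make the growth concrete, here is a minimal sketch. It assumes a hypothetical table that gets one new range partition per day, each split into 20 hash buckets, with 3 replicas placed evenly across 100 tablet servers. The bucket count and the even placement are assumptions for illustration, not a recommendation.

```python
# Hypothetical layout: buckets_per_day and even placement are assumptions.
# The replication factor of 3 and the 100 servers come from the question.
def replicas_per_server(days, buckets_per_day=20, replication=3, servers=100):
    """Tablet replicas of one table on each server, assuming even placement."""
    return days * buckets_per_day * replication / servers

for days in (1, 30, 365):
    print(days, replicas_per_server(days))
```

Under these assumptions the table starts far below 60 replicas per server and passes that number within the first year, without ever creating many tablets at once.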


> 3. Total amount of stored data per table, pre-replication = Amount of
> stored data per tablet * Maximum number of tablets per table for each
> tablet server pre-replication * Maximum number of tablet servers = 4 GB *
> 20 * 100 = 8TB
>

Per above, this isn't really the case. For example, on one cluster at
Cloudera which runs an internal workload, we have one table that is 82TB
and another which is 46TB. I've seen much larger tables in some user
installations as well.
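
For a sense of scale, here is a back-of-the-envelope sketch using only the guideline numbers from the question. It assumes the 8 TB per tablet server figure counts replicated data, which the calculation above does not state, so treat the result as an estimate rather than a documented limit.

```python
# Assumption: the 8 TB per-server guideline is measured after replication.
def cluster_capacity_tb(servers=100, per_server_tb=8, replication=3):
    """Pre-replication data a cluster could hold under the guidelines."""
    return servers * per_server_tb / replication

print(round(cluster_capacity_tb(), 1))  # 266.7
```

Nothing in that figure is tied to a single table, which is consistent with one table holding tens of TB.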


> I would also like to understand how fundamental the limitation "Maximum
> number of tablets per table for each tablet server is 60,
> post-replication" is. Is it possible that this restriction will be removed?
>

See above.

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera
