kudu-user mailing list archives

From Dan Burkert <danburk...@apache.org>
Subject Re: stripes, JBOD: Assignment and rebalance ?
Date Thu, 02 Mar 2017 17:26:48 GMT
Hi Alexandre,

responses inline

On Thu, Mar 2, 2017 at 9:18 AM, Alexandre Fouché <afouche@onfocus.io> wrote:
> When storing data on multiple JBOD disks, will Kudu assign data for
> tablets efficiently as far as tablet sizes or activity are concerned, or
> will it simply try to assign roughly the same number of tablets on each
> disk, regardless of tablets' true size or activity (we have many empty
> tablets at this time). And will it rebalance tablets to one disk or another
> automatically ? Or is it still better to expose one RAID0 volume ?

Kudu will evenly spread data from all tablets across all disks.  This
allows Kudu to get good write throughput and balancing, but similarly to
RAID 0 it means that if one drive fails, all tablets on that tablet server
will become unavailable. Kudu will automatically recover by re-replicating
the tablets to a different tablet server as long as a majority of replicas
are still available.  So, RAID 0 should provide no benefit for Kudu.  It's
on the roadmap to make multi-disk configuration more flexible so that if a
single disk dies only a subset of the tablets will become unavailable, but
I don't have a timeline on that feature (no one is working on it to my
knowledge).

> Will using JBOD disks be better than RAID stripes ? It seems from Bug reports
> that when WAL disk fails, or one of the JBOD data disks, Kudu is still
> unable to recover and keep or migrate good tablets. In that case, it shows
> no improvement over a failed disk on a RAID0 where in both cases the only
> recover option is to delete the whole Kudu data and WAL and let it resync
> from other nodes ?

I think what I wrote previously answers this, if not I can clarify.

> I found comments that WAL can only be on one disk, is it still the case,
> or is this info obsolete ?

This is currently the case. If you have many disks it's often advantageous
to put the WAL on its own disk (ideally an SSD if it's available). The WAL
workload is more latency sensitive than the data workload.
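
As a rough sketch of that layout, a tablet server flag file could look
something like the one below. --fs_wal_dir and --fs_data_dirs are the Kudu
flags for the WAL and data directories; the mount points are made up for
illustration, so substitute your own.

```
# WAL on its own device, ideally an SSD (hypothetical path)
--fs_wal_dir=/mnt/ssd0/kudu/wal
# One data directory per JBOD disk, comma separated (hypothetical paths)
--fs_data_dirs=/mnt/disk1/kudu/data,/mnt/disk2/kudu/data,/mnt/disk3/kudu/data
```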

- Dan
