Hi Dan,
So will Kudu dynamically rebalance tablets across disks when some grow larger than others, so that one disk does not fill up while the others remain mostly free?

2017-03-02 18:26 GMT+01:00 Dan Burkert <danburkert@apache.org>:
Hi Alexandre,

Responses inline.

On Thu, Mar 2, 2017 at 9:18 AM, Alexandre Fouché <afouche@onfocus.io> wrote:

When storing data on multiple JBOD disks, will Kudu place tablet data efficiently with respect to tablet size or activity, or will it simply try to assign roughly the same number of tablets to each disk, regardless of each tablet's true size or activity (we have many empty tablets at this time)? And will it rebalance tablets from one disk to another automatically? Or is it still better to expose one RAID 0 volume?

Kudu will evenly spread data from all tablets across all disks.  This allows Kudu to get good write throughput and balancing, but similarly to RAID 0 it means that if one drive fails, all tablets on that tablet server will become unavailable. Kudu will automatically recover by re-replicating the tablets to a different tablet server as long as a majority of replicas are still available.  So, RAID 0 should provide no benefit for Kudu.  It's on the roadmap to make multi-disk configuration more flexible so that if a single disk dies only a subset of the tablets will become unavailable, but I don't have a timeline on that feature (no one is working on it to my knowledge).
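To illustrate why a single disk failure affects every tablet when data is spread across all disks, here is a toy model in Python. It is only a sketch: the round-robin placement, the function names, and the paths are assumptions made for illustration, not Kudu's actual block placement logic.

```python
from collections import defaultdict


def spread_blocks(tablets, disks):
    """Toy placement: assign every block round-robin across all disks.

    tablets maps a tablet name to its number of blocks.
    Returns a mapping of tablet name -> set of disks holding its blocks.
    """
    placement = defaultdict(set)
    i = 0
    for tablet, n_blocks in tablets.items():
        for _ in range(n_blocks):
            placement[tablet].add(disks[i % len(disks)])
            i += 1
    return placement


def affected_tablets(placement, failed_disk):
    """Return the tablets that have at least one block on the failed disk."""
    return sorted(t for t, ds in placement.items() if failed_disk in ds)


# Hypothetical tablets and mount points, for illustration only.
tablets = {"t1": 6, "t2": 6, "t3": 6}
disks = ["/data1", "/data2", "/data3"]
placement = spread_blocks(tablets, disks)

# Every tablet has blocks on every disk, so losing one disk hits all of them.
print(affected_tablets(placement, "/data2"))  # ['t1', 't2', 't3']
```

In this model, as with RAID 0, the failure domain is the whole tablet server rather than a single disk.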
 
Is using JBOD disks better than RAID stripes? It seems from bug reports that when the WAL disk or one of the JBOD data disks fails, Kudu is still unable to recover and keep or migrate the good tablets. In that case, JBOD shows no improvement over a failed disk in a RAID 0, since in both cases the only recovery option is to delete all of the Kudu data and WAL and let the node resync from other nodes?

I think what I wrote above answers this; if not, I can clarify.

I found comments saying that the WAL can only be on one disk. Is that still the case, or is this information obsolete?

This is currently the case. If you have many disks, it's often advantageous to put the WAL on its own disk (ideally an SSD, if one is available). The WAL workload is more latency-sensitive than the data workload.
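As a rough sketch of that layout, the snippet below builds tablet server flags with the WAL on a dedicated SSD and the data directories on the JBOD disks. The flags `--fs_wal_dir` and `--fs_data_dirs` are Kudu's directory flags; the helper name and all paths are hypothetical and would need to match your own mount points.

```python
def tserver_flags(wal_dir, data_dirs):
    """Build directory flags: one WAL directory, several data directories."""
    if not data_dirs:
        raise ValueError("at least one data directory is required")
    return [
        f"--fs_wal_dir={wal_dir}",
        # Data directories are passed as a single comma-separated list.
        f"--fs_data_dirs={','.join(data_dirs)}",
    ]


# Hypothetical mount points: WAL on an SSD, data on three JBOD disks.
flags = tserver_flags(
    "/ssd/kudu/wal",
    ["/data1/kudu", "/data2/kudu", "/data3/kudu"],
)
print("\n".join(flags))
```

The resulting lines can be placed in a flag file or passed on the tablet server command line.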

- Dan