trafficserver-users mailing list archives

From Leif Hedstrom <zw...@apache.org>
Subject Re: iowait
Date Thu, 14 Jul 2016 21:32:36 GMT

> On Jul 14, 2016, at 1:18 PM, Reindl Harald <h.reindl@thelounge.net> wrote:
> 
> 
> 
> Am 14.07.2016 um 20:54 schrieb Leif Hedstrom:
>> 
>>> On Jul 14, 2016, at 12:11 PM, Randeep <randeep123@gmail.com> wrote:
>>> 
>>> Thanks Harald.
>>> 
>>> I'll check the raid configuration.
>> 
>> We recommend not using RAID for the ATS cache, it’s desirable to let ATS deal with
>> that. I.e. just give it some JBOD’s. There can be some cases where you might want to RAID,
>> but I personally do not.
> 
> when you have a single cache disk which is overloaded with IOPS you have a problem you
> can only solve with more disks
> 
> without RAID in case of failed disks your storage space for cache goes down and if you
> would not need the cache size why would you configure it at all?
> 
> so *what* is the point against redundancy with improved performance?


Not to start a religious war here (but I probably just did), but there are plenty of reasons
why you would not want to use RAID with ATS (or any caching proxy). For example (off the top
of my head):

* RAID 5/6 is generally slower than JBOD, and certainly reduces the available capacity
* RAID 10 is very fast, at the expense of lost capacity (50%)
* RAID 0 would be ill-advised for ATS, because if you lose one drive, you presumably lose
your entire stripe. ATS has no way of dealing with that, so the entire cache is lost.
* There’s likely overhead if you use software RAID, although I have no real numbers for
this. I’d assume HW RAID is fine.


ATS can manage devices on its own, removing failed disks and continuing to run with those
that remain. In general, I find this preferable; I'd rather run with e.g. 6 drives *most*
of the time, and deal with reduced performance/capacity in the error case where I lose one
drive (while ATS refetches). The alternative (with RAID) is to always run with reduced
capacity (in the case of RAID 5/6 and RAID 10), or with always-reduced performance (RAID 5/6).
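
To put rough numbers on that trade-off, here is a minimal sketch of the capacity arithmetic for the 6-drive example. The function names and layout labels are made up for illustration, the drives are assumed to be equal-sized, and it only counts capacity (it says nothing about performance):

```python
def usable_drives(layout: str, n: int) -> int:
    """Drives' worth of cache capacity while all n drives are healthy."""
    if layout in ("jbod", "raid0"):
        return n              # no redundancy overhead
    if layout == "raid5":
        return n - 1          # one drive's worth of parity
    if layout == "raid6":
        return n - 2          # two drives' worth of parity
    if layout == "raid10":
        return n // 2         # every block is mirrored (50% lost)
    raise ValueError(f"unknown layout: {layout}")


def usable_after_one_failure(layout: str, n: int) -> int:
    """Drives' worth of cache capacity right after a single drive fails."""
    if layout == "jbod":
        return n - 1          # only the failed disk's share of the cache is gone
    if layout == "raid0":
        return 0              # the stripe is lost, and the whole cache with it
    return usable_drives(layout, n)  # redundancy absorbs a single failure


if __name__ == "__main__":
    for layout in ("jbod", "raid0", "raid5", "raid6", "raid10"):
        healthy = usable_drives(layout, 6)
        degraded = usable_after_one_failure(layout, 6)
        print(f"{layout:7s} healthy={healthy} after_one_failure={degraded}")
```

With 6 drives, JBOD gives 6 drives of cache normally and 5 after a failure, while RAID 10 gives 3 in both cases.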

That said, I can see several very legitimate use cases and reasons to use RAID with ATS:

* Ease of administration (as you point out, make the problem someone else’s problem).
* You might see better performance with e.g. RAID 10, particularly on reads, for the case
where a large object is fetched very often from disk. This is because an ATS object never
spans more than one physical drive, so RAID here can allow you to distribute that object across
more than one physical disk.
* Your data truly has to be resilient, such that losing a drive is seriously crippling. I
dislike this, because a cache is a cache; if you depend on the content always being there,
it’s no longer a cache, it’s a file server.
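
The hot-object point in the second bullet can be illustrated with a toy model. This is an idealized sketch, not a measurement: the function name is hypothetical, and it assumes a RAID 10 array spreads reads for a large object perfectly evenly over all of its drives, which real arrays only approximate.

```python
def busiest_disk_reads(layout: str, drives: int, hot_reads: int) -> int:
    """Reads landing on the busiest disk for one very hot, large object."""
    if layout == "jbod":
        # An ATS object lives on exactly one physical drive,
        # so every read for it hits that same drive.
        return hot_reads
    if layout == "raid10":
        # Idealized: stripes plus mirrors let all drives share the reads.
        return -(-hot_reads // drives)  # ceiling division
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    for layout in ("jbod", "raid10"):
        print(layout, busiest_disk_reads(layout, drives=6, hot_reads=6000))
```

This only covers the single hot object; across many objects, ATS already spreads load over the JBOD drives on its own.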

Cheers,

— leif

