hbase-user mailing list archives

From Friso van Vollenhoven <fvanvollenho...@xebia.com>
Subject Re: operations, how 'hard' is it to keep the servers humming
Date Wed, 21 Jul 2010 21:53:38 GMT
Before I joined my current project and team, my colleagues did some trials with HBase on EC2.
Apart from the financial side of things, I have been told that the internode networking is
not what you'd hope for, both in terms of latency and bandwidth. Depending on your cluster
size, high latency can become a real problem (because ZooKeeper decisions will take more
time). For Hadoop this should be less of a problem, because you'd typically use S3 storage
and leave the lower-level details of persistence to the provider.

Amazon nowadays offers a special instance type that actually has 10 Gbit internode
networking, but this comes at a price, of course...

Like I said, I did not experience these things myself, but I trust my colleagues on this. When
doing trials, I think you should do some benchmarking on the networking before relying on
it...
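
Just to illustrate what I mean by benchmarking (this is not something from our trials, only a
minimal sketch): a small Python script that measures connect time and raw TCP throughput
between two instances. The port number and transfer size are made up, and a proper tool like
iperf will give you better numbers; this just shows the idea.

    # bench_net.py -- minimal internode throughput/latency sketch (assumptions:
    # an open port between the two instances; port and sizes are arbitrary).
    import socket
    import sys
    import time

    PORT = 5001                       # arbitrary unprivileged port (assumption)
    CHUNK = b"x" * 65536              # 64 KiB send buffer
    TOTAL_BYTES = 256 * 1024 * 1024   # push 256 MiB for the throughput figure

    def server():
        srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", PORT))
        srv.listen(1)
        conn, addr = srv.accept()
        received = 0
        while True:
            data = conn.recv(65536)
            if not data:
                break
            received += len(data)
        print("received %d MiB from %s" % (received // (1024 * 1024), addr[0]))
        conn.close()
        srv.close()

    def client(host):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        t0 = time.time()
        sock.connect((host, PORT))
        connect_ms = (time.time() - t0) * 1000  # rough latency hint, not a real RTT test
        sent = 0
        t1 = time.time()
        while sent < TOTAL_BYTES:
            sock.sendall(CHUNK)
            sent += len(CHUNK)
        sock.close()
        elapsed = time.time() - t1
        print("connect: %.1f ms" % connect_ms)
        print("sent %d MiB in %.1f s (~%.1f MB/s)"
              % (sent // (1024 * 1024), elapsed, sent / elapsed / 1e6))

    if __name__ == "__main__":
        if len(sys.argv) > 1 and sys.argv[1] == "server":
            server()
        elif len(sys.argv) > 2 and sys.argv[1] == "client":
            client(sys.argv[2])
        else:
            print("usage: bench_net.py server | bench_net.py client <server-host>")

Run the server side on one instance and the client side on another, in both directions, and
compare the numbers with what you get between two machines on a plain gigabit LAN.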

Right now we run a six node cluster for development (primary and secondary master node and
four worker nodes). We have experienced network outages, processes crashing or shutting down
(due to the network outages), and other unanticipated software errors. However, in all cases
the cluster remained operational enough to respond to requests, so the problems have never
been blocking (we set up notifications, error reporting, etc. like we would in a production
environment to get familiar with that side of running a cluster during development; I advise
anyone starting out with this kind of setup to do the same). Generally, all the bad things
could safely be attended to the next day or on Monday when they happened at night or during
the weekend.

So in conclusion I would say managing a small cluster (say up to 15 nodes or so) is not that
much of a hassle. Making it too small obviously brings the disadvantage of less redundancy:
losing one worker node of a four node cluster means losing 25% of your capacity, while losing
a node in a larger cluster has less impact, even though the chance of failure per node is just
the same. Also, you will have to put some effort into making sure that all of the tasks you
need to do frequently are automated, or at least can be done with a single command. If you
find yourself SSH'ing into more than one machine, you're doing something wrong.
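
To give an idea of what I mean by a single command, here is a rough sketch (not our actual
tooling). It assumes passwordless SSH from an admin box and a hosts file with one hostname
per line; the file name "cluster_hosts.txt" and the example command are made up.

    #!/usr/bin/env python
    # run_on_all.py -- run one command on every node in the cluster over SSH.
    import subprocess
    import sys

    def run_everywhere(command, hosts_file="cluster_hosts.txt"):
        hosts = [line.strip() for line in open(hosts_file) if line.strip()]
        # start all SSH sessions in parallel, then collect the results
        procs = [(host, subprocess.Popen(["ssh", host, command],
                                         stdout=subprocess.PIPE,
                                         stderr=subprocess.STDOUT))
                 for host in hosts]
        failed = []
        for host, proc in procs:
            out, _ = proc.communicate()
            print("=== %s (exit %d) ===" % (host, proc.returncode))
            print(out.decode("utf-8", "replace"))
            if proc.returncode != 0:
                failed.append(host)
        return failed

    if __name__ == "__main__":
        if len(sys.argv) < 2:
            print("usage: run_on_all.py '<command>'")
            sys.exit(2)
        sys.exit(1 if run_everywhere(" ".join(sys.argv[1:])) else 0)

Something like ./run_on_all.py "df -h" then checks disk usage everywhere in one go. Existing
tools (parallel-ssh, your configuration management of choice) do the same thing more robustly;
the point is that routine checks and restarts should never mean logging into machines one by
one.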

We run on server grade hardware, not junk: RAIDed OS disks (not the data disks), dual power
supplies, etc. The master nodes we will use for production will come with battery-backed RAID
controllers (the kind that disable their write cache if they don't trust the battery level
anymore), and there is a highly available network storage option for also storing the namenode
data. I could not think of an argument for running on junk in a business setting, unless you
have absolutely nothing more than a 'reasonable effort' requirement. Getting decent hardware
will always be cheaper once you take the cost of labour into consideration. In the end, your
availability requirements will also determine the amount of work that goes into this.


Friso





On 21 Jul 2010, at 22:12, Hegner, Travis wrote:

> The biggest issue you'll likely have is hardware, so if you are running EC2, that is
> out the window. I run my datanodes on 'old' desktop grade hardware... single power supply,
> 2GB RAM, single HT P4 procs, and single 250GB disks. I know, it's bad, but for my current
> purposes it works pretty well. Once the cluster is up and running, and I'm not changing
> configs and constantly restarting, it will run for weeks without intervention.
> 
> If you run on server grade hardware, built to tighter specs, the chances of failure (and
> therefore of intervention for repair or replacement) are lower.
> 
> If you run on EC2, then someone else is dealing with the hardware, and you can just use
> the cluster...
> 
> Travis Hegner
> http://www.travishegner.com/
> 
> -----Original Message-----
> From: S Ahmed [mailto:sahmed1020@gmail.com]
> Sent: Wednesday, July 21, 2010 3:36 PM
> To: user@hbase.apache.org
> Subject: Re: operations, how 'hard' is it to keep the servers humming
> 
> Can you define what you mean by 'complete junk'?
> 
> I plan on using ec2.
> 
> On Wed, Jul 21, 2010 at 3:23 PM, Hegner, Travis <THegner@trilliumit.com>wrote:
> 
>> That question is completely dependent on the size of the cluster you are
>> looking at setting up, which is in turn dependent on how much data you want to
>> store and/or process.
>> 
>> A one man show should be able to handle 10-20 machines without too much
>> trouble, unless you run complete junk. I run a 6 node cluster on complete
>> junk, and I have rarely had to tinker with it since setting it up.
>> 
>> Travis Hegner
>> http://www.travishegner.com/
>> 
>> -----Original Message-----
>> From: S Ahmed [mailto:sahmed1020@gmail.com]
>> Sent: Wednesday, July 21, 2010 2:59 PM
>> To: user@hbase.apache.org
>> Subject: operations, how 'hard' is it to keep the servers humming
>> 
>> From an operations standpoint, is setting up an HBase cluster and keeping it
>> running a fairly complex task?
>> 
>> i.e. if I am a 1-man show, would it be a smart choice to build on top of
>> HBase, or is that a crazy idea?
>> 
> 

