hbase-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: why minimum 5 servers?
Date Wed, 14 Jul 2010 14:07:29 GMT
On Wed, Jul 14, 2010 at 6:24 AM, S Ahmed <sahmed1020@gmail.com> wrote:
> Is there a reason why 5 is the recommend number of servers (minimum) in a
> cluster?
>
> Why not 2 or 3?
>
> Just asking because 5 large ec2 instances (7.5gb ram) isn't *that* cheap :)
>
> Thanks!
>

Most of the answer is tied to the architecture of Hadoop. Most people
set dfs.replication to 3, which calls for at least 3 DataNodes. You are
not going to want your NameNode to run on the same physical hardware as
a DataNode, so that is already 4. You may want a dedicated machine for
ZooKeeper and the HBase master, so that is 5.

However, performance-wise, if you set dfs.replication = 3, at 3 nodes
you do not have that 'critical mass' of servers for the scale-out
effect. At replication 3 with 3 nodes, every action (put, get) has some
effect on all servers. With replication 3 and 10 nodes, a single put or
get affects only roughly 30% of your cluster. With 100 nodes it is
3/100 = 3%, and so on.
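
To put numbers on that, here is a rough sketch of the arithmetic. The
function name fraction_touched is made up for illustration, and the
model is a simplifying assumption: one put or get is taken to involve
every node holding a replica of the affected block and no other node.

```python
def fraction_touched(replication: int, nodes: int) -> float:
    """Rough share of the cluster involved in one put or get.

    Assumption (not from HDFS itself): an operation touches each node
    that holds a replica of the block, and nothing else.
    """
    if replication < 1 or nodes < 1:
        raise ValueError("replication and nodes must be positive")
    # A block cannot have more replicas than there are nodes.
    return min(replication, nodes) / nodes


if __name__ == "__main__":
    for n in (3, 10, 100):
        print(f"replication 3, {n} nodes: {fraction_touched(3, n):.0%}")
```

Under that assumption the share falls from 100% at 3 nodes to 30% at
10 nodes and 3% at 100 nodes, which is the scale-out effect in
question.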
