hbase-user mailing list archives

From Mingjian Deng <koven2...@gmail.com>
Subject Re: hadoop / hbase /zookeeper architecture for best performance
Date Tue, 21 Jun 2011 09:18:51 GMT
Hi Andre,
    I think it is better to run the datanode and the regionserver on the same
node, because that helps data locality.
    ZooKeeper is lightweight, so it can share machines with other processes
in your cluster. In my opinion, though, it should run on a separate cluster
if more than one HBase cluster uses the same ZooKeeper ensemble.

    In our setup, we have 43 nodes across 3 clusters. Two of the clusters
have 10 nodes each, every node running a regionserver and a datanode, and
the third has 20 such nodes. All 3 clusters share one 5-node ZooKeeper
ensemble running in virtual machines.

    Each of the 43 machines has 12 disks of 1 TB each.
    Memory: Datanode: 2 GB, HMaster: 8 GB, Regionserver: 16 GB, ZooKeeper: 1 GB.
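
    To make the memory figures concrete, here is a rough Python sketch that
adds up the heap of the daemons sharing one machine. The helper name
heap_budget_gb is made up for illustration, and the sketch counts only the
heap sizes listed above; it ignores OS and off-heap overhead.

```python
# Heap sizes in GB, copied from the memory figures above.
HEAP_GB = {"datanode": 2, "hmaster": 8, "regionserver": 16, "zookeeper": 1}

def heap_budget_gb(daemons):
    """Sum the configured heap of every daemon co-located on one machine."""
    return sum(HEAP_GB[name] for name in daemons)

# A worker node in our layout runs a datanode and a regionserver together.
worker_heap_gb = heap_budget_gb(["datanode", "regionserver"])

# 2 GB is the physical memory per server in the quoted setup below.
quoted_server_ram_gb = 2
print(worker_heap_gb, worker_heap_gb <= quoted_server_ram_gb)
```

    A worker node here needs 18 GB of heap alone, so this layout would not
fit on a machine with 2 GB of memory.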

    The performance is pretty good.
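
    For comparison, the RowCounter run quoted below can be turned into a
scan rate. This is only a back-of-the-envelope sketch: the function name is
hypothetical, and dividing by 3 assumes the rows are spread evenly over the
three regionservers, which the quoted message does not state.

```python
def rows_per_second(rows, minutes):
    """Average scan rate for a job that read `rows` rows in `minutes` minutes."""
    return rows / (minutes * 60)

# Figures from the quoted message: 3249727 rows in about 13 minutes.
total = rows_per_second(3249727, 13)

# Assumption: even spread over the 3 regionservers in the quoted setup.
per_regionserver = total / 3

print(round(total), round(per_regionserver))
```

    That is roughly 4,200 rows per second for the whole job.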

2011/6/21 Andre Reiter <a.reiter@web.de>

> hi folks,
>
> at the moment our architecture looks like this:
>
> the cluster has 4 servers:
>  - server1: namenode, secondary namenode, jobtracker, hbase master
>  - server2: datanode, tasktracker, hbase regionserver, zookeeper1
>  - server3: datanode, tasktracker, hbase regionserver, zookeeper2
>  - server4: datanode, tasktracker, hbase regionserver, zookeeper3
>
> versions:
>  - Linux version 2.6.26-2-amd64 (Debian 2.6.26-25lenny1)
>  - hadoop-0.20.2-CDH3B4
>  - hbase-0.90.1-CDH3B4
>  - zookeeper-3.3.2-CDH3B4
>
> hardware:
>  - CPU: 2x AMD Opteron(tm) Processor 250 (2.4GHz)
>  - disk: 500 GB, software raid raid1 (2x WDC WD5000AAKB-00H8A0, ATA DISK
> drive)
>  - memory: 2 GB
>  - network: 1 Gbps Ethernet
>
> at the moment the servers are shared for hdfs / mapreduce / hbase /
> zookeeper  as you can see
> i can not imagine, that this is the best practice...
>
> it is not a problem to add further servers, but what is a proper
> architecture, to achieve the best performance?
> is that ok, that hdfs (data nodes) and tasktracker (mapreduce) are running
> on the same servers?
>
> should the zookeeper ensemble be running on extra dedicated servers, and
> what about the region servers?
>
> what about memory?
> how much memory should each server have (hdfs, mapred, hbase, zoo)?
>
> the MR jobs on our HBase table are running far too slow... RowCounter is
> running about 13 minutes for 3249727 rows, that's just unacceptable
>
> regards
> andre
>
>
