hbase-user mailing list archives

From Andre Reiter <a.rei...@web.de>
Subject hadoop / hbase /zookeeper architecture for best performance
Date Tue, 21 Jun 2011 08:43:38 GMT
hi folks,

at the moment our architecture looks like this:

the cluster has 4 servers:
  - server1: namenode, secondary namenode, jobtracker, hbase master
  - server2: datanode, tasktracker, hbase regionserver, zookeeper1
  - server3: datanode, tasktracker, hbase regionserver, zookeeper2
  - server4: datanode, tasktracker, hbase regionserver, zookeeper3

  - Linux version 2.6.26-2-amd64 (Debian 2.6.26-25lenny1)
  - hadoop-0.20.2-CDH3B4
  - hbase-0.90.1-CDH3B4
  - zookeeper-3.3.2-CDH3B4

  - CPU: 2x AMD Opteron(tm) Processor 250 (2.4GHz)
  - disk: 500 GB, software RAID 1 (2x WDC WD5000AAKB-00H8A0, ATA disk drives)
  - memory: 2 GB
  - network: 1 Gbps Ethernet

at the moment the servers are shared by hdfs / mapreduce / hbase / zookeeper, as you can see.
I cannot imagine that this is best practice...

adding more servers is not a problem, but what would a proper architecture look like to achieve
the best performance?
is it ok that the hdfs datanodes and the mapreduce tasktrackers run on the same servers?

should the zookeeper ensemble run on extra dedicated servers? and what about the region servers?
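One thing that does not change if zookeeper moves to dedicated hosts is the quorum arithmetic: an ensemble needs a strict majority of its members running, so the current three-node ensemble survives one server failure either way. A dedicated host would mainly keep zookeeper away from contention with the regionserver and datanode on the same box. A small sketch of the majority rule:

```python
# Sketch: how many server failures a ZooKeeper ensemble of n voting members
# can survive while a strict majority (the quorum) is still available.

def quorum_size(n: int) -> int:
    """Smallest strict majority of n servers."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Number of servers that may fail while a majority remains."""
    return n - quorum_size(n)

if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} servers: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Note that a fourth member buys nothing over three (both tolerate one failure), which is why ensembles are usually sized with an odd number of servers.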

what about memory?
how much memory should each server have (hdfs, mapred, hbase, zookeeper)?
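To put the 2 GB per server in perspective, here is a rough heap budget for one worker node (server2 to server4). It assumes the stock Apache defaults as far as known (HADOOP_HEAPSIZE of 1000 MB per Hadoop daemon, HBASE_HEAPSIZE of 1000 MB, mapred.child.java.opts of -Xmx200m, 2 map + 2 reduce slots); the CDH3B4 packaging may differ, and the zookeeper figure is a hypothetical placeholder, not a default. These are heap ceilings, not measured usage:

```python
# Rough heap-ceiling budget for one worker node, under ASSUMED defaults:
#   HADOOP_HEAPSIZE=1000 MB per Hadoop daemon, HBASE_HEAPSIZE=1000 MB,
#   mapred.child.java.opts=-Xmx200m with 2 map + 2 reduce slots.
# The zookeeper value below is a HYPOTHETICAL placeholder.

PHYSICAL_RAM_MB = 2048  # "memory: 2 GB" from the hardware list

heaps_mb = {
    "datanode": 1000,
    "tasktracker": 1000,
    "regionserver": 1000,
    "zookeeper (hypothetical)": 512,
    "task JVMs (4 slots x 200 MB)": 4 * 200,
}

total = sum(heaps_mb.values())

if __name__ == "__main__":
    for name, mb in heaps_mb.items():
        print(f"{name:30s} {mb:5d} MB")
    print(f"{'total heap ceiling':30s} {total:5d} MB vs {PHYSICAL_RAM_MB} MB physical")
    if total > PHYSICAL_RAM_MB:
        # Once heaps grow toward their limits, the OS has to swap.
        print(f"over physical RAM by {total - PHYSICAL_RAM_MB} MB")
```

Even before the OS and its page cache take their share, the ceilings under these assumptions add up to roughly twice the installed memory.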

the MR jobs on our Hbase table are running far too slowly... RowCounter takes about 13
minutes for 3249727 rows, that's just unacceptable
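For a baseline to compare against after any change, the RowCounter figure works out as follows. The per-regionserver number assumes the scan is spread evenly over the three regionservers, which is a simplification:

```python
# Throughput of the RowCounter run quoted above.
# ASSUMPTION: work is split evenly across the 3 regionservers.

ROWS = 3_249_727
RUNTIME_S = 13 * 60   # "about 13 minutes"
REGIONSERVERS = 3

def rows_per_second(rows: int, seconds: float) -> float:
    """Average rows processed per second."""
    return rows / seconds

overall = rows_per_second(ROWS, RUNTIME_S)
per_server = overall / REGIONSERVERS

if __name__ == "__main__":
    print(f"overall: {overall:.0f} rows/s")              # overall: 4166 rows/s
    print(f"per regionserver: {per_server:.0f} rows/s")  # per regionserver: 1389 rows/s
```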

