hbase-user mailing list archives

From Vincent Barat <vba...@ubikod.com>
Subject RE: Smallest production HBase cluster
Date Fri, 23 Jul 2010 07:57:32 GMT
We run a similar 3-node cluster on AWS large instances (8 GB of RAM). We do constant small writes
into HBase (for logging) and constantly run M/R jobs at the same time on the same nodes (using
Pig). As long as our region servers have enough RAM (-Xmx2048m in our case) they stay stable. This
cluster has not failed in six months, but HBase is not heavily loaded (only constant writes
and sequential reads).
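[For reference, a heap setting like the -Xmx2048m above would normally be applied through conf/hbase-env.sh in the standard HBase distribution; a minimal sketch:]

```shell
# conf/hbase-env.sh (sketch): give each region server a 2 GB heap,
# matching the -Xmx2048m mentioned above. HBASE_HEAPSIZE is in megabytes
# and is translated into the JVM's -Xmx flag by the startup scripts.
export HBASE_HEAPSIZE=2048
```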
Actually, because Pig's HBase loader is too slow, we first copy all logs into regular HDFS
files before running M/R jobs. This greatly reduces the load on HBase. It would also let
us separate the storage cluster from the M/R cluster, which can be a good idea given
that they don't scale the same way, but is a bad idea if your data are huge.
Finally, even though I like the product a lot, I must say that HBase is THE MOST UNSTABLE PIECE
of our backend. We never had any trouble with HDFS, M/R, or Pig, but we had LOTS of difficulties
managing and tuning HBase the right way: there is definitely some work to do on reducing
memory usage and improving reliability.
We lost all of our data once because of a crash that led to an inconsistent data structure,
but that was with HBase 0.20.2.
My position is that if HBase could be used reliably on small nodes (2 GB of RAM), it would be
the perfect product :-)

Geoff Hendrey <ghendrey@decarta.com> wrote:

>I am running a 3-node cluster. An HDFS datanode and an HBase region
>server run on each node. The HBase master and the HDFS namenode run on
>different machines ("different" meaning not on the same box, not
>meaning outside the cluster). Quad-core, 64-bit JVM, 32 GB RAM, 4 disks
>per machine. We had a lot of trouble keeping the cluster alive when
>paired with an asymmetric (big) MapReduce cluster that was writing into
>HBase. Ultimately, we achieved stability by disabling the WAL from code
>in our MapReduce jobs and setting the HFile block size lower than the
>default (we do a lot of random reads in the map phase). There are other
>tweaks that must be made, such as upping the OS file limit. I made a
>lot of posts in May, so you could look in the archive. At present,
>we're quite happy with the cluster.
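[As an illustration of the two tweaks Geoff describes, here is a sketch against the HBase 0.20-era client API. It is not runnable standalone (it needs the HBase client jar and a live cluster), and the table and family names are hypothetical:]

```java
// Sketch only (assumes hbase-client ~0.20/0.90 on the classpath and a
// running cluster; "logs" and "f" are hypothetical table/family names).
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class WalAndBlockSizeSketch {
    public static void main(String[] args) throws Exception {
        // Tweak 1: skip the write-ahead log for bulk writes from M/R jobs.
        // Much faster, but edits since the last memstore flush are lost if
        // the region server crashes -- acceptable for re-runnable jobs.
        HTable table = new HTable(HBaseConfiguration.create(), "logs");
        Put put = new Put(Bytes.toBytes("row-key"));
        put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        put.setWriteToWAL(false);  // the WAL-disabling tweak
        table.put(put);

        // Tweak 2: lower the HFile block size for random-read-heavy column
        // families (the default is 64 KB; smaller blocks mean less data is
        // read and decompressed per random seek). Set at table create/alter.
        HColumnDescriptor family = new HColumnDescriptor("f");
        family.setBlocksize(16 * 1024);  // 16 KB
    }
}
```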
>-----Original Message-----
>From: Paul Smith [mailto:psmith@aconex.com] 
>Sent: Thursday, July 22, 2010 3:56 PM
>To: user@hbase.apache.org
>Subject: Smallest production HBase cluster
>Anyone able to share their experience and thoughts on the 'smallest'
>production HBase cluster in operation? I'm thinking there may be some
>point on the node-count scale where one transitions from "that's silly"
>to "that's actually more like it".
>Anyone out there with a small HBase cluster in operation with < 10 nodes
>able to share any information?
>I notice on http://wiki.apache.org/hadoop/Hbase/PoweredBy there are some
>who have even just a 3-node cluster; perhaps that's out of date, but I'm
>curious to know from the community where people think 'the line'
>needs to be drawn on usage of HBase.
>To take things to an extreme, is anyone actually running a
>_single_ HBase node...? (One would hope that machine is designed
>to be a bit more HA than normal.) Just to take advantage of a
>column-oriented store?