hbase-user mailing list archives

From Jeremy Carroll <phobos...@gmail.com>
Subject Re: Hbase cluster for serving real time site traffic
Date Thu, 01 Nov 2012 16:31:04 GMT
In production you would want 3, 5, or 7, etc. ZK servers (an odd number) for
quorum reasons. They should run on dedicated machines, but those do not have
to be very big. Updates to ZK are written to disk before they are applied in
memory, for recoverability, so having faster disks helps once you start
getting more ZK traffic. Once you go to production, 3 nodes should be fine
(http://zookeeper.apache.org/doc/r3.2.2/zookeeperOver.html#Implementation).
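The odd-number advice follows from ZooKeeper's majority quorum: an ensemble of n servers keeps serving only while a strict majority, floor(n/2) + 1, is up. A minimal Python sketch of that arithmetic (the function names are illustrative, not part of any ZooKeeper API):

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of an n-server ensemble."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can fail while a majority is still up."""
    return n - quorum_size(n)


for n in range(1, 8):
    print(f"{n} servers: quorum {quorum_size(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

Under this arithmetic, going from 3 to 4 servers tolerates no extra failure (both survive one), while each write must be acknowledged by 3 servers instead of 2, which is why odd sizes are the usual choice.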

On Thu, Nov 1, 2012 at 1:01 AM, Varun Sharma <varun@pinterest.com> wrote:

> Thanks all for the helpful comments. I read up on HA and was wondering if
> there are good tools for setting up an HA HDFS + HBase cluster on EC2
> quickly. From my reading, it appears that tools like Whirr still have
> issues with bringing up the secondary NN on a different machine, etc. Also,
> for availability, would Master-Slave replication or Master-Master
> replication be a substitute for having the secondary NN?
>
> For ZooKeeper, should the servers be running ZK only, or is it fine to share
> them with other services like the master? Also, is it better to have a
> dedicated ZooKeeper cluster per HBase cluster?
>
> Thanks
> Varun
>
> On Tue, Oct 30, 2012 at 1:20 PM, Marcos Ortiz <mlortiz@uci.cu> wrote:
>
> > Regards, Varun. Answers inline:
> >
> > On 10/30/2012 01:03 PM, Varun Sharma wrote:
> >
> > Thanks for the tips.
> >
> > So, yes, secondary NameNode is probably more critical than the secondary
> > master - since the master is only responsible for metadata changes/region
> > splits/table creation etc and not for writes/reads.
> >
> > Exactly. You have to create a good HA strategy for these nodes (Master
> > and Secondary Master).
> >
> >
> > Regarding the keys question - I meant that the (row + column) length is
> > 24-32 bytes and the value length is 0-1 bytes. Currently, we have a cluster
> > running with all the data loaded into hbase, but it all runs with default
> > settings.
> >
> > There are many areas that you can optimize in an HBase cluster:
> > - Write operations
> > - Compactions and split optimization
> > - Region server sizing
> > - Snappy compression
> > - Schema design
> > - Block caching for scan optimization
> > - Use of asynchronous clients for HBase operations (asynchbase, for
> > example [1])
> > etc.
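Varun's numbers (24-32 bytes of row + column per cell, 0-1 bytes of value) make the "Schema design" and "Snappy compression" items especially relevant, because nearly every stored byte is key rather than value. A rough Python sketch of per-cell size, assuming the KeyValue serialization used by HBase in the 0.92/0.94 era; the helper names and the 24-byte row / 8-byte qualifier split are hypothetical:

```python
# Assumed per-cell layout of an HBase KeyValue (0.92/0.94 era):
# key length (4) | value length (4) | row length (2) | row |
# family length (1) | family | qualifier | timestamp (8) | key type (1) | value
KEYVALUE_FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1  # 20 bytes per cell


def cell_size(row_len: int, family_len: int, qualifier_len: int, value_len: int) -> int:
    """Approximate uncompressed size of one cell, ignoring HFile block and index overhead."""
    return KEYVALUE_FIXED_OVERHEAD + row_len + family_len + qualifier_len + value_len


def value_fraction(row_len: int, family_len: int, qualifier_len: int, value_len: int) -> float:
    """Share of a cell's bytes that is value payload."""
    return value_len / cell_size(row_len, family_len, qualifier_len, value_len)


# Hypothetical 32-byte row + column: 24-byte row, 8-byte qualifier, 1-byte value.
print(cell_size(24, 1, 8, 1))                # 54  (1-char family name)
print(cell_size(24, 10, 8, 1))               # 63  (10-char family name)
print(f"{value_fraction(24, 1, 8, 1):.1%}")  # 1.9%
```

Under these assumptions, under 2% of each cell is value, so short family and qualifier names plus block compression such as Snappy would likely matter more than anything on the value side.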
> >
> > Lars George's excellent book, "HBase: The Definitive Guide", has a complete
> > chapter on this tricky topic (Chapter 11).
> >
> > Some additional resources:
> >
> > [1] https://github.com/stumbleupon/asynchbase
> > https://github.com/twitter/finagle
> > http://gbif.blogspot.com/2012/02/performance-evaluation-of-hbase.html
> > http://gbif.blogspot.com/2012/02/monitoring-hadoop-and-hbase.html
> > http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
> >
> > Look at all the SlideShare presentations tagged from the last HBaseCon, for
> > example Benoit's talk "Lessons learned from OpenTSDB" and Lars Hofhansl's
> > "HBase Schema Design":
> > http://www.slideshare.net/cloudera/tag/hbasecon-2012
> >
> > Best wishes
> >
> > Thanks
> > Varun
> >
> > On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> >
> >
> >  My 2¢.
> >
> > 1) You need an odd number of ZooKeeper nodes. So 3 is the minimum
> > recommended for production.
> > 2) Yes, you have Master and SecondaryMaster. And it's also recommended
> > to have one of each. And the master is critical. If you lose
> > it, you lose your cluster.
> > 3) NameNode is Hadoop, not HBase. You should follow Hadoop
> > recommendations. Like you have SecondaryMaster, you have
> > SecondaryNameNode. So I think you should have as many
> > SecondaryNameNodes as you have SecondaryMasters (on the same machine?).
> > 4) I'm not sure I understand this question. Keys are binary: arrays
> > of bytes. So 32 bits of 0-1 is a 4-byte array. It's not a lot.
> > This will only give you 2^32 different rows. You will have to
> > pre-split them, or you will end up with almost all of them on the same
> > regionserver.
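The pre-splitting point can be sketched as computing evenly spaced split keys over a fixed-width, big-endian key prefix. This assumes row keys begin with a roughly uniformly distributed prefix; the hypothetical `salted_row_key` helper gets one by prepending MD5 bytes, and the 10-region figure simply mirrors Varun's 10 region servers. Both function names are illustrative only:

```python
import hashlib


def uniform_split_keys(num_regions: int, prefix_bytes: int = 1) -> list[bytes]:
    """Return num_regions - 1 split keys dividing the space of fixed-width,
    big-endian prefixes into equal ranges. Equal ranges only mean equally
    loaded regions if row-key prefixes are uniformly distributed."""
    if num_regions < 1:
        raise ValueError("num_regions must be >= 1")
    space = 256 ** prefix_bytes  # number of distinct prefix values
    if num_regions > space:
        raise ValueError("more regions than distinct prefixes")
    return [
        (i * space // num_regions).to_bytes(prefix_bytes, "big")
        for i in range(1, num_regions)
    ]


def salted_row_key(natural_key: bytes, prefix_bytes: int = 1) -> bytes:
    """Hypothetical key scheme: prepend the first prefix_bytes of an MD5
    digest so rows spread roughly uniformly over the prefix space."""
    return hashlib.md5(natural_key).digest()[:prefix_bytes] + natural_key


splits = uniform_split_keys(10)
print([s.hex() for s in splits])
# ['19', '33', '4c', '66', '80', '99', 'b3', 'cc', 'e6']
```

Because HBase orders row keys as unsigned bytes, fixed-width big-endian prefixes sort in the same order as the integers they encode. In the Java client, keys like these could be passed as the `splitKeys` argument of `HBaseAdmin.createTable(HTableDescriptor, byte[][])`. The trade-off of salting is that a range scan over natural keys then has to fan out across every prefix.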
> >
> > JM
> >
> > 2012/10/30, Varun Sharma <varun@pinterest.com>:
> >
> >  Hi,
> >
> > We are planning to experiment with a cluster for serving production traffic
> > using hbase for Pinterest. We are starting off with a 10 region server + 1
> > master cluster on Amazon EMR, running HBase 0.92. I had some very naive
> > questions (primarily around points of failure):
> >
> > 1) It seems hbase starts only one ZooKeeper on the master node, which is
> > critical for operation. How many ZooKeepers should I use, and can I run
> > those on the region servers?
> > 2) How many masters should we use? Does hbase support multiple masters
> > (primary and secondary) within the same cluster? From my understanding,
> > master availability is not critical for operation.
> > 3) NameNode - We are running hadoop 0.8. I have read that NameNode is a
> > single point of failure and we should really be running two NameNodes so
> > we can fail over. Is it fine to run these on the region servers?
> > 4) Our current application involves long row/column keys (24-32 bytes) with
> > 0-1 bytes of values. Should we be using a different key encoding than the
> > default encoding? What advantages could it buy us?
> >
> > We are currently using Amazon EMR for testing purposes, which runs hbase
> > 0.92. If it works well, we would like to configure our own cluster with
> > probably the latest version of hbase, which appears to be 0.94 at the
> > moment.
> >
> > Thanks
> > Varun
> >
> >
> >
> >
> > --
> >
> > Marcos Luis Ortíz Valmaseda
> > about.me/marcosortiz
> > @marcosluis2186 <http://twitter.com/marcosluis2186>
> >
> >
>
