hbase-user mailing list archives

From Patrick Angeles <patr...@cloudera.com>
Subject Re: Hbase cluster for serving real time site traffic
Date Thu, 01 Nov 2012 19:11:07 GMT
On Thu, Nov 1, 2012 at 1:09 PM, Leonid Fedotov <lfedotov@hortonworks.com> wrote:

> Varun,
> for HA NameNode you may want to look at the Hortonworks HDP 1.1 release. It
> is supported on vSphere and on a Red Hat HA cluster.
> HDP 1.1 is based on Hadoop 1.0.3 and is fully certified for production
> environments.
> Do not forget that Hadoop 2.0 is still in the alpha testing stage and cannot be
> recommended for production systems.
>

HA Namenode is actually running in a number of HBase production systems.


> As for ZK nodes:
> depending on the amount of ZK traffic, you may not need to put them on
> separate nodes; they could easily coexist with the DNs.
>

This is a very bad idea. You should never co-locate ZK on a worker node, as
it can be starved of CPU or IOPS and time out (thereby causing cascading
failures). This can happen, for example, when someone submits an MR job.


> However, it is better to put the NN and the HBase Master on separate nodes,
> e.g. the NN on one node and the HBase Master and JT on another node.
>

Why? The HMaster exerts very little load on the host. If you have three
masters and want HA, you can have the following config:

Host 1: Primary NN, HMaster1, ZK1
Host 2: Standby NN, HMaster2, ZK2
Host 3: JT, HMaster3, ZK3
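As a rough sanity check of the layout above, the Python sketch below models each host's roles and verifies that losing any single host still leaves a NameNode, an HMaster, and a ZooKeeper majority. It assumes NameNode failover to the standby is configured and working; the role names and the `survives_loss_of` helper are hypothetical, not part of any Hadoop or HBase tool.

```python
# Roles per host for the three-host layout above (illustrative names only).
LAYOUT = {
    "host1": {"NN-primary", "HMaster1", "ZK1"},
    "host2": {"NN-standby", "HMaster2", "ZK2"},
    "host3": {"JT", "HMaster3", "ZK3"},
}


def survives_loss_of(failed_host, layout=LAYOUT):
    """True if a NameNode, an HMaster and a ZooKeeper majority remain without failed_host."""
    # Collect every role still running on the surviving hosts.
    remaining = set()
    for host, roles in layout.items():
        if host != failed_host:
            remaining |= roles
    total_zk = sum(1 for roles in layout.values() for role in roles if role.startswith("ZK"))
    live_zk = sum(1 for role in remaining if role.startswith("ZK"))
    has_nn = any(role.startswith("NN") for role in remaining)  # assumes the standby NN can take over
    has_master = any(role.startswith("HMaster") for role in remaining)
    has_quorum = live_zk > total_zk // 2  # ZooKeeper needs a strict majority
    return has_nn and has_master and has_quorum


for host in LAYOUT:
    print(host, survives_loss_of(host))
```

The JobTracker is not covered by the check: with a single JT on host 3, losing that host stops MapReduce jobs even though HBase keeps serving.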


>
> Thank you!
>
> Sincerely,
> Leonid Fedotov
> Technical Support Engineer
> lfedotov@hortonworks.com
> office: +1 855 846 7866 ext 292
> mobile: +1 650 430 1673
>
> On Nov 1, 2012, at 4:17 AM, Marcos Ortiz Valmaseda wrote:
>
> > Regards, Varun.
> > 1- I think that you should take a look at Cloudera Manager for CDH 4.1 to
> > create an HA HDFS environment. Remember that version 2.0.x is not ready
> > for production yet. The stable version is Hadoop 1.0.4 with HBase 0.94.2.
> >
> > 2- Yes, a recommended practice is to have a separate ZooKeeper ensemble
> > (three, five or seven are good numbers for the ensemble) apart from your
> > NN and HBase Master. For example:
> > - 1 NN/HBase Master, JT
> > - 5 DN, HBase RegionServers, TT
> > - 3 nodes for the ZooKeeper quorum.
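The reason ensemble sizes of three, five or seven are suggested is ZooKeeper's majority rule: the ensemble keeps working only while a strict majority of its servers is up. The short Python sketch below (the helper name is ours, for illustration) computes how many server failures an ensemble of a given size tolerates.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Number of ZooKeeper servers that can fail while a strict majority survives."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    majority = ensemble_size // 2 + 1
    return ensemble_size - majority


for n in range(1, 8):
    # Prints lines such as "3 servers -> tolerates 1 failure(s)"
    print(f"{n} servers -> tolerates {tolerated_failures(n)} failure(s)")
```

Going from three to four servers (or five to six) raises the size of the required majority without increasing the number of failures tolerated, which is why odd sizes are preferred.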
> >
> > Best wishes.
> >
> > ----- Original message -----
> > From: Varun Sharma <varun@pinterest.com>
> > To: Marcos Ortiz <mlortiz@uci.cu>, kevin odell <kevin.odell@cloudera.com>
> > CC: user@hbase.apache.org
> > Sent: Thu, 01 Nov 2012 03:01:55 -0500 (CST)
> > Subject: Re: Hbase cluster for serving real time site traffic
> >
> > Thanks all for the helpful comments. I read up on HA and was wondering if
> > there are good tools for setting up an HA HDFS + HBase cluster on EC2
> > quickly. From my reading, it appears that tools like Whirr still have
> > issues with bringing up the secondary NN on a different machine, etc. Also,
> > for availability, would master-slave replication or master-master
> > replication be a substitute for having the secondary NN?
> >
> > For ZooKeeper, should the servers be running ZK only, or is it fine to
> > share them with other services like the master? Also, is it better to have
> > a dedicated ZooKeeper cluster per HBase cluster?
> >
> > Thanks
> > Varun
> >
> > On Tue, Oct 30, 2012 at 1:20 PM, Marcos Ortiz <mlortiz@uci.cu> wrote:
> >
> >> Regards, Varun. Answers inline.
> >>
> >> On 10/30/2012 01:03 PM, Varun Sharma wrote:
> >>
> >> Thanks for the tips.
> >>
> >> So, yes, the secondary NameNode is probably more critical than the
> >> secondary master, since the master is only responsible for metadata
> >> changes, region splits, table creation, etc., and not for writes/reads.
> >>
> >> Exactly, you have to create a good HA strategy for these nodes (Master
> >> and Secondary Master)
> >>
> >>
> >> Regarding the keys question, I meant that the (row + column) length is
> >> 24-32 bytes and the value length is 0-1 bytes. Currently, we have a
> >> cluster running with all the data loaded into HBase, but it all runs with
> >> default settings.
> >>
> >> There are many areas that you can optimize in an HBase cluster:
> >> - Write operations
> >> - Compactions and split optimization
> >> - RegionServer sizing
> >> - Snappy compression
> >> - Schema design
> >> - Use of block caching for scan optimization
> >> - Use of asynchronous clients for HBase operations (asynchbase, for
> >> example [1])
> >> etc.
> >>
> >> Lars George's excellent book "HBase: The Definitive Guide" has a complete
> >> chapter on this tricky topic (Chapter 11).
> >>
> >> Some additional resources:
> >>
> >> [1] https://github.com/stumbleupon/asynchbase
> >> https://github.com/twitter/finagle
> >> http://gbif.blogspot.com/2012/02/performance-evaluation-of-hbase.html
> >> http://gbif.blogspot.com/2012/02/monitoring-hadoop-and-hbase.html
> >> http://www.cloudera.com/blog/2011/04/hbase-dos-and-donts/
> >>
> >> Look at all the presentations tagged from the last HBaseCon on SlideShare,
> >> for example Benoit's talk "Lessons learned from OpenTSDB" and Lars
> >> Hofhansl's "HBase Schema Design":
> >> http://www.slideshare.net/cloudera/tag/hbasecon-2012
> >>
> >> Best wishes
> >>
> >> Thanks
> >> Varun
> >>
> >> On Tue, Oct 30, 2012 at 10:53 AM, Jean-Marc Spaggiari <jean-marc@spaggiari.org> wrote:
> >>
> >>
> >> My 2¢.
> >>
> >> 1) You need an odd number of ZooKeeper nodes, so 3 is the minimum
> >> recommended for production.
> >> 2) Yes, you have Master and SecondaryMaster, and it's also recommended
> >> to have one of each. The master is critical: if you lose it, you lose
> >> your cluster.
> >> 3) NameNode is Hadoop, not HBase. You should follow the Hadoop
> >> recommendations. Just as you have a secondarymaster, you have a
> >> secondarynamenode. So I think you should have as many
> >> secondarynamenodes as you have secondarymasters (on the same machine?).
> >> 4) I'm not sure I understand this question. Keys are binary: arrays
> >> of bytes. So 32 0-1 bytes is a 3-byte-long array. It's not a lot.
> >> This will only give you 2^32 different rows. You will have to
> >> pre-split them, or you will end up with almost all of them on the same
> >> RegionServer.
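For the 24-32 byte keys described in this thread, the number of possible row keys is not the limiting factor (a 24-byte key alone has 2^192 possible values); what matters is whether rows spread evenly across regions. The sketch below computes evenly spaced one-byte split points, under the strong assumption that the leading key byte is roughly uniform (for example, a hash prefix). The `uniform_split_keys` helper is hypothetical; in the Java client, split keys like these can be passed to `HBaseAdmin.createTable(HTableDescriptor, byte[][])`.

```python
def uniform_split_keys(num_regions):
    """Return num_regions - 1 one-byte split points spread evenly over 0x00-0xFF.

    Hypothetical helper: only useful if the first byte of the row key is
    roughly uniformly distributed (e.g. a hash prefix).
    """
    if not 1 <= num_regions <= 256:
        raise ValueError("num_regions must be between 1 and 256")
    return [bytes([i * 256 // num_regions]) for i in range(1, num_regions)]


# Possible values of a 24-byte key versus a 4-byte (32-bit) key.
print(2 ** (8 * 24) > 2 ** 32)  # True
print(uniform_split_keys(4))    # [b'@', b'\x80', b'\xc0']
```

If the real keys start with a non-uniform prefix (a user id, a timestamp), split points should instead be sampled from actual key data.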
> >>
> >> JM
> >>
> >> 2012/10/30, Varun Sharma <varun@pinterest.com>:
> >>
> >> Hi,
> >>
> >> We are planning to experiment with a cluster for serving production
> >> traffic using HBase for Pinterest. We are starting off with a 10 region
> >> server + 1 master cluster on Amazon EMR version 0.92. I had some very
> >> naive questions (primarily around points of failure):
> >>
> >> 1) It seems HBase starts only one ZooKeeper on the master node, which is
> >> critical for operation. How many ZooKeepers should I use, and can I run
> >> those on the region servers?
> >> 2) How many masters should I use? Does HBase support multiple masters
> >> (primary and secondary) within the same cluster? From my understanding,
> >> master availability is not critical for operation.
> >> 3) NameNode: we are running Hadoop 0.8. I have read that the NameNode is
> >> a single point of failure and that we should really be running two
> >> NameNodes so we can fail over. Is it fine to run these on the region
> >> servers?
> >> 4) Our current application involves long row/column keys (24-32 bytes)
> >> with values of 0-1 bytes. Should we be using a different key encoding
> >> than the default encoding? What advantages could it buy us?
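On the key-encoding question: HBase 0.94 added per-column-family data block encodings (for example PREFIX and FAST_DIFF) aimed at long, similar keys with small values. The Python sketch below is a simplified illustration of the prefix idea only, not HBase's on-disk format; the example keys and the one-byte prefix-length field are assumptions.

```python
def shared_prefix_len(a: bytes, b: bytes) -> int:
    """Length of the common leading byte run of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def prefix_encode(sorted_keys):
    """Store each key as (bytes shared with the previous key, remaining suffix)."""
    encoded, prev = [], b""
    for key in sorted_keys:
        common = shared_prefix_len(prev, key)
        encoded.append((common, key[common:]))
        prev = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the original keys from (shared length, suffix) pairs."""
    keys, prev = [], b""
    for common, suffix in encoded:
        key = prev[:common] + suffix
        keys.append(key)
        prev = key
    return keys


# Hypothetical 24-byte keys: a 16-byte row prefix plus an 8-byte qualifier.
row = b"user:0000000042:"
keys = sorted(row + q for q in (b"pin00001", b"pin00002", b"pin00003"))
enc = prefix_encode(keys)
raw_bytes = sum(len(k) for k in keys)
encoded_bytes = sum(len(s) + 1 for _, s in enc)  # +1 byte assumed per prefix length
print(raw_bytes, encoded_bytes)  # 72 29
```

The saving depends entirely on how much consecutive sorted keys share, so it is largest for many columns under the same row or for rows with common prefixes.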
> >>
> >> We are currently using Amazon EMR for testing purposes, which runs HBase
> >> 0.92. If it works well, we would like to configure our own cluster with
> >> probably the latest version of HBase, which appears to be 0.94 at the
> >> moment.
> >>
> >> Thanks
> >> Varun
> >>
> >>
> >> 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS...
> >> CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION
> >> http://www.uci.cu
> >> http://www.facebook.com/universidad.uci
> >> http://www.flickr.com/photos/universidad_uci
> >>
> >>
> >> --
> >> **
> >>
> >> Marcos Luis Ortíz Valmaseda
> >> about.me/marcosortiz
> >> @marcosluis2186 <http://twitter.com/marcosluis2186>
> >> **
> >>
> >>  <http://www.uci.cu/>
> >>
> >>
> >
> >
>
>
