hbase-user mailing list archives

From: Pankaj Misra <pankaj.mi...@impetus.co.in>
Subject: Thrift Gateway Server, ZooKeeper & HBase
Date: Mon, 01 Oct 2012 09:15:57 GMT
Dear All,

I would like your help in clearing up some doubts I have about the deployment view of these
components. I have run some tests in my pseudo-distributed environment and got very good
throughput using a Thrift client and the Thrift gateway server. I need your help to get a clear
view of the deployment components, so that I can extend my environment with a clear thought
process.

Based on my recent experience with gateway-based connectivity using Thrift to access HBase
regions, it seems to me that the Thrift server has to run on the HBase node itself. I am trying
to picture the deployment view in terms of the Thrift gateway server running on an HBase node,
the ZooKeeper quorum, and the HBase nodes themselves.

I am using a pseudo-distributed configuration of HBase 0.94.1 with a natively compiled Hadoop
0.23.1, and I have installed the Thrift library as per the installation instructions. I also
see that running gateway servers on HBase is a big plus in a highly multi-threaded environment,
as they take advantage of thread pooling. Since I am running my setup in pseudo-distributed
mode, I have 1 HBase node, 1 ZooKeeper quorum, 1 region server, 1 NameNode (NN), 1 DataNode (DN)
and 1 Secondary NameNode (SNN).

To illustrate my thinking, these are the steps I perform to run HBase with the Thrift gateway
server:
$HBASE_HOME/bin/start-hbase.sh                  --> starts the HBase node, ZooKeeper quorum and region server
$HBASE_HOME/bin/hbase thrift start -threadpool  --> starts the Thrift gateway server on the HBase node

This makes me think that the Thrift server is tightly coupled with each HBase node instance. If
I just need to scale the Thrift server for load balancing, I cannot do it independently of HBase
scaling; I would have to add another HBase node to the cluster to get another Thrift server.
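
If I did have several Thrift gateway servers, I imagine a client could spread its load over them
on its own. A minimal sketch of that idea in Python, assuming hypothetical host names and the
default Thrift server port 9090; it only picks the endpoint and does not open any Thrift
connection:

```python
import threading
from itertools import cycle


class GatewayPicker:
    """Round-robin choice among Thrift gateway endpoints.

    Only selects a (host, port) pair; opening the Thrift transport and
    client is left to the caller's generated Thrift bindings.
    """

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("need at least one gateway endpoint")
        self._ring = cycle(list(endpoints))
        # Lock so that many client threads can share one picker safely.
        self._lock = threading.Lock()

    def next_endpoint(self):
        with self._lock:
            return next(self._ring)


# Hypothetical host names; 9090 is the Thrift server's default port.
picker = GatewayPicker([("hbase-node-1", 9090), ("hbase-node-2", 9090)])
for _ in range(3):
    print(picker.next_endpoint())
```

Round-robin is only the simplest policy; it does not account for a gateway being down or busier
than the others.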

Also, with the above scenario in mind, it seems to me that the Thrift server running on HBase
requests a connection from ZooKeeper, and ZooKeeper allocates and manages the connection
lifecycle via native Java objects (HTable and HTablePool) for the respective region servers,
based on the row keys. My understanding may be incorrect, but if the Thrift server has to run on
an HBase node, which would also be running region servers, why do the calls have to go through
ZooKeeper? Or is it that once the client makes a successful connection to a Thrift server (on an
HBase node), which may initially be mediated by ZooKeeper for allocation, the client interacts
directly with the Thrift server?
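
To make that second alternative concrete, here is a toy Python model of how I imagine the lookup
might work, assuming (from my reading of the HBase 0.94 client documentation) that ZooKeeper only
stores where the catalog (-ROOT-) region lives and that region locations are cached afterwards.
All class and server names are hypothetical, and -ROOT- and .META. are collapsed into a single
catalog lookup. The client role would be played by the Thrift server, which uses HTable
internally as mentioned above:

```python
class ToyCluster:
    """Hypothetical stand-in for ZooKeeper plus the catalog table."""

    def __init__(self, regions):
        # regions: (start_key, end_key or None, server) tuples for one table
        self.regions = regions
        self.zk_reads = 0
        self.catalog_reads = 0

    def zk_catalog_location(self):
        # ZooKeeper only answers "where is the catalog region?"
        self.zk_reads += 1
        return "catalog-server"

    def catalog_lookup(self, catalog_server, row):
        # The catalog answers "which region and server hold this row?"
        self.catalog_reads += 1
        for region in self.regions:
            if contains(region, row):
                return region
        raise KeyError(row)


def contains(region, row):
    start, end, _ = region
    return start <= row and (end is None or row < end)


class ToyClient:
    """Plays the role the Thrift server's HTable would play."""

    def __init__(self, cluster):
        self.cluster = cluster
        self.catalog_server = None
        self.cached_regions = []

    def server_for(self, row):
        # 1. Reuse a cached region location if one covers this row.
        for region in self.cached_regions:
            if contains(region, row):
                return region[2]
        # 2. Ask ZooKeeper for the catalog location only once.
        if self.catalog_server is None:
            self.catalog_server = self.cluster.zk_catalog_location()
        # 3. Look the row up in the catalog and cache the answer.
        region = self.cluster.catalog_lookup(self.catalog_server, row)
        self.cached_regions.append(region)
        return region[2]


cluster = ToyCluster([("", "m", "rs1"), ("m", None, "rs2")])
client = ToyClient(cluster)
for row in ["apple", "banana", "zebra"]:
    print(row, "->", client.server_for(row))
print("ZooKeeper reads:", cluster.zk_reads, "catalog reads:", cluster.catalog_reads)
```

If this model is right, ZooKeeper is consulted only on the first lookup, and later requests for
already cached regions go straight to the region server.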

I would greatly appreciate your input to help me build a correct understanding of the complete
deployment view, as my current perception of it may be incorrect.

Thanks and Regards
Pankaj Misra


