hbase-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Thrift Gateway Server, ZooKeeper & HBase
Date Mon, 01 Oct 2012 09:30:12 GMT


On Mon, Oct 1, 2012 at 2:45 PM, Pankaj Misra <pankaj.misra@impetus.co.in> wrote:
> Dear All,
> I would like to request your help for clearing some doubts that I have
> around the deployment view of these components. I have been able to do
> some tests on my pseudo-distributed environment and have been able to
> get very good throughput using Thrift client and gateway server. I need
> your help to have a clear view of the deployment components, so that I
> can further elaborate my environment with a clear thought process.
> Based on my recent experiences on gateway based connectivity using
> thrift to access hbase regions, it occurs to me that in order to run a
> thrift server it has to be run on the hbase node itself. I am trying to
> envision the deployment view in context of thrift gateway server
> running on HBase node, ZooKeeper quorum and the HBase node themselves.

A thrift server needs connectivity to all HBase and ZK service/daemon
nodes, but does not need to be co-located with one.
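
As a rough way to confirm that connectivity from a candidate gateway
machine, here is a minimal sketch using only the Python standard library.
The hostnames are hypothetical placeholders, and the ports assume the usual
defaults for ZooKeeper (2181) and HBase 0.94 (60000 for the HMaster, 60020
for a RegionServer); adjust them if your configuration differs.

```python
import socket

# Hypothetical hostnames; ports assume ZooKeeper and HBase 0.94 defaults.
ENDPOINTS = [
    ("zk1.example.com", 2181),       # ZooKeeper client port
    ("master1.example.com", 60000),  # HMaster RPC port
    ("rs1.example.com", 60020),      # RegionServer RPC port
]


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def unreachable(endpoints, timeout=3.0):
    """Return the subset of (host, port) pairs that could not be reached."""
    return [(h, p) for h, p in endpoints if not is_reachable(h, p, timeout)]
```

Calling unreachable(ENDPOINTS) on the machine where you plan to run the
thrift server should return an empty list. A successful TCP connect only
shows the port is open, not that the daemon behind it is healthy.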

> I am using a pseudo-distributed configuration of HBase 0.94.1 with
> Hadoop 0.23.1 natively compiled and have installed the thrift library
> as per the installation instructions. I also see that running gateway
> servers on HBase is a big plus for a highly multi-threaded environment
> as it takes advantage of thread pooling. So since I am running my setup
> in a pseudo-distributed mode, I have 1 node of HBase, 1 Zookeeper
> quorum, 1 region server, 1 NN, 1 DN and 1 SNN.
> So if I have to illustrate my thinking here, the steps that I perform
> to have HBase running with thrift gateway server are
> $HBASE_HOME/bin/start-hbase.sh --> Starts the HBase node, Zookeeper Quorum & Region Server
> $HBASE_HOME/bin/hbase.sh thrift start -threadpool --> Starts the Thrift gateway server on hbase node
> This makes me think that the thrift server is tightly coupled with
> every instance of HBase node. If I just need to scale thrift server
> from a load balancing perspective, I cannot do it independent of HBase
> scaling, I will have to add another HBase node in the cluster to have
> another thrift server for scalability.

Do not conflate a library dependency with a service dependency; they
are two different things.

You may _install_ HBase libs on any machine connected to the cluster,
and start _just_ the thrift server on it. The HBase thrift server does
need the HBase libraries to run, but does not need a local HBase
service (master or region server) running on the same machine.
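
Since thrift servers can be added on machines of their own, one simple way
for clients to spread load across several of them is to rotate through the
gateway addresses. This is only a sketch of that idea, not something HBase
provides: the hostnames are hypothetical, 9090 is assumed as the thrift
server port (the usual default), and a real deployment might put a load
balancer in front instead.

```python
import itertools

# Hypothetical gateway hosts; 9090 is assumed as the thrift server port.
GATEWAYS = [
    ("thrift1.example.com", 9090),
    ("thrift2.example.com", 9090),
]


def round_robin(gateways):
    """Return an iterator that yields gateway endpoints in rotation, forever."""
    return itertools.cycle(list(gateways))


picker = round_robin(GATEWAYS)
first = next(picker)   # first gateway in the list
second = next(picker)  # second gateway in the list
third = next(picker)   # wraps around to the first gateway again
```

Adding a third thrift server then only means adding one more entry to the
list; no new region server is involved.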

> Also with the above scenario in mind, what seems to me is that the
> thrift server which runs on HBase, requests zookeeper for the
> connection and zookeeper allocates and manages the connection lifecycle
> via native Java objects (HTable & HTablePool) objects for respective
> RegionServers based on key values. Based on my understanding, which may
> be incorrect, if thrift server has to run on HBase node, which would
> also be running region servers as well, why the calls have to go
> through the zookeeper? Or is it that once the client makes a successful
> connection with a thrift server (on an Hbase node), which may be
> initially mediated by Zookeeper for allocation, the client interaction
> happens directly with the thrift server?

If a thrift client is used, the client will only talk to the thrift
server. The client will not talk to ZooKeeper. The thrift server, in
turn, talks to ZooKeeper, the HMaster and the HRegionServers just like
a regular Java client, and acts as a 'gateway' for requests from thrift
clients.
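
To make that concrete from the client side, here is a minimal Python
sketch. It assumes the Apache Thrift Python library is installed and that
an `hbase` package has been generated from HBase's Hbase.thrift file and is
importable; the hostname in the usage comment is hypothetical and 9090 is
assumed as the gateway port. Note that the only address the client knows
is the thrift server's.

```python
def open_client(host, port=9090):
    """Connect to an HBase thrift gateway; returns (client, transport)."""
    # Imported lazily: these need the Apache Thrift Python library and the
    # `hbase` package generated from Hbase.thrift (both are assumptions).
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from hbase import Hbase

    transport = TTransport.TBufferedTransport(TSocket.TSocket(host, port))
    client = Hbase.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()
    return client, transport


def table_names(client):
    """Return the table names reported by the gateway, sorted."""
    return sorted(client.getTableNames())


# Usage (hypothetical host):
#   client, transport = open_client("thrift1.example.com")
#   print(table_names(client))
#   transport.close()
```

No ZooKeeper quorum or region server address appears anywhere in the client
code; the gateway resolves those on the client's behalf.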

Does this help clear your questions?

Harsh J
