hbase-user mailing list archives

From Anoop John <anoop.hb...@gmail.com>
Subject Re: HBase Client Performance Bottleneck in a Single Virtual Machine
Date Mon, 04 Nov 2013 06:06:38 GMT
He uses HConnection.getTable(), which in turn uses the HTable constructor:


HTable(final byte[] tableName, final HConnection connection,
       final ExecutorService pool)

So no worries - on HTable#close() the connection won't get closed. :)
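
For example (a quick sketch; "connection" here is an HConnection obtained from
HConnectionManager.createConnection, and the table name is just a placeholder):

HTableInterface table = connection.getTable("mytable");
try {
    // a few gets/puts, running on the connection's shared resources
} finally {
    table.close();   // flushes buffers; the shared HConnection stays open
}
// connection is still usable here and can hand out more tables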



-Anoop-


On Mon, Nov 4, 2013 at 11:29 AM, Sriram Ramachandrasekaran <
sri.rams85@gmail.com> wrote:

> HTable is the implementation of HTableInterface. I was looking at the code
> and it *indeed* closes the underlying resources on close() unless you
> create it with the ExecutorService and HConnection option that Lars
> suggested. Please do take a look at the HTable constructors - that might help.
>
> P.S.: I verified this on the 0.94.6 code base. Hope things haven't changed
> between that and your version (0.94.12).
>
>
>
> On Mon, Nov 4, 2013 at 11:21 AM, <Michael.Grundvig@high5games.com> wrote:
>
> > Our current usage is how I would do this in a typical database app, with
> > the table acting like a statement. It looks like this:
> >
> > HConnection connection = null;
> > HTableInterface table = null;
> > try {
> >         connection = pool.acquire();
> >         table = connection.getTable(tableName);
> >         // Do work
> > } finally {
> >         if (table != null) {
> >                 table.close();
> >         }
> >         if (connection != null) {
> >                 pool.release(connection);
> >         }
> > }
> >
> > Is this incorrect? The API docs say close() "Releases any resources held
> > or pending changes in internal buffers." I didn't interpret that as
> > having it close the underlying connection. Thanks!
> >
> > -Mike
> >
> > -----Original Message-----
> > From: Sriram Ramachandrasekaran [mailto:sri.rams85@gmail.com]
> > Sent: Sunday, November 03, 2013 11:43 PM
> > To: user@hbase.apache.org
> > Cc: larsh@apache.org
> > Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> > Machine
> >
> > Hey Michael - per the API documentation, closing the HTable instance would
> > close the underlying resources too. Hope you are aware of that.
> >
> >
> > On Mon, Nov 4, 2013 at 11:06 AM, <Michael.Grundvig@high5games.com> wrote:
> >
> > > Hi Lars, at application startup the pool is created with X number of
> > > connections using the first method you indicated:
> > > HConnectionManager.createConnection(conf). We store each connection in
> > > the pool automatically and serve it up to threads as they request it.
> > > When a thread is done using the connection, it returns it to the
> > > pool. The connections are not created and closed per thread, but
> > > only once for the entire application. We are using the
> > > GenericObjectPool from Apache Commons Pool as the foundation of our
> > > connection pooling approach. Our entire pool implementation really
> > > consists of just a couple of overridden methods to specify how to
> > > create a new connection and close it. The GenericObjectPool class does
> > > all the rest. See here for details:
> > > http://commons.apache.org/proper/commons-pool/
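> > >
> > > (For illustration, a minimal sketch of such a factory - assuming Commons
> > > Pool 1.6 and the 0.94 client API; the class and field names here are
> > > made up, not our exact code:)
> > >
> > > import org.apache.commons.pool.BasePoolableObjectFactory;
> > > import org.apache.hadoop.conf.Configuration;
> > > import org.apache.hadoop.hbase.client.HConnection;
> > > import org.apache.hadoop.hbase.client.HConnectionManager;
> > >
> > > public class HConnectionPoolFactory
> > >         extends BasePoolableObjectFactory<HConnection> {
> > >
> > >     private final Configuration conf;
> > >
> > >     public HConnectionPoolFactory(Configuration conf) {
> > >         this.conf = conf;
> > >     }
> > >
> > >     // Called by the pool whenever it needs a new pooled connection.
> > >     @Override
> > >     public HConnection makeObject() throws Exception {
> > >         return HConnectionManager.createConnection(conf);
> > >     }
> > >
> > >     // Called by the pool when a connection is evicted or the pool closes.
> > >     @Override
> > >     public void destroyObject(HConnection connection) throws Exception {
> > >         connection.close();
> > >     }
> > > }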
> > >
> > > Each thread is getting an HTable instance as needed and then closing it
> > > when done. The only thing we are not doing is using the
> > > createConnection method that takes in an ExecutorService as that
> > > wouldn't work in our model. Our app is like a web application - the
> > > thread pool is managed outside the scope of our application code so we
> > > can't assume the service is available at connection creation time.
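> > >
> > > (For completeness, roughly how the pool gets wired up and used - again
> > > just a sketch with made-up names, building on the illustrative factory
> > > above:)
> > >
> > > // at application startup (checked exceptions elided for brevity):
> > > GenericObjectPool<HConnection> pool =
> > >         new GenericObjectPool<HConnection>(new HConnectionPoolFactory(conf));
> > > pool.setMaxActive(connectionCount);
> > > for (int i = 0; i < connectionCount; i++) {
> > >     pool.addObject();   // pre-create the X connections up front
> > > }
> > >
> > > // per request, on whatever thread the container hands us:
> > > HConnection connection = pool.borrowObject();
> > > try {
> > >     HTableInterface table = connection.getTable(tableName);
> > >     try {
> > >         // a few gets or a single put
> > >     } finally {
> > >         table.close();
> > >     }
> > > } finally {
> > >     pool.returnObject(connection);
> > > }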
> > > Thanks!
> > >
> > > -Mike
> > >
> > >
> > > -----Original Message-----
> > > From: lars hofhansl [mailto:larsh@apache.org]
> > > Sent: Sunday, November 03, 2013 11:27 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: HBase Client Performance Bottleneck in a Single Virtual
> > > Machine
> > >
> > > Hi Michael,
> > >
> > > can you try to create a single HConnection in your client:
> > > HConnectionManager.createConnection(Configuration conf) or
> > > HConnectionManager.createConnection(Configuration conf,
> > > ExecutorService
> > > pool)
> > >
> > > Then use HConnection.getTable(...) each time you need to do an operation.
> > >
> > > I.e.:
> > >
> > > Configuration conf = ...;
> > > ExecutorService pool = ...;
> > > // create a single HConnection for your VM.
> > > HConnection con = HConnectionManager.createConnection(conf, pool);
> > > // reuse the connection for many tables, even in different threads
> > > HTableInterface table = con.getTable(...);
> > > // use the table even for only a few operations.
> > > table.close();
> > > ...
> > > table = con.getTable(...);
> > > // use the table even for only a few operations.
> > > table.close();
> > > ...
> > > // at the end, close the connection
> > > con.close();
> > >
> > > -- Lars
> > >
> > >
> > >
> > > ________________________________
> > >  From: "Michael.Grundvig@high5games.com"
> > > <Michael.Grundvig@high5games.com>
> > > To: user@hbase.apache.org
> > > Sent: Sunday, November 3, 2013 7:46 PM
> > > Subject: HBase Client Performance Bottleneck in a Single Virtual
> > > Machine
> > >
> > >
> > > Hi all; I posted this as a question on StackOverflow as well but
> > > realized I should have gone straight to the horse's mouth with my
> > > question. Sorry for the double post!
> > >
> > > We are running a series of HBase tests to see if we can migrate one of
> > > our existing datasets from an RDBMS to HBase. We are running 15 nodes
> > > with 5 ZooKeepers and HBase 0.94.12 for this test.
> > >
> > > We have a single table with three column families and a key that is
> > > distributing very well across the cluster. All of our queries are
> > > direct look-ups; no searching or scanning. Since HTablePool is now
> > > frowned upon, we are using the Apache Commons Pool and a simple
> > > connection factory to create a pool of connections and use them in our
> > > threads. Each thread creates an HTable instance as needed and closes
> > > it when done. There are no leaks we can identify.
> > >
> > > If we run a single thread and just do lots of random calls
> > > sequentially, the performance is quite good. Everything works great
> > > until we start trying to scale the performance. As we add more threads
> > > and try to get more work done in a single VM, we start seeing
> > > performance degrade quickly. The client code is simply attempting to
> > > run either one of several gets or a single put at a given frequency.
> > > It then waits until the next time to run; we use this to simulate the
> > > workload from external clients. With a single thread, we see call
> > > times in the 2-3 millisecond range, which is acceptable.
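> > >
> > > (Roughly, each simulated client thread does something like the sketch
> > > below - the pool wrapper, key generator, and timing values are
> > > placeholders, not the real harness:)
> > >
> > > ScheduledExecutorService workers =
> > >         Executors.newScheduledThreadPool(threadCount);
> > > workers.scheduleAtFixedRate(new Runnable() {
> > >     public void run() {
> > >         HConnection connection = null;
> > >         HTableInterface table = null;
> > >         try {
> > >             connection = pool.acquire();
> > >             table = connection.getTable(tableName);
> > >             table.get(new Get(randomKey()));   // or a single Put
> > >         } catch (Exception e) {
> > >             // count the failure
> > >         } finally {
> > >             try { if (table != null) table.close(); } catch (IOException ignored) { }
> > >             if (connection != null) pool.release(connection);
> > >         }
> > >     }
> > > }, 0, periodMillis, TimeUnit.MILLISECONDS);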
> > >
> > > As we add more threads, this call time starts increasing quickly. What
> > > is strange is that if we add more VMs, the times hold steady across them
> > > all, so clearly it's a bottleneck in the running instance and not the
> > > cluster. We can get a huge amount of processing happening across the
> > > cluster very easily - it just takes a lot of VMs on the client side to
> > > do it. We know the contention isn't in the connection pool, as we see
> > > the problem even when we have more connections than threads.
> > > Unfortunately, the times are spiraling out of control very quickly. We
> > > need it to support at least 128 threads in practice, but most
> > > importantly I want to support 500 updates/sec and 250 gets/sec. In
> > > theory, this should be a piece of cake for the cluster as we can do
> > > FAR more work than that with a few VMs, but we don't even get close to
> > > this with a single VM.
> > >
> > > So my question: how do people building high-performance apps with
> > > HBase get around this? What approach are others using for connection
> > > pooling in a multi-threaded environment? There seems to be
> > > surprisingly little info about this on the web considering HBase's
> > > popularity. Is there some client setting we need to use that makes
> > > it perform better in a threaded environment? We are going to try to
> > > cache HTable instances next, but that's a total guess. There are ways
> > > to offload work to other VMs, but we really want to avoid that, as the
> > > cluster can clearly handle the load and it would dramatically decrease
> > > the application performance in critical areas.
> > >
> > > Any help is greatly appreciated! Thanks!
> > > -Mike
> > >
> >
> >
> >
> > --
> > It's just about how deep your longing is!
> >
>
>
>
> --
> It's just about how deep your longing is!
>
