hbase-user mailing list archives

From: Andrew Purtell <apurt...@apache.org>
Subject: Re: stargate performance evaluation
Date: Tue, 04 Aug 2009 16:56:13 GMT

Because Stargate is itself a client of the HBase storage cluster, there
will be an extra round trip for each data transfer. There will always
be some performance penalty for this. Over time the penalty may become
quite small. If one is using some future version of Stargate to provide
Bigtable structured storage for a large enterprise, then we expect the
benefits will outweigh this penalty.

My personal goal for Stargate is to approximate an "internal S3" for
large enterprises.

There are many opportunities currently for performance tuning of
Stargate. For example:

  - Profile the code and look for (and re-engineer) bottlenecks in
    the resource methods.

  - Explore Jetty performance optimizations; the current simple config
    for standalone mode may be naive.

  - Explore Jersey framework performance optimizations. The two 
    contributors who worked on Stargate are not (yet) Jersey wizards.

  - Intelligent batching of client requests to the storage cluster.

  - LRU caching for good read performance if the clients' collective
    working set can fit. 
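
The LRU caching item above could be sketched roughly as follows. This is
only a minimal illustration built on java.util.LinkedHashMap; the class
name LruCache and the maxEntries parameter are hypothetical and are not
part of Stargate, and the sketch is not thread safe as written.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Stargate: a bounded map that evicts
// the least recently used entry once it grows past maxEntries.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // accessOrder = true: get() and put() move the entry to the tail,
        // so the head of the map is always the least recently used entry.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after each insertion; returning true
        // drops the eldest (least recently used) entry.
        return size() > maxEntries;
    }
}
```

A gateway serving many concurrent requests would need to wrap such a map
(for example with Collections.synchronizedMap) or use a concurrent cache
instead, and would also need to invalidate entries on writes.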

Also, please be aware that the o.a.h.h.stargate.client package is at
this time a simple and naive wrapper around commons httpclient and
could surely be improved. Its purpose now is to support the test suite.

Best regards,

   - Andy

From: Haijun Cao <haijuncao@ymail.com>
To: hbase-user@hadoop.apache.org
Cc: apurtell@apache.org
Sent: Monday, August 3, 2009 10:36:42 PM
Subject: Re: stargate performance evaluation


Thanks for the reply. I am considering using Stargate in one of my projects; the design and
implementation are quite elegant. In your opinion, is there any hard limitation preventing
Stargate from achieving the same throughput as the HBase Java client? Is it just a matter of
fine tuning? I am not sure whether caching helps in the case of random reads. I agree that the
all-local setup is naive; I will do a more realistic test and share my observations.


From: Andrew Purtell <apurtell@apache.org>
To: hbase-user@hadoop.apache.org
Sent: Monday, August 3, 2009 5:25:09 PM
Subject: Re: stargate performance evaluation


Thanks for the testing and performance report!

You said you used the Stargate client package? It is pretty basic, written mainly as a
convenience for writing test cases in the test suite.

Regarding Stargate quality in general, this is an alpha release. It seems to survive torture
testing with PE, and it can handle well-formed requests, but the implementation is untuned.
For example, there is no caching (yet), and the code has not yet been profiled.

I put up an issue for Stargate performance improvement: https://issues.apache.org/jira/browse/HBASE-1741

I'm not sure an all-localhost configuration is the best testing scenario. It would be interesting
to see how the performance differs with the client remote from both the regionservers and
the Stargate instance. 

  - Andy

From: Haijun Cao <haijuncao@ymail.com>
To: hbase-user@hadoop.apache.org
Sent: Monday, August 3, 2009 2:04:16 PM
Subject: stargate performance evaluation

I am evaluating the performance of Stargate (which, by the way, is a
great contrib to HBase, thanks!). The evaluation program is mostly a simple
modification of the existing PerformanceEvaluation program: it just replaces
the Java client with the Stargate client and gets values as protobuf.

All of the software (Hadoop, ZooKeeper, HBase, Jetty) is
installed on one box. The data set is small, so all data is served out
of memory.

For the random read test, with the Java client (the existing PE
program) I can get 19K/s; with the Stargate client I can only get 3-4K/s.
In both cases, the PE program runs with 100 threads. Increasing the number of
threads does not seem to help (it even hurts the throughput).
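
One rough way to read these numbers is Little's law: if all 100 threads
keep exactly one request in flight at all times (an assumption, not
something the test report confirms), the mean time per request is about
the thread count divided by the throughput. The class and method names
below are hypothetical and only illustrate the arithmetic.

```java
// Hypothetical helper for back-of-the-envelope latency estimates.
class LatencyEstimate {
    // Little's law for a closed loop: mean time per request equals the
    // number of concurrent requests divided by throughput. Returns
    // milliseconds.
    static double meanLatencyMillis(int threads, double requestsPerSecond) {
        return threads * 1000.0 / requestsPerSecond;
    }
}
```

Under that assumption, 19K/s with 100 threads corresponds to roughly 5 ms
per request, and 3-4K/s corresponds to roughly 25-33 ms per request.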

I am just wondering whether this is expected (I can't figure out
in theory why the throughput drops). Any ideas for possible optimizations or
configuration changes to increase the throughput?


Haijun Cao
