hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: help on key design
Date Tue, 30 Jul 2013 22:45:17 GMT
Please also go over http://hbase.apache.org/book.html#perf.reading


On Tue, Jul 30, 2013 at 3:40 PM, Dhaval Shah <prince_mithibai@yahoo.co.in> wrote:

> If all your keys are grouped together, why don't you use a scan with
> start/end keys specified? A sequential scan can theoretically be faster than
> MultiGet lookups. (Assuming your grouping is tight, you can also use filters
> with the scan to get better performance.)
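Dhaval's suggestion amounts to turning one group of keys into a single contiguous row range. A minimal sketch, assuming the keys of a group share a common byte prefix (the prefix `group42|` below is hypothetical): it computes the inclusive start row and exclusive stop row covering every key with that prefix. With the HBase Java client these arrays would be handed to the `Scan` (for example via `setStartRow`/`setStopRow`); that call is left out so the sketch runs without HBase on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Computes [startRow, stopRow) bounds covering every row key that begins
// with a given prefix. Row keys sort as unsigned bytes, so trailing 0xFF
// bytes cannot be incremented and are dropped first.
final class PrefixScanBounds {

    /** The start row is the prefix itself (inclusive). */
    static byte[] startRow(byte[] prefix) {
        return prefix.clone();
    }

    /**
     * The stop row (exclusive) is the smallest key sorting after every key
     * with this prefix: drop trailing 0xFF bytes, then increment the last
     * remaining byte. Returns an empty array ("no upper bound") if the
     * prefix consists only of 0xFF bytes.
     */
    static byte[] stopRow(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++; // unsigned increment, e.g. 0x7F -> 0x80
                return stop;
            }
        }
        return new byte[0];
    }

    public static void main(String[] args) {
        byte[] prefix = "group42|".getBytes(StandardCharsets.UTF_8); // hypothetical key layout
        System.out.println(new String(startRow(prefix), StandardCharsets.UTF_8)); // group42|
        System.out.println(new String(stopRow(prefix), StandardCharsets.UTF_8));  // group42}
    }
}
```

A scan over this range reads the group's rows sequentially from one region, instead of issuing ~500 independent point lookups.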
> How much memory do you have for your region servers? Have you enabled
> block caching? Is your CPU spiking on your region servers?
> If you are saturating the resources on your *hot* region server, then yes,
> having more region servers will help. If not, then something else is the
> bottleneck and you probably need to dig further.
> Regards,
> Dhaval
> ________________________________
> From: Demian Berjman <dberjman@despegar.com>
> To: user@hbase.apache.org
> Sent: Tuesday, 30 July 2013 4:37 PM
> Subject: help on key design
> Hi,
> I would like to explain our use case of HBase, the row key design and the
> problems we are having, so that someone can help us:
> The first thing we noticed is that our data set is much smaller than in
> other cases we have read about on the list and forums. We have a table containing 20
> million keys, split automatically by HBase into 4 regions and balanced across 3
> region servers. We have designed our key to keep together the set of keys
> requested by our app. That is, when we request a set of keys we expect them
> to be grouped together to improve data locality and block cache efficiency.
> The second thing we noticed, compared to other cases, is that we retrieve a
> lot of keys per request (approx. 500). Thus, during our peaks (3k requests per
> minute), many requests go to a particular region server, each asking
> for a lot of keys. That results in poor response times (on the order of
> seconds). Currently we are using multi gets.
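To put "a lot of keys" in numbers, the peak figures quoted above can be multiplied out. A back-of-the-envelope sketch; the even split across 3 region servers is an idealization, since with the grouped key design each request's keys land in one region.

```java
// Peak load from the figures in the message: ~3,000 requests per minute,
// ~500 keys per request, 3 region servers.
final class PeakLoad {

    /** Total key lookups per second across the cluster. */
    static long keysPerSecond(long requestsPerMinute, long keysPerRequest) {
        return requestsPerMinute * keysPerRequest / 60;
    }

    public static void main(String[] args) {
        long clusterWide = keysPerSecond(3_000, 500);
        System.out.println("keys/s, cluster-wide: " + clusterWide); // 25000
        // Ideal even spread only; a hot server may see far more than this.
        System.out.println("keys/s per server if evenly spread: " + clusterWide / 3); // 8333
    }
}
```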
> We think an improvement would be to spread the keys (introducing a
> randomized component into them) across more region servers, so each RS will have to
> handle fewer keys and probably fewer requests. That way the multi gets
> will be spread over the region servers.
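Spreading keys this way is commonly done by "salting". A minimal sketch, assuming a one-byte salt and a hypothetical bucket count of 8; the salt here is a CRC32 of the original key rather than a random number, because a truly random prefix could not be recomputed when building a Get. `groupByBucket` (a hypothetical helper) shows how the ~500 keys of one request would be divided among buckets.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// One-byte salting: prefix each key with a deterministic bucket number
// derived from a hash of the original key.
final class SaltedKeys {
    // Hypothetical bucket count; must stay <= 256 to fit in one byte.
    static final int BUCKETS = 8;

    static int bucketOf(byte[] key) {
        CRC32 crc = new CRC32();
        crc.update(key, 0, key.length);
        return (int) (crc.getValue() % BUCKETS);
    }

    /** Prepends the bucket byte to the original key. */
    static byte[] salt(byte[] key) {
        byte[] salted = new byte[key.length + 1];
        salted[0] = (byte) bucketOf(key);
        System.arraycopy(key, 0, salted, 1, key.length);
        return salted;
    }

    /** Recovers the original key by dropping the bucket byte. */
    static byte[] unsalt(byte[] salted) {
        return Arrays.copyOfRange(salted, 1, salted.length);
    }

    /**
     * Groups the salted keys of one request by bucket. Assuming each bucket
     * is its own region, each group is what one region would have to serve.
     */
    static Map<Integer, List<byte[]>> groupByBucket(List<byte[]> keys) {
        Map<Integer, List<byte[]>> byBucket = new TreeMap<>();
        for (byte[] key : keys) {
            byBucket.computeIfAbsent(bucketOf(key), b -> new ArrayList<>()).add(salt(key));
        }
        return byBucket;
    }

    public static void main(String[] args) {
        List<byte[]> request = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            request.add(("group42|item" + i).getBytes(StandardCharsets.UTF_8)); // hypothetical keys
        }
        groupByBucket(request).forEach((bucket, keys) ->
                System.out.println("bucket " + bucket + ": " + keys.size() + " keys"));
    }
}
```

The trade-off is that the locality the current design relies on is lost: the keys of one group now sit in every bucket, so a range scan over a group would need one scan per bucket.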
> Our questions:
> 1. Is this design, asking for so many keys on each request, correct? (if
> you need high performance)
> 2. What about splitting across more region servers? Is it a good idea? How
> could we accomplish this? We thought of applying some hashing...
> Thanks in advance!
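On question 2: if keys are salted with a leading bucket byte, the table can be pre-split so each bucket starts in its own region instead of waiting for automatic splits. A minimal sketch, assuming a one-byte salt in [0, buckets); with the 0.94-era client the resulting array would be passed to `HBaseAdmin.createTable(HTableDescriptor, byte[][] splitKeys)`, which is not called here.

```java
// Split points that give each salt bucket its own initial region.
// Region 0 covers [start of table, {1}), region i covers [{i}, {i+1}),
// and the last region covers [{buckets-1}, end of table).
final class SaltSplits {

    static byte[][] splitKeys(int buckets) {
        if (buckets < 2 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in [2, 256]");
        }
        byte[][] splits = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            splits[i - 1] = new byte[] {(byte) i};
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println("split keys for 8 buckets: " + splitKeys(8).length); // 7
    }
}
```

With 8 buckets this yields 7 split keys and therefore 8 regions, which the balancer can then distribute over the 3 region servers.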
