hbase-user mailing list archives

From Jochen Frey <jochen_f...@yahoo.com>
Subject Fast retrieval of multiple rows with non-sequential keys
Date Mon, 05 Oct 2009 10:02:00 GMT
I want to use HBase as a BLOB store for a search engine application.  
For that the objects will be stored in one HBase table (~ 1B rows).  
Object size is typically between 1 kB and 20 kB.

I am concerned about my read pattern, where a typical read retrieves
between tens and thousands of rows in random order. Looking at the  
Java API the only method to retrieve rows in random order is to issue  
multiple

Result = HTable.get(Get)

requests sequentially (I assume a Scanner is not a good idea since the  
rows I need are spread randomly across the table / regions / etc.).

My concern is that with that pattern I have one RPC call per item,
which seems to be a lot of overhead, especially when I need to  
retrieve 100s or 1,000s of rows.

Would it not be preferable to batch up requests so that all rows  
requested would be grouped by region, and then sent off in parallel to
regions for retrieval - that way there'd be fewer RPC calls, and they  
could be executed in parallel, as well? As such an addition to the  
interface could look something like

List<Result> = HTable.get(List<Get>)
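
To make the idea more concrete, here is a rough sketch of the grouping
and parallel dispatch I have in mind. It is plain Java and uses no HBase
classes at all: Region, multiGet, groupByRegion and batchGet are names I
made up for illustration, regions are modeled as a sorted map of start
keys, and an "RPC" is just a method call that counts round trips. It
returns a map keyed by row rather than a list, only to keep the sketch
short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: none of these names are part of the HBase API.
class BatchGetSketch {

    // Stand-in for one region. It serves rows from its start key (inclusive)
    // up to the next region's start key (exclusive).
    static class Region {
        final String startKey;
        final Map<String, String> rows = new TreeMap<>();
        int rpcCalls = 0; // number of simulated round trips to this region

        Region(String startKey) { this.startKey = startKey; }

        // One simulated RPC that returns the values for many keys at once.
        synchronized List<String> multiGet(List<String> keys) {
            rpcCalls++;
            List<String> out = new ArrayList<>();
            for (String k : keys) out.add(rows.get(k)); // null if the row is absent
            return out;
        }
    }

    // Group the requested keys by the region whose key range contains them.
    // Assumes the first region starts at "" so every key has a region.
    static Map<String, List<String>> groupByRegion(
            TreeMap<String, Region> regions, List<String> keys) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String key : keys) {
            String start = regions.floorKey(key); // greatest start key <= row key
            grouped.computeIfAbsent(start, s -> new ArrayList<>()).add(key);
        }
        return grouped;
    }

    // Send one request per region, all in parallel, then merge the answers.
    static Map<String, String> batchGet(
            TreeMap<String, Region> regions, List<String> keys) throws Exception {
        Map<String, List<String>> grouped = groupByRegion(regions, keys);
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, grouped.size()));
        try {
            Map<String, Future<List<String>>> pending = new TreeMap<>();
            for (Map.Entry<String, List<String>> e : grouped.entrySet()) {
                Region region = regions.get(e.getKey());
                List<String> regionKeys = e.getValue();
                Callable<List<String>> task = () -> region.multiGet(regionKeys);
                pending.put(e.getKey(), pool.submit(task));
            }
            Map<String, String> results = new TreeMap<>();
            for (Map.Entry<String, Future<List<String>>> e : pending.entrySet()) {
                List<String> regionKeys = grouped.get(e.getKey());
                List<String> values = e.getValue().get(); // wait for this region
                for (int i = 0; i < regionKeys.size(); i++) {
                    results.put(regionKeys.get(i), values.get(i));
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        TreeMap<String, Region> regions = new TreeMap<>();
        for (String start : new String[] {"", "h", "p"}) {
            regions.put(start, new Region(start));
        }
        for (String k : new String[] {"apple", "banana", "kiwi", "zebra"}) {
            regions.get(regions.floorKey(k)).rows.put(k, "blob-" + k);
        }
        List<String> wanted = Arrays.asList("zebra", "apple", "kiwi", "banana");
        System.out.println(batchGet(regions, wanted));
    }
}
```

With four keys spread over three regions, this makes three round trips
instead of four, and the saving should grow as more of the requested
rows fall into the same region.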

Am I making sense? Is there something that I am missing?

Thanks!
Jochen


