hbase-user mailing list archives

From Andre Reiter <a.rei...@web.de>
Subject Re: full table scan
Date Tue, 07 Jun 2011 08:08:06 GMT
now i found out that there are three regions, each on its own region server (server2, server3, server4)
the processing time is still >= 60 sec, which is not very impressive...

what can i do to speed up the table scan?
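
to put the numbers above in perspective, here is a rough back-of-envelope sketch in plain java (no hbase dependency). it is not a measurement: the class and method names are made up for illustration, it assumes a Map&Reduce scan gets at most one map task per region with perfectly even work and no job startup cost, and it assumes one client/server round trip fetches up to `caching` rows (the value set via `Scan.setCaching`). the caching value of 500 is only an example, and whether it helps on this cluster is untested.

```java
public class ScanEstimate {

    // Observed throughput: rows processed per second of wall-clock time.
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    // Best-case wall-clock time if the scan splits evenly across regions.
    // Assumption: one map task per region, balanced load, no job overhead.
    static double idealParallelSeconds(double sequentialSeconds, int regions) {
        return sequentialSeconds / regions;
    }

    // Round trips needed for a full scan, assuming each round trip returns
    // up to `caching` rows (integer ceiling division).
    static long roundTrips(long rows, int caching) {
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 300000L;     // table size from this thread
        double seconds = 60.0;   // observed scan time from this thread
        int regions = 3;         // regions reported in this thread

        System.out.println("rows/sec:            " + rowsPerSecond(rows, seconds));
        System.out.println("ideal parallel secs: " + idealParallelSeconds(seconds, regions));
        System.out.println("round trips (1):     " + roundTrips(rows, 1));
        System.out.println("round trips (500):   " + roundTrips(rows, 500)); // 500 is illustrative
    }
}
```

with the figures from this thread that is 5000 rows/sec, a best case of about 20 sec across three regions, and 300000 round trips at one row per trip versus 600 at 500 rows per trip. so three regions can give at most a 3x speedup under these assumptions, and the per-row round trip cost is worth checking first.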

best regards

Andreas Reiter wrote:
> hello everybody
> i'm trying to scan my hbase table for reporting purposes
> the cluster has 4 servers:
> - server1: namenode, secondary namenode, jobtracker, hbase master, zookeeper1
> - server2: datanode, tasktracker, hbase regionserver, zookeeper2
> - server3: datanode, tasktracker, hbase regionserver, zookeeper3
> - server4: datanode, tasktracker, hbase regionserver
> everything seems to work properly
> versions:
> - hadoop-0.20.2-CDH3B4
> - hbase-0.90.1-CDH3B4
> - zookeeper-3.3.2-CDH3B4
> at the moment our hbase table has 300000 entries
> if i do a table scan over the hbase api (at the moment without a filter)
> ResultScanner scanner = table.getScanner(...);
> it takes about 60 seconds to process, which is actually okay, because all records are
> processed by only one thread sequentially
> BUT it takes approximately the same time, if i do a scan over Map&Reduce job using
> i'm definitely doing something wrong, because the processing time goes up in direct
> proportion to the number of rows.
> in my understanding, the big advantage of hadoop/hbase is that huge numbers of entries
> can be processed in parallel and very fast
> 300k entries are not much, we are expecting this number to be added hourly to our cluster,
> but the processing time is increasing, which is actually not acceptable
> does anyone have an idea what i'm doing wrong?
> best regards
> andre
