hbase-user mailing list archives

From Himanshu Vashishtha <hvash...@cs.ualberta.ca>
Subject Re: speeding up rowcount
Date Mon, 10 Oct 2011 01:05:27 GMT
MapReduce support in HBase inherently provides parallelism such that
each Region is given to one mapper.
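
To make the idea concrete, here is a minimal sketch in plain Java. It does not use the HBase API: a TreeMap stands in for the sorted table, and the split keys are hypothetical region boundaries. Each key range is counted by its own task, the way each region's split is handed to its own mapper, and the partial counts are summed. The names RegionParallelCount, countInParallel and range are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustration only: a TreeMap stands in for an HBase table (rows sorted by key).
// No HBase API is used here; all names are hypothetical.
class RegionParallelCount {

    // Counts rows by giving each key range ("region") to its own task.
    // splitKeys must be sorted ascending; they play the role of region boundaries.
    static long countInParallel(NavigableMap<String, String> table, List<String> splitKeys) {
        ExecutorService pool = Executors.newFixedThreadPool(splitKeys.size() + 1);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            String start = null; // null means "no lower bound"
            for (int i = 0; i <= splitKeys.size(); i++) {
                final String lo = start;
                final String hi = (i < splitKeys.size()) ? splitKeys.get(i) : null;
                // One task per range, analogous to one mapper per region.
                Callable<Long> task = () -> (long) range(table, lo, hi).size();
                partials.add(pool.submit(task));
                start = hi;
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get(); // sum the per-range counts
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    // Rows with lo <= key < hi; null means unbounded on that side.
    static NavigableMap<String, String> range(NavigableMap<String, String> t, String lo, String hi) {
        if (lo == null && hi == null) return t;
        if (lo == null) return t.headMap(hi, false);
        if (hi == null) return t.tailMap(lo, true);
        return t.subMap(lo, true, hi, false);
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            table.put(String.format("row%04d", i), "v");
        }
        List<String> splits = Arrays.asList("row0250", "row0500", "row0750");
        System.out.println(countInParallel(table, splits)); // prints 1000
    }
}
```

Because a count does not depend on row order, the ranges can be processed independently and only the totals need to be combined. In a real job the degree of parallelism would then follow the number of regions in the table, so a table with few regions gets few mappers.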

Himanshu

On Sun, Oct 9, 2011 at 6:44 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> Be aware that the contract for a scan is to return all rows sorted by rowkey, hence it
> cannot scan regions in parallel by default. I have not played much with HBase and MapReduce, but
> if order is not important you can split the scan into multiple scans.
>
>
> ----- Original Message -----
> From: Tom Goren <tom@tomgoren.com>
> To: user@hbase.apache.org
> Cc:
> Sent: Sunday, October 9, 2011 8:07 AM
> Subject: Re: speeding up rowcount
>
> lol - i just ran a rowcount via mapreduce, and it took 6 hours for 7.5
> million rows...
>
> On Sun, Oct 9, 2011 at 7:50 AM, Rita <rmorgan466@gmail.com> wrote:
>
>> Hi,
>>
>> I have been doing a rowcount via mapreduce and it's taking about 4-5 hours
>> to count 500 million rows in a table. I was wondering if there are any
>> MapReduce tunings I can do so it will go much faster.
>>
>> I have a 10-node cluster, each node with 8 CPUs and 64 GB of memory. Any
>> tuning advice would be much appreciated.
>>
>>
>> --
>> --- Get your facts first, then you can distort them as you please.--
>>
>
>
