hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: set number of map tasks for HBase MR
Date Sun, 11 Apr 2010 14:09:46 GMT
Yes, an option could be added, along with a write buffer option for Import.
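
For the record, roughly the kind of change I have in mind (a sketch only, not
the actual 0.20.3 code; the hbase.export.scanner.caching key is made up for
illustration, while hbase.client.write.buffer is an existing client-side
setting):

  // In Export.createSubmittableJob(): read a caching option and apply it
  // to the Scan the job is built from.
  Scan s = new Scan();
  s.setCaching(conf.getInt("hbase.export.scanner.caching", 100));

  // For Import, the analogous knob is the client write buffer, e.g. raise
  // hbase.client.write.buffer in the job configuration.
  conf.setLong("hbase.client.write.buffer", 4 * 1024 * 1024);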

J-D

On Sun, Apr 11, 2010 at 3:30 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> I noticed mapreduce.Export.createSubmittableJob() doesn't call setCaching()
> in 0.20.3
>
> Should a call to setCaching() be added?
>
> Thanks
>
> On Sun, Apr 11, 2010 at 2:14 AM, Jean-Daniel Cryans <jdcryans@apache.org>wrote:
>
>> A MapReduce job against an HBase table by default cannot have more map
>> tasks than the number of regions in that table.
>>
>> Also, you want to enable scanner caching. Pass a Scan object to
>> TableMapReduceUtil.initTableMapperJob that is configured with
>> scan.setCaching(some_value), where the value is the number of rows to
>> fetch every time we hit a region server with next(). For rows of
>> 100-200 bytes, our jobs are usually configured with caching values from
>> 1000 up to 10000.
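>>
>> For example, a minimal sketch of the job setup (the table name "mytable",
>> the MyMapper class and the output types are placeholders for illustration;
>> job is the Job you are configuring):
>>
>> import org.apache.hadoop.hbase.client.Result;
>> import org.apache.hadoop.hbase.client.Scan;
>> import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
>> import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
>>
>> Scan scan = new Scan();
>> scan.setCaching(1000);  // rows fetched per next() round trip to a region server
>> TableMapReduceUtil.initTableMapperJob(
>>     "mytable",                     // input table
>>     scan,                          // the Scan carrying the caching setting
>>     MyMapper.class,                // your TableMapper subclass
>>     ImmutableBytesWritable.class,  // map output key class
>>     Result.class,                  // map output value class
>>     job);                          // the Job being configured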
>>
>> Finally, is your job running in local mode or on a job tracker? Even
>> though HBase uses HDFS, it usually doesn't know about the job tracker
>> unless you put Hadoop's conf directory on HBase's classpath.
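>>
>> A quick way to check, from the code that builds the job (this reads the
>> old-style 0.20 mapred.job.tracker property, which defaults to "local"):
>>
>> // If this prints "local", the job never reaches a JobTracker and runs
>> // in local mode, typically because Hadoop's conf isn't on the classpath.
>> String tracker = job.getConfiguration().get("mapred.job.tracker", "local");
>> System.out.println("mapred.job.tracker = " + tracker);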
>>
>> J-D
>>
>> On Sun, Apr 11, 2010 at 3:17 AM, Andriy Kolyadenko
>> <crypto5@mail.saturnfans.com> wrote:
>> > Hi,
>> >
>> > thanks for the quick response. I tried doing the following in the code:
>> >
>> > job.getConfiguration().setInt("mapred.map.tasks", 10000);
>> >
>> > but unfortunately I get the same result.
>> >
>> > Any other ideas?
>> >
>> > --- amansk@gmail.com wrote:
>> >
>> > From: Amandeep Khurana <amansk@gmail.com>
>> > To: hbase-user@hadoop.apache.org, crypto5@mail.saturnfans.com
>> > Subject: Re: set number of map tasks for HBase MR
>> > Date: Sat, 10 Apr 2010 20:04:18 -0700
>> >
>> > You can set the number of map tasks in your job config to a big number
>> > (e.g. 100000), and the library will automatically spawn one map task
>> > per region.
>> >
>> > -ak
>> >
>> >
>> > Amandeep Khurana
>> > Computer Science Graduate Student
>> > University of California, Santa Cruz
>> >
>> >
>> > On Sat, Apr 10, 2010 at 7:59 PM, Andriy Kolyadenko <
>> > crypto5@mail.saturnfans.com> wrote:
>> >
>> >> Hi guys,
>> >>
>> >> I have an HBase table of about 8 GB and I want to run an MR job
>> >> against it. It works extremely slowly in my case. One thing I noticed
>> >> is that the job runs only 2 map tasks. Is there any way to set up a
>> >> bigger number of map tasks? I saw some method in the mapred package,
>> >> but can't find anything like it in the new mapreduce package.
>> >>
>> >> I run my MR job on a single machine in cluster mode.
>> >>
>> >>
>> >> _____________________________________________________________
>> >> Sign up for your free SaturnFans email account at
>> >> http://webmail.saturnfans.com/
>> >>
>> >
>>
>
