hbase-user mailing list archives

From "Taylor, Ronald C" <ronald.tay...@pnl.gov>
Subject RE: Got a problem using Hbase as a MR sink - program hangs in the reduce part
Date Fri, 02 Oct 2009 04:30:47 GMT
Hello St. Ack,

Answers to your questions:

1) yes, we are planning on switching to 0.20. Just haven't yet. So -
that might be the first thing to do.

2) re the # of reducers: at the start of my run fn, just after defining
a jobConf object, I do a
    jobConf.setNumReduceTasks(2)

Wasn't sure if that setting was per node or for the entire 10-node
cluster, so I also tried
    jobConf.setNumReduceTasks(19)

Didn't make any difference - the program still failed at 66%.
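(For reference: setNumReduceTasks() is a per-job setting - it sets the total
number of reduce tasks for the whole job, not a per-node count. A minimal
sketch, assuming the Hadoop 0.19-era JobConf API; the class and job names
here are placeholders, not the actual program from this thread:)

```java
// Hadoop 0.19-era API (org.apache.hadoop.mapred). Names are placeholders.
JobConf jobConf = new JobConf(BinTableMRSummation.class);
jobConf.setJobName("binTableMRSummation");
// Total reduce tasks for the entire job, across all nodes in the cluster:
jobConf.setNumReduceTasks(2);
```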

3) yep, we have increased the number of file descriptors and xceivers as
recommended. I'll have to check into the patch you mentioned, though,
and see if we applied it. Of course, as you said, maybe we should
restart on 0.20.

4) re the debugging suggestions: noted, and I'll see what I can do. 

Thanks for the quick reply. I leave on a trip tomorrow morn, back next
Thursday, so - I'll be working on this as soon as I get back.
 Ron


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
stack
Sent: Thursday, October 01, 2009 9:14 PM
To: hbase-user@hadoop.apache.org
Subject: Re: Got a problem using Hbase as a MR sink - program hangs in
the reduce part

Can you run 0.20.0?

66% is when it starts writing to HBase.

How many reducers?

Enable DEBUG (see FAQ for how).
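(The usual way, as I recall, is via conf/log4j.properties on each node - a
sketch, assuming the stock Hadoop/HBase log4j setup; restart or use the
log-level servlet for it to take effect:)

```
# conf/log4j.properties -- turn on DEBUG for the HBase classes
log4j.logger.org.apache.hadoop.hbase=DEBUG
```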

These are odd in that they are saying that the reduce task was dead --
no progress reported -- over ten whole minutes:

> attempt_200908131056_0004_r_000000_1 failed to report status for 603 seconds. Killing!



Can you find that task in the MapReduce UI and see what was going on?
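(One common cause of the "failed to report status" kill is a reduce doing
slow per-record work - e.g. blocked HBase writes - without heartbeating. A
hedged sketch, assuming the 0.19 mapred Reducer API; the key/value types
are placeholders:)

```java
// Sketch only: call reporter.progress() inside long-running loops so the
// TaskTracker does not kill the task after mapred.task.timeout (600s default).
public void reduce(Text key, Iterator<IntWritable> values,
                   OutputCollector<Text, IntWritable> output,
                   Reporter reporter) throws IOException {
  int sum = 0;
  while (values.hasNext()) {
    sum += values.next().get();
    reporter.progress();   // heartbeat; resets the 600-second timeout
  }
  output.collect(key, new IntWritable(sum));
}
```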

You've read the 'Getting Started' where it talks about upping file
descriptors, xceivers, and applying the HDFS-127 patch to your hadoop
cluster?
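(For reference, those two settings usually mean raising the open-file limit
for the user running the daemons - e.g. via ulimit -n or
/etc/security/limits.conf - and bumping the datanode transceiver cap. A
sketch with site-specific example values; note the property name really is
spelled "xcievers":)

```xml
<!-- hadoop-site.xml (0.19-era config file name); the value is an example -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2047</value>
</property>
```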

Yours,
St.Ack



On Thu, Oct 1, 2009 at 5:24 PM, Taylor, Ronald C
<ronald.taylor@pnl.gov>wrote:

>
>  Hi folks,
>
> I am trying to run a simple MapReduce program that sums the number of 
> entries in a list in a column in an Hbase table and then places that 
> sum back into the table. Simple task, in theory - I am just trying out
> MapReduce programming combined with Hbase use, i.e., using an Hbase
> table as a data source and as a sink for the output.
>
> So - I get the screen error output below. The program fails at 66% 
> into reduce. Don't know why - I have rerun it and it fails at the same
> point.
> I am doing this on a 10-node Linux cluster using Hadoop 0.19.1 and 
> Hbase 0.19.3.
>
> I don't see any clues in the master Hbase and Hadoop logs. There are
> no errors reported that I can see - though I cheerfully admit to
> being a complete novice at interpreting the log output.
>
> I'm hoping this is something simple - perhaps some parameter I forgot 
> to set? I am hoping the screen output below might provide guidance to 
> somebody with more experience. Could very much use some help.
>
>  - Ron Taylor
>
> ___________________________________________
> Ronald Taylor, Ph.D.
> Computational Biology & Bioinformatics Group Pacific Northwest 
> National Laboratory
> 902 Battelle Boulevard
> P.O. Box 999, MSIN K7-90
> Richland, WA  99352 USA
> Office:  509-372-6568
> Email: ronald.taylor@pnl.gov
> www.pnl.gov
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> Working in this directory:
>
> hadoop@neptune:/share/apps/RonWork/MR
>
> Command issued:
>
> /share/apps/hadoop/hadoop-0.19.1/bin/hadoop jar 
> jarredBinTableMRSummation.jar binTableMRSummation
>
> Screen output:
>
> 09/10/01 16:24:53 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 09/10/01 16:24:53 INFO mapred.TableInputFormatBase: split:
> 0->compute-0-0.local:,
> 09/10/01 16:24:54 INFO mapred.JobClient: Running job:
> job_200908131056_0004
> 09/10/01 16:24:55 INFO mapred.JobClient:  map 0% reduce 0%
> 09/10/01 16:25:27 INFO mapred.JobClient:  map 100% reduce 0%
> 09/10/01 16:25:38 INFO mapred.JobClient:  map 100% reduce 33%
> 09/10/01 16:25:43 INFO mapred.JobClient:  map 100% reduce 66%
> 09/10/01 16:35:40 INFO mapred.JobClient:  map 100% reduce 33%
> 09/10/01 16:35:41 INFO mapred.JobClient: Task Id :
> attempt_200908131056_0004_r_000000_0, Status : FAILED Task 
> attempt_200908131056_0004_r_000000_0 failed to report status for 603 
> seconds. Killing!
> 09/10/01 16:35:46 INFO mapred.JobClient:  map 100% reduce 0%
> 09/10/01 16:35:46 INFO mapred.JobClient: Task Id :
> attempt_200908131056_0004_r_000001_0, Status : FAILED Task 
> attempt_200908131056_0004_r_000001_0 failed to report status for 602 
> seconds. Killing!
> 09/10/01 16:35:51 INFO mapred.JobClient:  map 100% reduce 33%
> 09/10/01 16:35:56 INFO mapred.JobClient:  map 100% reduce 66%
> 09/10/01 16:45:55 INFO mapred.JobClient:  map 100% reduce 33%
> 09/10/01 16:45:55 INFO mapred.JobClient: Task Id :
> attempt_200908131056_0004_r_000000_1, Status : FAILED Task
> attempt_200908131056_0004_r_000000_1 failed to report status for 603 
> seconds. Killing!
> 09/10/01 16:45:55 INFO mapred.JobClient: Task Id :
> attempt_200908131056_0004_r_000000_2, Status : FAILED Task
> attempt_200908131056_0004_r_000000_2 failed to report status for 603 
> seconds. Killing!
> 09/10/01 16:46:00 INFO mapred.JobClient: Task Id :
> attempt_200908131056_0004_r_000001_1, Status : FAILED Task
> attempt_200908131056_0004_r_000001_1 failed to report status for 603 
> seconds. Killing!
> 09/10/01 16:46:06 INFO mapred.JobClient: Task Id :
> attempt_200908131056_0004_r_000001_2, Status : FAILED Task
> attempt_200908131056_0004_r_000001_2 failed to report status for 603 
> seconds. Killing!
>
> <manually killed via control-C at this point>
>
> 09/10/01 16:46:15 INFO mapred.JobClient:  map 100% reduce 66% Killed 
> by signal 2.
>
>
>
