hbase-user mailing list archives

From: Stack <st...@duboce.net>
Subject: Re: FileNotFoundException in Reduce step when running importtsv program
Date: Thu, 17 Mar 2011 19:11:16 GMT
Grep the namenode logs for one of the files throwing the
FileNotFoundException and see if you can piece together a story from
the grep output.  It looks like something is moving the file out from
under you; the NN logs might give you a clue.
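
Something like this, assuming the default Hadoop log layout (the
attempt ID below is taken from your trace; adjust the log file name
to match your install):

  # Search the namenode log for the missing attempt directory
  grep '_attempt_201103151859_0066_r_000000_0' \
      "$HADOOP_HOME"/logs/hadoop-*-namenode-*.log

Look in particular for any rename or delete on that path just before
the task fails.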

St.Ack

On Thu, Mar 17, 2011 at 11:58 AM, Nichole Treadway <kntreadway@gmail.com> wrote:
> Hi all,
>
> I am attempting to bulk load data into HBase using the importtsv program. I
> have a very wide table (about 200 columns across 2 column families), and right
> now I'm trying to load data from a single file with 1 million rows.
>
> Importtsv works fine for this data when I am writing directly to the table.
> However, I would like the import to write to an output file instead, using the
> 'importtsv.bulk.output' option. I have applied the HBASE-1861 patch
> (https://issues.apache.org/jira/browse/HBASE-1861) to allow bulk loads of
> tables with multiple column families.
>
> When I run the bulk upload with the output file option on my data, it
> always fails in the reduce step. A large number of reduce tasks (2956) get
> created; they all reach about 35% completion and then fail with the
> following error:
>
>
>> 2011-03-17 11:52:48,095 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
>> java.io.FileNotFoundException: File does not exist: hdfs://master:9000/awardsData/_temporary/_attempt_201103151859_0066_r_000000_0
>>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:468)
>>   at org.apache.hadoop.hbase.regionserver.StoreFile.getUniqueFile(StoreFile.java:580)
>>   at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.writeMetaData(HFileOutputFormat.java:186)
>>   at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat$1.close(HFileOutputFormat.java:247)
>>   at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:567)
>>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
>>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
>> 2011-03-17 11:52:48,100 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
>
> I've put the full output of the reduce task attempt here:
> http://pastebin.com/WMfqUwqC
>
> I've tried running the program on a small table (3 column families,
> inserting 3 values each for 1 million rows) and it works fine, though that
> run only creates a single reduce task.
>
> Any idea what the problem could be?
>
> FYI, my cluster has 4 nodes, all acting as datanodes/regionservers, running
> 64-bit Red Hat Linux. I'm running the hadoop-0.20-append branch and, for
> HBase, the latest revision of the 0.90.2 branch.
>
>
>
> Thanks for your help,
> Nichole
>
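
For reference, the bulk-output invocation described above generally looks
something like the below; the table name, column spec, and paths here are
placeholders, and the jar name depends on your build:

  # Write HFiles to HDFS instead of writing directly to the table
  hadoop jar $HBASE_HOME/hbase-0.90.2.jar importtsv \
      -Dimporttsv.columns=HBASE_ROW_KEY,cf1:c1,cf2:c1 \
      -Dimporttsv.bulk.output=/awardsOutput \
      mytable /awardsData

The generated HFiles are then moved into the table in a second step with
the completebulkload tool.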
