nutch-dev mailing list archives

From "Cosmin Lehene (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-692) AlreadyBeingCreatedException with Hadoop 0.19
Date Fri, 03 Apr 2009 09:01:12 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695300#action_12695300 ]

Cosmin Lehene commented on NUTCH-692:
-------------------------------------

The AlreadyBeingCreatedException appears on the second attempt of a reduce task. In our case,
during the reduce phase of the fetch process we sometimes had hanging processes that initially
reported something like "Task attempt_200903031109_0007_r_000002_0 failed to report status
for 603 seconds. Killing!". When the task is retried, ParseOutputFormat and FetcherOutputFormat
try to create the MapFile, but it already exists, so the task fails again with AlreadyBeingCreatedException
and there is no way to recover unless the files are deleted.

The patch fixes the issue with a second reduce attempt, and yes, it works. Without the patch
there is no second reduce attempt, since it stops at
new MapFile.Writer(job,...) with the AlreadyBeingCreatedException.
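
Something along these lines is the idea (just a minimal sketch of the approach, not the
actual NUTCH-692 patch; the class name, path and key/value types are made up for
illustration): have the output format remove any leftover MapFile from a previous attempt
before constructing the writer, so a retried reduce task can recreate it instead of dying
on the existing file.

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical sketch only; not the code from ParseOutputFormat/FetcherOutputFormat.
public class RetrySafeWriterSketch {

  public static MapFile.Writer openWriter(JobConf job, String name) throws IOException {
    Path out = FileOutputFormat.getOutputPath(job);
    Path data = new Path(out, name);
    FileSystem fs = data.getFileSystem(job);

    // A previous, killed attempt may have left this MapFile behind. Deleting it
    // lets the retried attempt recreate it instead of failing with
    // AlreadyBeingCreatedException.
    if (fs.exists(data)) {
      fs.delete(data, true);
    }

    // Without the cleanup above, this is the call the retried attempt stops at.
    return new MapFile.Writer(job, fs, data.toString(), Text.class, Text.class);
  }
}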

It's not trivial to reproduce the "failed to report status for X seconds. Killing!" problem,
unless you have some bad regexp to feed the crawler with :). However, I believe it could be
reproduced by stopping and starting the tasktracker with hadoop-daemon.sh stop/start tasktracker.


Another way to reproduce just the HDFS exception is to try to create the same file twice.
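
For example, something like this standalone snippet should trigger it on HDFS (assuming
fs.default.name points at a running HDFS cluster; the path is arbitrary and the class is
not part of Nutch):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CreateTwice {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/tmp/already-being-created-test");

    // The first create() succeeds and holds the HDFS lease while the stream stays open.
    FSDataOutputStream first = fs.create(p);

    // A second create() on the same path while the lease is still held should come
    // back from the namenode as AlreadyBeingCreatedException. On the local
    // filesystem this second call simply succeeds, so HDFS is needed to see it.
    FSDataOutputStream second = fs.create(p);

    second.close();
    first.close();
  }
}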

 
However, it should be noted that there are many possible causes of "failed to report status for 603
seconds. Killing!". One of them is the regex problem mentioned above, where the regex match
loops forever and takes 100% of the CPU. If the reduce task hits
a problem like that, this patch won't help much, except that it will let the reduce phase go
through the entire reducer process again instead of failing as soon as it starts.
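
To give an idea of what such a regex problem looks like (the pattern and input below are
invented for illustration, not taken from any Nutch filter file):

import java.util.regex.Pattern;

public class SlowRegexDemo {
  public static void main(String[] args) {
    // Nested quantifiers such as (a+)+ backtrack exponentially when the input
    // almost matches but fails at the very end.
    Pattern bad = Pattern.compile("^(a+)+$");
    // A long run of 'a' followed by a character that makes the match fail.
    String almostMatches = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!";
    // This single call can run for an extremely long time at 100% CPU; inside a
    // reduce task it shows up as "failed to report status for N seconds. Killing!".
    System.out.println(bad.matcher(almostMatches).matches());
  }
}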

    

> AlreadyBeingCreatedException with Hadoop 0.19
> ---------------------------------------------
>
>                 Key: NUTCH-692
>                 URL: https://issues.apache.org/jira/browse/NUTCH-692
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>         Attachments: NUTCH-692.patch
>
>
> I have been using the SVN version of Nutch on an EC2 cluster and got some AlreadyBeingCreatedException
during the reduce phase of a parse. For some reason one of my tasks crashed and then I ran
into this AlreadyBeingCreatedException when other nodes tried to pick it up.
> There was recently a discussion on the Hadoop user list on similar issues with Hadoop
0.19 (see http://markmail.org/search/after+upgrade+to+0%2E19%2E0). I have not tried using
0.18.2 yet but will do so if the problems persist with 0.19.
> I was wondering whether anyone else had experienced the same problem. Do you think 0.19
is stable enough to use it for Nutch 1.0?
> I will be running a crawl on a super large cluster in the next couple of weeks and I
will confirm this issue.
> J.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

