nutch-dev mailing list archives

From "Julien Nioche (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-692) AlreadyBeingCreatedException with Hadoop 0.19
Date Thu, 02 Apr 2009 09:52:12 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12694942#action_12694942 ]

Julien Nioche commented on NUTCH-692:
-------------------------------------

As I pointed out in my previous message, the root of the problem in my case was related to
some dodgy URLs coming from the JavaScript parser, which put the basic normalizer into a spin.
Indeed, this would repeat in subsequent attempts.

However, the AlreadyBeingCreatedException should not happen, and we should not have output files
left open. If your patch fixes that, I am sure it will be a very welcome contribution.

> AlreadyBeingCreatedException with Hadoop 0.19
> ---------------------------------------------
>
>                 Key: NUTCH-692
>                 URL: https://issues.apache.org/jira/browse/NUTCH-692
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Julien Nioche
>
> I have been using the SVN version of Nutch on an EC2 cluster and got some AlreadyBeingCreatedExceptions
> during the reduce phase of a parse. For some reason one of my tasks crashed, and then I ran
> into this AlreadyBeingCreatedException when other nodes tried to pick it up.
> There was recently a discussion on the Hadoop user list about similar issues with Hadoop
> 0.19 (see http://markmail.org/search/after+upgrade+to+0%2E19%2E0). I have not tried using
> 0.18.2 yet, but will do so if the problems persist with 0.19.
> I was wondering whether anyone else had experienced the same problem. Do you think 0.19
> is stable enough to use for Nutch 1.0?
> I will be running a crawl on a super large cluster in the next couple of weeks and will
> confirm this issue.
> J.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

