nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-993) NullPointerException at FetcherOutputFormat.checkOutputSpecs
Date Mon, 04 Jul 2011 10:57:22 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059387#comment-13059387 ]

Markus Jelsma commented on NUTCH-993:
-------------------------------------

There's an issue with ParseOutputFormat: it fails when running Nutch locally:

{code}
ParseSegment: segment: crawl/segments/20110704125233
Exception in thread "main" java.io.IOException: Segment already fetched!
        at org.apache.nutch.parse.ParseOutputFormat.checkOutputSpecs(ParseOutputFormat.java:86)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:772)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
        at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:157)
        at org.apache.nutch.parse.ParseSegment.run(ParseSegment.java:178)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.parse.ParseSegment.main(ParseSegment.java:164)

{code}

> NullPointerException at FetcherOutputFormat.checkOutputSpecs
> ------------------------------------------------------------
>
>                 Key: NUTCH-993
>                 URL: https://issues.apache.org/jira/browse/NUTCH-993
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.3
>         Environment: Cloudera CDH3 Cluster (hadoop 0.20.2-cdh3u0)
>            Reporter: Christian Guegi
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.4, 2.0
>
>         Attachments: FetcherOutputFormat.patch, ParseOutputFormat.patch
>
>
> When running Nutch as a MapReduce job on an existing cluster, I get a NullPointerException at org.apache.nutch.fetcher.FetcherOutputFormat.checkOutputSpecs.
> The reason is that the passed-in reference to the file system is null.
> The attached patch ignores the parameter 'fs' and creates a new reference to the file system.
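The workaround described in the issue (ignore the possibly-null 'fs' argument and obtain the file system from the output path instead) can be illustrated with a small, self-contained analog. This is a sketch under stated assumptions, not the attached patch: it uses java.nio.file rather than Hadoop's org.apache.hadoop.fs API so it runs without Hadoop on the classpath, the class OutputSpecCheck and its signature are hypothetical, and the subdirectory name "crawl_fetch" is assumed. In Hadoop itself, the counterpart would presumably be resolving the file system with out.getFileSystem(job) on the job's output path.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stdlib analog of the NUTCH-993 workaround (java.nio, not Hadoop):
// the file system argument may be null, so it is ignored and the file system
// is derived from the output path instead.
final class OutputSpecCheck {

    private OutputSpecCheck() {}

    /**
     * Fails if the segment's output subdirectory already exists.
     *
     * @param ignored file system passed in by the caller; may be null and is not used
     * @param out     segment output directory
     * @param subDir  output subdirectory to check (e.g. "crawl_fetch", an assumed name)
     */
    static void checkOutputSpecs(FileSystem ignored, Path out, String subDir) throws IOException {
        if (out == null) {
            throw new IOException("Output path is not set");
        }
        // Resolve the file system from the output path itself, so a null
        // argument can no longer cause a NullPointerException.
        FileSystem fs = out.getFileSystem();
        Path target = fs.getPath(out.toString(), subDir);
        if (Files.exists(target)) {
            throw new IOException("Segment already fetched!");
        }
    }
}
```

Calling OutputSpecCheck.checkOutputSpecs(null, segmentDir, "crawl_fetch") then succeeds on a fresh segment and throws an IOException once the subdirectory exists, instead of failing with a NullPointerException.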

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
