nutch-dev mailing list archives

From "Michael Joyce (JIRA)" <>
Subject [jira] [Commented] (NUTCH-2049) Upgrade Trunk to Hadoop > 2.4 stable
Date Wed, 12 Aug 2015 21:31:45 GMT


Michael Joyce commented on NUTCH-2049:

Hey [~lewismc],

I tried your patch here. It seems I have to add the following to the ivy.xml file to get it to
work at all:

<dependency org="org.apache.hadoop" name="hadoop-mapreduce-client-jobclient" rev="2.4.0"

Otherwise, I end up getting the following when I try to run a test crawl

Injector: starting at 2015-08-12 15:04:42
Injector: crawlDb: crawl/crawldb
Injector: urlDir: ../../urls_test
Injector: Converting injected urls to crawl db entries.
Injector: Cannot initialize Cluster. Please check your configuration
for and the correspond server addresses.
    at org.apache.hadoop.mapreduce.Cluster.initialize(
    at org.apache.hadoop.mapreduce.Cluster.<init>(
    at org.apache.hadoop.mapreduce.Cluster.<init>(
    at org.apache.hadoop.mapred.JobClient.init(
    at org.apache.hadoop.mapred.JobClient.<init>(
    at org.apache.hadoop.mapred.JobClient.runJob(
    at org.apache.nutch.crawl.Injector.inject(
    at org.apache.nutch.crawl.Injector.main(

However, after addressing that concern, I end up running into the following on the test crawl:

java.lang.Exception: java.lang.ClassCastException:$Writer$KeyClassOption
cannot be cast to$Writer$Option
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(
	at org.apache.hadoop.mapred.LocalJobRunner$
Caused by: java.lang.ClassCastException:$Writer$KeyClassOption
cannot be cast to$Writer$Option
	at org.apache.nutch.fetcher.FetcherOutputFormat.getRecordWriter(
	at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(
	at org.apache.hadoop.mapred.LocalJobRunner$Job$
	at java.util.concurrent.Executors$
	at java.util.concurrent.ThreadPoolExecutor.runWorker(
	at java.util.concurrent.ThreadPoolExecutor$
2015-08-12 14:24:39,906 ERROR fetcher.Fetcher - Fetcher: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(
	at org.apache.nutch.fetcher.Fetcher.fetch(
	at org.apache.nutch.fetcher.Fetcher.main(

> Upgrade Trunk to Hadoop > 2.4 stable
> ------------------------------------
>                 Key: NUTCH-2049
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>          Components: build
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.11
>         Attachments: NUTCH-2049.patch
> Convo here -
> I am +1 for taking trunk (or a branch of trunk) to explicit dependency on > Hadoop
> We can run our tests, we can validate, we can fix.
> I will be doing validation on 2.X in parallel as this is what I use on my own projects.

This message was sent by Atlassian JIRA
