nutch-dev mailing list archives

From Doug Cutting <>
Subject Re: Error with Hadoop-0.4.0
Date Mon, 10 Jul 2006 08:11:42 GMT
Jérôme Charron wrote:
> In my environment, the crawl command terminates with the following error:
> 2006-07-06 17:41:49,735 ERROR mapred.JobClient 
> (
> - Input directory /localpath/crawl/crawldb/current in local is invalid.
> Exception in thread "main" Input directory
> /localpath/crawl/crawldb/current in local is invalid.
>        at org.apache.hadoop.mapred.JobClient.submitJob(
>        at org.apache.hadoop.mapred.JobClient.runJob(
>        at org.apache.nutch.crawl.Injector.inject(
>        at org.apache.nutch.crawl.Crawl.main(

Hadoop 0.4.0 by default requires all input directories to exist, whereas
previous releases did not.  So we need to either create an empty
"current" directory or change the InputFormat used in
CrawlDb.createJob() to one that overrides
InputFormat.areValidInputDirectories().  The former is probably easier.
I've attached a patch.  Does this fix things for folks?
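To make the two options concrete, here is a rough sketch that models them on the local filesystem with plain java.nio. It is NOT Hadoop code and does not reproduce the attached patch. The class CrawlDbDirs and its methods strictValidate, lenientValidate and ensureCurrentDir are hypothetical names chosen for illustration. strictValidate stands in for the 0.4.0 default (every input directory must exist). lenientValidate stands in for an InputFormat that overrides the validation and tolerates missing directories. ensureCurrentDir is the "create an empty current directory" option.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/**
 * Hypothetical local-filesystem model of the behaviour described above.
 * Does not use Hadoop's API; all names here are illustrative only.
 */
public class CrawlDbDirs {

    /** Models the 0.4.0 default: every input directory must already exist. */
    static boolean strictValidate(List<Path> inputDirs) {
        for (Path dir : inputDirs) {
            if (!Files.isDirectory(dir)) {
                return false; // a missing crawldb/current invalidates the whole job
            }
        }
        return true;
    }

    /** Models an overriding InputFormat that simply tolerates missing directories. */
    static boolean lenientValidate(List<Path> inputDirs) {
        for (Path dir : inputDirs) {
            // Missing paths are skipped; only an existing non-directory is rejected.
            if (Files.exists(dir) && !Files.isDirectory(dir)) {
                return false;
            }
        }
        return true;
    }

    /** The "easier" option: create an empty current directory before the job runs. */
    static Path ensureCurrentDir(Path crawlDb) throws IOException {
        Path current = crawlDb.resolve("current");
        Files.createDirectories(current); // no-op if the directory already exists
        return current;
    }

    public static void main(String[] args) throws IOException {
        Path crawlDb = Files.createTempDirectory("crawl").resolve("crawldb");
        Path current = crawlDb.resolve("current");

        System.out.println("strict, before fix:  " + strictValidate(List.of(current)));
        System.out.println("lenient, before fix: " + lenientValidate(List.of(current)));

        ensureCurrentDir(crawlDb);
        System.out.println("strict, after fix:   " + strictValidate(List.of(current)));
    }
}
```

Running main prints false, true, true: the strict check fails on a fresh crawl until the empty directory exists, while the lenient check never needed it. Creating the directory leaves the stricter default in place, which is one reason it is the smaller change.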

