nutch-dev mailing list archives

From "Susam Pal" <susam....@gmail.com>
Subject Re: nutch latest build - inject operation failing
Date Thu, 14 Feb 2008 14:43:49 GMT
I can confirm this error. I just tried running the latest revision of
Nutch, rev-620818, on Debian as well as on Cygwin on Windows.

It works fine on Debian but fails on Cygwin with this error:

2008-02-14 19:49:47,756 WARN  regex.RegexURLNormalizer - can't find
rules for scope 'inject', using default
2008-02-14 19:49:48,381 WARN  mapred.LocalJobRunner - job_local_1
java.io.IOException: Target
file:/D:/tmp/hadoop-guest/mapred/temp/inject-temp-322737506/_reduce_bjm6rw/part-00000
already exists
	at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
	at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:196)
	at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:394)
	at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:452)
	at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:469)
	at org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:426)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:165)
2008-02-14 19:49:49,225 FATAL crawl.Injector - Injector:
java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:831)
	at org.apache.nutch.crawl.Injector.inject(Injector.java:162)
	at org.apache.nutch.crawl.Injector.run(Injector.java:192)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
	at org.apache.nutch.crawl.Injector.main(Injector.java:182)

Indeed, the 'inject-temp-322737506' directory is present in the
specified folder on the D: drive and does not get deleted.

Is this because multiple map/reduce tasks are running, and one of them
finds the directory already present and therefore fails?

So, I also tried setting this in 'conf/hadoop-site.xml':

<property>
<name>mapred.speculative.execution</name>
<value>false</value>
<description></description>
</property>

I wonder why the same issue doesn't occur on Linux. I am not well
acquainted with the Hadoop code yet. Could someone shed some light on
what might be going wrong?
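
One possible angle on the Linux/Cygwin difference, offered as a guess and
not as a diagnosis: the trace shows RawLocalFileSystem.rename ending up in
FileUtil.copy, and java.io.File.renameTo is documented as platform-dependent
and may not succeed when the destination already exists. The standalone
sketch below only reports what renameTo does on the current platform. The
class name RenameProbe and the method renameOntoExisting are hypothetical
and are not part of Hadoop or Nutch.

```java
import java.io.File;
import java.io.IOException;

class RenameProbe {
    /**
     * Attempts to rename src onto dst and returns what File.renameTo reports.
     * When dst already exists, the outcome depends on the platform.
     */
    static boolean renameOntoExisting(File src, File dst) {
        return src.renameTo(dst);
    }

    public static void main(String[] args) throws IOException {
        // createTempFile creates both files, so the rename target already exists.
        File src = File.createTempFile("part-", ".src");
        File dst = File.createTempFile("part-", ".dst");
        src.deleteOnExit();
        dst.deleteOnExit();

        boolean renamed = renameOntoExisting(src, dst);

        System.out.println("os.name: " + System.getProperty("os.name"));
        System.out.println("renameTo onto existing target succeeded: " + renamed);
        System.out.println("source still exists: " + src.exists());
    }
}
```

Running this on both Debian and Windows and comparing the output would show
whether a plain rename onto an existing file behaves differently on the two
systems. It does not by itself explain why the target part-00000 already
exists in the first place.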

Regards,
Susam Pal

On 2/7/08, DS jha <aedsjha@gmail.com> wrote:
> Hi -
>
> Looks like latest trunk version of nutch is failing with the following
> exception when trying to perform inject operation:
>
> java.io.IOException: Target
> file:/tmp/hadoop-user/mapred/temp/inject-temp-1280136828/_reduce_dv90x0/part-00000
> already exists
>         at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
>         at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:196)
>         at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:394)
>         at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:452)
>         at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:469)
>         at org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:426)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:165)
>
> Any thoughts?
>
> Thanks
> Jha
>
