nutch-dev mailing list archives

From Dennis Kubes <ku...@apache.org>
Subject Re: nutch latest build - inject operation failing
Date Thu, 14 Feb 2008 16:11:13 GMT
I think this may be a file path issue with Hadoop; I have seen it 
before.  Can you try on Windows using the cygdrive path and see if 
that works?  For the path below it would be /cygdrive/D/tmp/ ...
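
One way to try this might be to override the temp directory in
conf/hadoop-site.xml. This is only a sketch and has not been tested: it
assumes the D:/tmp/hadoop-guest location in the log comes from the
hadoop.tmp.dir property (default /tmp/hadoop-${user.name}), and the
value shown is a guess at the cygdrive form of that location.

```xml
<!-- conf/hadoop-site.xml: hypothetical override, not verified to fix the error -->
<property>
  <name>hadoop.tmp.dir</name>
  <!-- Assumed cygdrive equivalent of the D:/tmp/hadoop-guest path in the log -->
  <value>/cygdrive/D/tmp/hadoop-${user.name}</value>
  <description>Base directory for Hadoop temporary files.</description>
</property>
```

If the inject still fails, the "Target file:..." line in the new log
should show whether the override was picked up.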

Dennis

Susam Pal wrote:
> I can confirm this error as I just tried running the last revision of
> Nutch, rev-620818 on Debian as well as Cygwin on Windows.
> 
> It works fine on Debian but fails on Cygwin with this error:-
> 
> 2008-02-14 19:49:47,756 WARN  regex.RegexURLNormalizer - can't find
> rules for scope 'inject', using default
> 2008-02-14 19:49:48,381 WARN  mapred.LocalJobRunner - job_local_1
> java.io.IOException: Target
> file:/D:/tmp/hadoop-guest/mapred/temp/inject-temp-322737506/_reduce_bjm6rw/part-00000
> already exists
> 	at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
> 	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
> 	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
> 	at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:196)
> 	at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:394)
> 	at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:452)
> 	at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:469)
> 	at org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:426)
> 	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:165)
> 2008-02-14 19:49:49,225 FATAL crawl.Injector - Injector:
> java.io.IOException: Job failed!
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:831)
> 	at org.apache.nutch.crawl.Injector.inject(Injector.java:162)
> 	at org.apache.nutch.crawl.Injector.run(Injector.java:192)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> 	at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
> 	at org.apache.nutch.crawl.Injector.main(Injector.java:182)
> 
> Indeed the 'inject-temp-322737506' is present in the specified
> folder of D drive and doesn't get deleted.
> 
> Is this because multiple map/reduce tasks are running and one of them
> finds the directory already present and therefore fails?
> 
> So, I also tried setting this in 'conf/hadoop-site.xml':-
> 
> <property>
> <name>mapred.speculative.execution</name>
> <value>false</value>
> <description></description>
> </property>
> 
> I wonder why the same issue doesn't occur in Linux. I am not well
> acquainted with the Hadoop code yet. Could someone throw light on what
> might be going wrong?
> 
> Regards,
> Susam Pal
> 
> On 2/7/08, DS jha <aedsjha@gmail.com> wrote:
>> Hi -
>> Looks like the latest trunk version of Nutch is failing with the
>> following exception when trying to perform the inject operation:
>>
>> java.io.IOException: Target
>> file:/tmp/hadoop-user/mapred/temp/inject-temp-1280136828/_reduce_dv90x0/part-00000
>> already exists
>>         at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
>>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
>>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
>>         at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:196)
>>         at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:394)
>>         at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:452)
>>         at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:469)
>>         at org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:426)
>>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:165)
>>
>> Any thoughts?
>>
>> Thanks
>> Jha
>>
