nutch-dev mailing list archives

From esmithers <eugene.smith...@gmail.com>
Subject Re: nutch latest build - inject operation failing
Date Wed, 27 Feb 2008 23:46:52 GMT

Has there been any resolution to this problem? I just tried installing on
Windows and I'm getting the same error.


Susam Pal wrote:
> 
> I tried setting hadoop.tmp.dir to /cygdrive/d/tmp and it created
> D:\cygdrive\d\tmp\mapred\temp\inject-temp-1365510909\_reduce_n7v9vq.
> 
> The same error occurred:-
> 
> 2008-02-15 10:19:22,833 WARN  mapred.LocalJobRunner - job_local_1
> java.io.IOException: Target file:/D:/cygdrive/d/tmp/mapred/temp/inject-temp-1365510909/_reduce_n7v9vq/part-00000 already exists
>        at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
>        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
>        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
>        at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:180)
>        at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:394)
>        at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:452)
>        at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:469)
>        at org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:426)
>        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:165)
> 
> Regards,
> Susam Pal
> 
> On Thu, Feb 14, 2008 at 10:07 PM, Susam Pal <susam.pal@gmail.com> wrote:
>> What I did try was setting hadoop.tmp.dir to /opt/tmp. I found the
>>  behavior strange. I had an /opt/tmp directory in my Cygwin
>>  installation (Absolute Windows path: D:\Cygwin\opt\tmp) and I was
>>  expecting Hadoop to use it. However, it created a new D:\opt\tmp and
>>  wrote the temp files there. Of course this failed with the same error.
>>
>>  Right now I don't have a Windows system with me. I will try setting it
>>  as /cygdrive/d/tmp/ tomorrow when I again have access to a Windows
>>  system and then I'll update the mailing list with the observations.
>>  Thanks for the suggestion.
>>
>>  Regards,
>>  Susam Pal
>>
>>
>>
>>  On Thu, Feb 14, 2008 at 9:41 PM, Dennis Kubes <kubes@apache.org> wrote:
>>  > I think what might be occurring is a file path issue with hadoop.  I
>>  >  have seen it in the past.  Can you try on windows using the cygdrive
>>  >  path and see if that works?  For below it would be /cygdrive/D/tmp/ ...
>>  >
>>  >  Dennis
>>  >
>>  >
>>  >
>>  >  Susam Pal wrote:
>>  >  > I can confirm this error as I just tried running the last revision of
>>  >  > Nutch, rev-620818 on Debian as well as Cygwin on Windows.
>>  >  >
>>  >  > It works fine on Debian but fails on Cygwin with this error:-
>>  >  >
>>  >  > 2008-02-14 19:49:47,756 WARN  regex.RegexURLNormalizer - can't find
>>  >  > rules for scope 'inject', using default
>>  >  > 2008-02-14 19:49:48,381 WARN  mapred.LocalJobRunner - job_local_1
>>  >  > java.io.IOException: Target file:/D:/tmp/hadoop-guest/mapred/temp/inject-temp-322737506/_reduce_bjm6rw/part-00000 already exists
>>  >  >       at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
>>  >  >       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
>>  >  >       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
>>  >  >       at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:196)
>>  >  >       at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:394)
>>  >  >       at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:452)
>>  >  >       at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:469)
>>  >  >       at org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:426)
>>  >  >       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:165)
>>  >  > 2008-02-14 19:49:49,225 FATAL crawl.Injector - Injector:
>>  >  > java.io.IOException: Job failed!
>>  >  >       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:831)
>>  >  >       at org.apache.nutch.crawl.Injector.inject(Injector.java:162)
>>  >  >       at org.apache.nutch.crawl.Injector.run(Injector.java:192)
>>  >  >       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>  >  >       at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:54)
>>  >  >       at org.apache.nutch.crawl.Injector.main(Injector.java:182)
>>  >  >
>>  >  > Indeed the 'inject-temp-322737506' is present in the specified
>>  >  > folder of D drive and doesn't get deleted.
>>  >  >
>>  >  > Is this because multiple map/reduce is running and one of them is
>>  >  > finding the directory to be present and therefore fails?
>>  >  >
>>  >  > So, I also tried setting this in 'conf/hadoop-site.xml':-
>>  >  >
>>  >  > <property>
>>  >  > <name>mapred.speculative.execution</name>
>>  >  > <value>false</value>
>>  >  > <description></description>
>>  >  > </property>
>>  >  >
>>  >  > I wonder why the same issue doesn't occur in Linux. I am not well
>>  >  > acquainted with the Hadoop code yet. Could someone throw light on
>>  >  > what might be going wrong?
>>  >  >
>>  >  > Regards,
>>  >  > Susam Pal
>>  >  >
>>  >  > On 2/7/08, DS jha <aedsjha@gmail.com> wrote:
>>  >  >> Hi -
>>  >  >> Looks like latest trunk version of nutch is failing with the
>>  >  >> following exception when trying to perform inject operation:
>>  >  >>
>>  >  >> java.io.IOException: Target file:/tmp/hadoop-user/mapred/temp/inject-temp-1280136828/_reduce_dv90x0/part-00000 already exists
>>  >  >>         at org.apache.hadoop.fs.FileUtil.checkDest(FileUtil.java:246)
>>  >  >>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:125)
>>  >  >>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:116)
>>  >  >>         at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:196)
>>  >  >>         at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:394)
>>  >  >>         at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:452)
>>  >  >>         at org.apache.hadoop.mapred.Task.moveTaskOutputs(Task.java:469)
>>  >  >>         at org.apache.hadoop.mapred.Task.saveTaskOutput(Task.java:426)
>>  >  >>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:165)
>>  >  >>
>>  >  >> Any thoughts?
>>  >  >>
>>  >  >> Thanks
>>  >  >> Jha
>>  >  >>
>>  >
>>
> 
> 
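For anyone trying to reproduce the path behavior described in the quoted messages, here is a minimal sketch (not taken from the thread, and not Nutch or Hadoop code) that prints how the JVM itself resolves the two hadoop.tmp.dir values that were tried. The names PathCheck and resolve are made up for illustration. The assumption is that Java on Windows runs as a native program that knows nothing about Cygwin mounts, so a path with a leading slash is taken relative to the root of the current drive. That would be consistent with the D:\opt\tmp and D:\cygdrive\d\tmp directories reported above, but it does not by itself explain the "already exists" error.

```java
import java.io.File;

// Hypothetical helper for illustration only; not part of Nutch or Hadoop.
class PathCheck {

    // Returns the absolute path the JVM derives from a configured value,
    // without any Cygwin-style translation.
    static String resolve(String configured) {
        return new File(configured).getAbsolutePath();
    }

    public static void main(String[] args) {
        // The two hadoop.tmp.dir values mentioned in the thread.
        String[] candidates = {"/opt/tmp", "/cygdrive/d/tmp"};
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + resolve(candidate));
        }
    }
}
```

Running it from the same drive and shell used to start Nutch should show which directory Hadoop would actually write to. On Linux the paths come back unchanged.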
