nutch-dev mailing list archives

From "byron miller (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-159) Specify temp/working directory for crawl
Date Mon, 02 Jan 2006 18:33:01 GMT
    [ http://issues.apache.org/jira/browse/NUTCH-159?page=comments#action_12361545 ] 

byron miller commented on NUTCH-159:
------------------------------------

While it's from the mapred trunk, it is a local-only instance that does not use NDFS.
mapred.temp.dir was left at its default (which didn't exist):


<property>
  <name>mapred.temp.dir</name>
  <value>/tmp/nutch/mapred/temp</value>
  <description>A shared directory for temporary files.
  </description>
</property>

I'm going to modify this and re-run my fetch and let you know how that works.  
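
A minimal sketch of what such an override might look like, assuming site-specific
settings are placed in conf/nutch-site.xml and take precedence over the defaults.
The path /data/nutch/tmp is a hypothetical example of a directory on a volume with
enough free space, not a value taken from this report:

<!-- Assumed override in conf/nutch-site.xml; the path below is hypothetical. -->
<property>
  <name>mapred.temp.dir</name>
  <value>/data/nutch/tmp/mapred/temp</value>
  <description>A shared directory for temporary files.
  </description>
</property>

The directory would need to exist and be writable by the user running the crawl.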


> Specify temp/working directory for crawl
> ----------------------------------------
>
>          Key: NUTCH-159
>          URL: http://issues.apache.org/jira/browse/NUTCH-159
>      Project: Nutch
>         Type: Bug
>   Components: fetcher, indexer
>     Versions: 0.8-dev
>  Environment: Linux/Debian
>     Reporter: byron miller

>
> I ran a crawl of 100k web pages and got:
> org.apache.nutch.fs.FSError: java.io.IOException: No space left on device
>         at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:149)
>         at org.apache.nutch.fs.FileUtil.copyContents(FileUtil.java:65)
>         at org.apache.nutch.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:178)
>         at org.apache.nutch.fs.NutchFileSystem.rename(NutchFileSystem.java:224)
>         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:80)
> Caused by: java.io.IOException: No space left on device
>         at java.io.FileOutputStream.writeBytes(Native Method)
>         at java.io.FileOutputStream.write(FileOutputStream.java:260)
>         at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(LocalFileSystem.java:147)
>         ... 4 more
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
>         at org.apache.nutch.crawl.Fetcher.fetch(Fetcher.java:335)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:107)
> byron@db02:/data/nutch$ df -k
> It appears crawl created a /tmp/nutch directory that filled up even though I specified
> a db directory.
> Need to add a parameter to the command line or make a globally configurable /tmp (work
> area) for the nutch instance so that crawls won't fail.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

