nutch-dev mailing list archives

From "byron miller (JIRA)" <>
Subject [jira] Commented: (NUTCH-159) Specify temp/working directory for crawl
Date Mon, 02 Jan 2006 18:33:01 GMT

byron miller commented on NUTCH-159:

While it's from the mapred trunk, it is a local (non-NDFS) instance only. mapred.temp.dir was
left at its default, which pointed to a directory that didn't exist:

  <description>A shared directory for temporary files.

I'm going to modify this and re-run my fetch and let you know how that works.  
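For reference, a minimal sketch of that kind of override in conf/nutch-site.xml, assuming that a site-level setting of mapred.temp.dir takes precedence over the default in nutch-default.xml. The path /data/nutch/tmp is a hypothetical example on the larger volume. The <nutch-conf> root element is also an assumption, so copy whichever root element your nutch-default.xml uses:

```xml
<?xml version="1.0"?>
<!-- Root element is an assumption: match the one in nutch-default.xml. -->
<nutch-conf>
  <property>
    <name>mapred.temp.dir</name>
    <!-- Hypothetical path: point this at a volume with enough free space. -->
    <value>/data/nutch/tmp</value>
    <description>A shared directory for temporary files.</description>
  </property>
</nutch-conf>
```

The directory should already exist and be writable by the user running the crawl.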

> Specify temp/working directory for crawl
> ----------------------------------------
>          Key: NUTCH-159
>          URL:
>      Project: Nutch
>         Type: Bug
>   Components: fetcher, indexer
>     Versions: 0.8-dev
>  Environment: Linux/Debian
>     Reporter: byron miller

> I ran a crawl of 100k web pages and got:
> org.apache.nutch.fs.FSError: No space left on device
>         at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(
>         at org.apache.nutch.fs.FileUtil.copyContents(
>         at org.apache.nutch.fs.LocalFileSystem.renameRaw(
>         at org.apache.nutch.fs.NutchFileSystem.rename(
>         at org.apache.nutch.mapred.LocalJobRunner$
> Caused by: No space left on device
>         at Method)
>         at
>         at org.apache.nutch.fs.LocalFileSystem$LocalNFSFileOutputStream.write(
>         ... 4 more
> Exception in thread "main" Job failed!
>         at org.apache.nutch.mapred.JobClient.runJob(
>         at org.apache.nutch.crawl.Fetcher.fetch(
>         at org.apache.nutch.crawl.Crawl.main(
> byron@db02:/data/nutch$ df -k
> It appears the crawl created a /tmp/nutch directory that filled up, even though I specified
> a db directory.
> We need to add a command-line parameter, or make a globally configurable /tmp (work
> area) for the Nutch instance, so that crawls won't fail.

This message is automatically generated by JIRA.
