nutch-dev mailing list archives

From Lukas Vlcek <lukas.vl...@gmail.com>
Subject Re: mapred crawling exception - Job failed!
Date Wed, 04 Jan 2006 07:24:57 GMT
Note: I mistakenly used the nutch-user address as the reply-to value.
Feel free to reply to either nutch-dev or nutch-user, as I monitor both
of them :-)
Anyway, can anybody tell me how to easily change the reply-to value in
Gmail? I struggle with this all the time, especially when replying
to multiple mailing lists....

On 1/4/06, Lukas Vlcek <lukas.vlcek@gmail.com> wrote:
> Hi,
>
> I am trying to use the latest nutch-trunk version but I am hitting an
> unexpected "Job failed!" exception. It seems that all the crawling work
> has already been done, but some threads are hung, which results in an
> exception after a timeout.
>
> I am not sure whether this is a real Nutch issue or just my
> misunderstanding of the proper configuration.
>
> The following are the details:
> I am trying to run the nutch-trunk version on one machine (Linux). I
> checked out the latest svn and produced a fresh installation package
> using "ant tar". Then I modified only nutch-site.xml (see attachment) -
> I believe I didn't change anything special. I also tried modifying
> [fetcher.threads.fetch] and [fetcher.threads.per.host], but
> that didn't seem to help.
>
> Typically, the nutch crawl process seemed to work fine and it crawled
> all documents on my local Apache server (both Nutch and Apache run on
> the same machine), but then it didn't stop; it kept waiting for
> something to finish. From that point on it just produced lines like
> [060103 231602 16 pages, 0 errors, 0.4 pages/s, 305 kb/s, ] in the log,
> where the latter two numbers (pages/s, kb/s) were decreasing as time
> went by (which is logical).
>
> Then I received the following exception.
> Sometimes the log even contains a message saying
> "Aborting with "+activeThreads+" hung threads.", where activeThreads
> was some number (this number differs depending on the conf setup).
>
> ... (see the crawl.log attachment file for the whole log)
> 060103 231602 16 pages, 0 errors, 0.4 pages/s, 305 kb/s,
> 060103 231602 16 pages, 0 errors, 0.4 pages/s, 305 kb/s,
> java.lang.NullPointerException
>         at java.lang.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:980)
>         at java.lang.Float.parseFloat(Float.java:222)
>         at org.apache.nutch.parse.ParseOutputFormat$1.write(ParseOutputFormat.java:84)
>         at org.apache.nutch.fetcher.FetcherOutputFormat$1.write(FetcherOutputFormat.java:80)
>         at org.apache.nutch.mapred.ReduceTask$2.collect(ReduceTask.java:247)
>         at org.apache.nutch.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
>         at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
>         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)
> 060103 231603  map 100%
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:308)
>         at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:344)
>         at org.apache.nutch.crawl.Crawl.main(Crawl.java:111)
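>
> [Editor's note, not part of the original message: the NPE originates in
> java.lang.FloatingDecimal.readJavaFormatString, which is where
> Float.parseFloat lands when handed a null string. So the trace is
> consistent with ParseOutputFormat (line 84 in this build) calling
> Float.parseFloat on a metadata value that is missing, i.e. null. The
> parseScore helper below is hypothetical, a minimal sketch of the
> failure mode and a defensive fallback, not the actual Nutch code.]
>
> ```java
> public class ParseFloatNpeDemo {
>     // Hypothetical stand-in for the kind of call at
>     // ParseOutputFormat.java:84: parsing a score string that may be
>     // absent. Float.parseFloat(null) throws NullPointerException from
>     // FloatingDecimal.readJavaFormatString, matching the trace above.
>     static float parseScore(String raw) {
>         // Defensive variant: fall back to a default score when the
>         // metadata value is missing instead of propagating the NPE.
>         return raw == null ? 1.0f : Float.parseFloat(raw);
>     }
>
>     public static void main(String[] args) {
>         boolean threw = false;
>         try {
>             Float.parseFloat((String) null); // reproduces the NPE
>         } catch (NullPointerException e) {
>             threw = true;
>         }
>         System.out.println("NPE thrown: " + threw);           // NPE thrown: true
>         System.out.println("fallback:   " + parseScore(null)); // fallback:   1.0
>     }
> }
> ```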
>
> Does anybody know what is wrong?
>
> Regards,
> Lukas
>
>
>
