nutch-dev mailing list archives

From "Tsengtan A Shuy" <ttas...@sbcglobal.net>
Subject RE: inject command fail on whole-web run
Date Sun, 15 Jul 2007 00:17:07 GMT
I was able to fix the problem from my last email and run through the
whole-web crawl commands from the nutch-0.8.x tutorial.

But the resulting crawl folder is still very small, and when I ran the final
search for "apache", I got the "hit 0" message.  Something is still wrong.

Please give me some feedback.

Adam Shuy, President
ePacific Web Design & Hosting
Professional Web/Software developer
TEL: 408-272-6946
www.epacificweb.com
-----Original Message-----
From: Tsengtan A Shuy [mailto:ttashuy@sbcglobal.net] 
Sent: Saturday, July 14, 2007 12:11 PM
To: nutch-dev@lucene.apache.org
Subject: inject command fail on whole-web run

I am running the "bin/nutch inject crawl/crawldb dmoz" command on Ubuntu,
following the nutch-0.8.x tutorial, but I get the following error
message:

2007-07-14 11:38:35,238 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(120)) - job_ij0atx
java.lang.NoClassDefFoundError: dk/brics/automaton/RunAutomaton
        at org.apache.nutch.urlfilter.automaton.AutomatonURLFilter$Rule.<init>(AutomatonURLFilter.java:89)
        at org.apache.nutch.urlfilter.automaton.AutomatonURLFilter.createRule(AutomatonURLFilter.java:70)
        at org.apache.nutch.urlfilter.api.RegexURLFilterBase.readRulesFile(RegexURLFilterBase.java:191)
        at org.apache.nutch.urlfilter.api.RegexURLFilterBase.setConf(RegexURLFilterBase.java:140)
        at org.apache.nutch.plugin.Extension.getExtensionInstance(Extension.java:153)
        at org.apache.nutch.net.URLFilters.<init>(URLFilters.java:53)
        at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:56)
        at org.apache.hadoop.mapred.JobConf.newInstance(JobConf.java:443)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:33)
        at org.apache.hadoop.mapred.JobConf.newInstance(JobConf.java:443)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:125)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:91)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:138)
        at org.apache.nutch.crawl.Injector.main(Injector.java:164)
adamshuy@adamshuy-desktop:~/nutch-0.8.1$ 
What is wrong with my Ubuntu environment?
Please help!!
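
A NoClassDefFoundError like the one in the trace above generally means the
JVM could not load the named class at runtime. As a rough diagnostic, here is
a minimal sketch that checks whether a class can be loaded from the classpath
it is run with. The names ClassCheck and isLoadable are made up for
illustration and are not part of Nutch; the default class name is taken from
the trace. Nutch loads plugins through its own class loaders, so this only
tells you about the classpath you start the check with, not necessarily what
a plugin sees.

```java
public class ClassCheck {

    // Returns true if the named class can be located by this class's loader.
    // The class is looked up without running its static initializers.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but not linkable (e.g. a missing dependency).
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace (dots, not slashes).
        String name = args.length > 0 ? args[0] : "dk.brics.automaton.RunAutomaton";
        System.out.println(name + (isLoadable(name)
                ? " is loadable"
                : " is NOT loadable from this classpath"));
    }
}
```

Compile it with javac and run it with "java -cp" set to the directory holding
ClassCheck.class plus whichever jars you want to test.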

Adam Shuy, President
ePacific Web Design & Hosting
Professional Web/Software developer
TEL: 408-272-6946
www.epacificweb.com

