nutch-dev mailing list archives

From "Doğacan Güney" <doga...@gmail.com>
Subject Re: First Plugin
Date Fri, 05 Oct 2007 13:25:49 GMT
On 10/5/07, Sagar Vibhute <sagar020785@gmail.com> wrote:
> I am really sorry for that. Will take some time to get used to this one :-)
>
> This is the log for the last nutch crawl I tried to execute:
> -----------------------------------------------------------------------------------------------------------------
> 2007-10-05 12:16:33,416 INFO  crawl.Crawl - crawl started in: /home/sagar/nutch_crawl
> 2007-10-05 12:16:33,417 INFO  crawl.Crawl - rootUrlDir = /home/sagar/urls/iiitb
> 2007-10-05 12:16:33,417 INFO  crawl.Crawl - threads = 10
> 2007-10-05 12:16:33,417 INFO  crawl.Crawl - depth = 3
> 2007-10-05 12:16:33,522 INFO  crawl.Injector - Injector: starting
> 2007-10-05 12:16:33,523 INFO  crawl.Injector - Injector: crawlDb: /home/sagar/nutch_crawl/crawldb
> 2007-10-05 12:16:33,523 INFO  crawl.Injector - Injector: urlDir: /home/sagar/urls/iiitb
> 2007-10-05 12:16:33,524 INFO  crawl.Injector - Injector: Converting injected urls to crawl db entries.
> 2007-10-05 12:16:34,116 INFO  plugin.PluginRepository - Plugins: looking in: /home/sagar/nutch-0.9/plugins
> 2007-10-05 12:16:34,277 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository - Registered Plugins:
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     the nutch core extension points (nutch-extensionpoints)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Basic Query Filter (query-basic)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     CyberNeko HTML Parser (lib-nekohtml)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Basic Indexing Filter (index-basic)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Html Parse Plug-in (parse-html)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Site Query Filter (query-site)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     URL Query Filter (query-url)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     HTTP Framework (lib-http)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Text Parse Plug-in (parse-text)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Regex URL Filter (urlfilter-regex)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Regex URL Filter Framework (lib-regex-filter)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Http Protocol Plug-in (protocol-http)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository - Registered Extension-Points:
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Nutch Summarizer (org.apache.nutch.searcher.Summarizer)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Nutch Protocol (org.apache.nutch.protocol.Protocol)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Nutch URL Filter (org.apache.nutch.net.URLFilter)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Nutch Online Search Results Clustering Plugin (org.apache.nutch.clustering.OnlineClusterer)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Nutch Content Parser (org.apache.nutch.parse.Parser)
> 2007-10-05 12:16:34,278 INFO  plugin.PluginRepository -     Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
> 2007-10-05 12:16:34,279 INFO  plugin.PluginRepository -     Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
> 2007-10-05 12:16:34,279 INFO  plugin.PluginRepository -     Ontology Model Loader (org.apache.nutch.ontology.Ontology)
> 2007-10-05 12:16:34,296 WARN  mapred.LocalJobRunner - job_fx2l2k
> java.lang.RuntimeException: No scoring plugins - at least one scoring plugin is required!
>     at org.apache.nutch.scoring.ScoringFilters.<init>(ScoringFilters.java:85)
>     at org.apache.nutch.crawl.Injector$InjectMapper.configure(Injector.java:61)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
>     at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:170)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:126)
> ---------------------------------------------------------------------------------------------------------------------
>
> I am totally new to this. Your insights please.

OK, it seems you have removed the scoring-opic plugin (and any other scoring
plugins you may have had) by accident: no scoring plugin appears under
"Registered Plugins" in your log. Check the plugin.includes option in
nutch-site.xml; there is probably something wrong with it. Perhaps you put
a line break in the value?
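
If it helps, here is a rough Python sketch of how one might check for that.
It assumes that Nutch treats plugin.includes as a regular expression and
matches each plugin id against the whole expression; that is my
understanding, not something the log confirms. The helper names, the sample
XML, and the plugin id list are all made up for illustration. Point it at
your real nutch-site.xml contents instead.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample: a plugin.includes value with a line break in the middle.
SAMPLE = """<configuration>
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|
scoring-opic</value>
  </property>
</configuration>"""

# Illustrative plugin ids to test against the expression.
PLUGIN_IDS = ["protocol-http", "parse-html", "index-basic", "scoring-opic"]


def plugin_includes(xml_text):
    """Return the raw plugin.includes value from the XML text, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == "plugin.includes":
            return prop.findtext("value", default="")
    return None


def check(value, plugin_ids):
    """Return (has_whitespace, ids that fully match the expression)."""
    has_whitespace = any(ch.isspace() for ch in value)
    pattern = re.compile(value)
    matched = [p for p in plugin_ids if pattern.fullmatch(p)]
    return has_whitespace, matched


if __name__ == "__main__":
    value = plugin_includes(SAMPLE)
    has_ws, matched = check(value, PLUGIN_IDS)
    print("whitespace in value:", has_ws)
    print("matched with the value as written:", matched)

    # Strip all whitespace and try again.
    cleaned = "".join(value.split())
    _, matched_clean = check(cleaned, PLUGIN_IDS)
    print("matched after removing whitespace:", matched_clean)
```

In the sample, the line break turns the last alternative into
"\nscoring-opic", which no longer matches the id "scoring-opic", so the
scoring plugin drops out while the others still match. Once the whitespace is
removed, all four ids match again.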

>
> - Sagar
>


-- 
Doğacan Güney