nutch-dev mailing list archives

From "Sami Siren (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-361) generator create fetchlist randomly
Date Sun, 03 Sep 2006 04:39:23 GMT
    [ http://issues.apache.org/jira/browse/NUTCH-361?page=comments#action_12432322 ] 
            
Sami Siren commented on NUTCH-361:
----------------------------------

I have started writing some simple JUnit tests for the main tools (inject, generate, fetch);
some are already committed to svn trunk. If you can extend some of those to demonstrate this
problem, it would be easier to track down.
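
A repeatability check of the kind suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: `FetchlistGenerator`, `countCompleteRuns`, and the example URLs are hypothetical stand-ins and are not part of the Nutch API or of the tests on svn trunk. A real test would replace the stand-in with a call to the actual generate step and read the result back from crawl_generate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FetchlistRepeatabilityCheck {

    /** Hypothetical stand-in for the generate step: returns the URLs selected for fetching. */
    interface FetchlistGenerator {
        List<String> generate(List<String> crawlDbUrls);
    }

    /**
     * Runs the generator `runs` times on the same input and counts how many
     * runs produced a fetchlist containing exactly the input URLs.
     */
    static int countCompleteRuns(FetchlistGenerator generator, List<String> urls, int runs) {
        Set<String> expected = new HashSet<>(urls);
        int complete = 0;
        for (int i = 0; i < runs; i++) {
            Set<String> actual = new HashSet<>(generator.generate(urls));
            if (actual.equals(expected)) {
                complete++;
            }
        }
        return complete;
    }

    public static void main(String[] args) {
        // Six placeholder URLs, mirroring the six test URLs mentioned in the report.
        List<String> urls = Arrays.asList(
                "http://example.org/1", "http://example.org/2", "http://example.org/3",
                "http://example.org/4", "http://example.org/5", "http://example.org/6");

        // Assumption: a well-behaved generator that always returns every URL.
        FetchlistGenerator stable = crawlDb -> new ArrayList<>(crawlDb);

        int runs = 20;
        int complete = countCompleteRuns(stable, urls, runs);
        System.out.println(complete + " of " + runs + " runs listed all urls");
        if (complete != runs) {
            throw new AssertionError("fetchlist differed between runs");
        }
    }
}
```

With a generator that drops URLs on some runs, the count falls below the number of runs, which is the symptom described in the report below.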

> generator create fetchlist randomly
> -----------------------------------
>
>                 Key: NUTCH-361
>                 URL: http://issues.apache.org/jira/browse/NUTCH-361
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0
>         Environment: Java 1.5, FreeBSD 6.1
>            Reporter: Uros Gruber
>            Priority: Critical
>
> I noticed problems while generating the fetchlist. I already posted some info on the users
> list. Today I checked release 0.8 and I'm certain that the problem exists only in versions
> later than this. I have tested only 0.8 and svn from today.
> The problem is that the generator generates a fetchlist from the crawldb, but every time I
> run it there is a different number of urls in the fetchlist.
> For example, I put in the 6 test urls we use for testing, and in only 5 of 20 runs were all
> urls listed in the fetchlist; sometimes only one url was listed. The config was always the
> same, including when testing version 0.8.
> I tried to debug what might be going wrong, but I only found that all the urls were present
> in /tmp but were somehow missing from crawl_generate.
> I also see some of the following
> 2006-09-02 20:14:20,147 DEBUG conf.Configuration - java.io.IOException: config(config)
>         at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:76)
>         at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:87)
>         at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:98)
>         at org.apache.nutch.util.NutchJob.<init>(NutchJob.java:26)
>         at org.apache.nutch.crawl.Generator.generate(Generator.java:330)
>         at org.apache.nutch.crawl.Generator.run(Generator.java:405)
>         at org.apache.nutch.util.ToolBase.doMain(ToolBase.java:145)
>         at org.apache.nutch.crawl.Generator.main(Generator.java:372)
> if I enable DEBUG logging, but I doubt that this has anything to do with the problem.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
