nutch-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1842) crawl.gen.delay has a wrong default value in nutch-default.xml or is being parsed incorrectly
Date Thu, 15 Nov 2018 10:45:01 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16687791#comment-16687791 ]

Hudson commented on NUTCH-1842:
-------------------------------

FAILURE: Integrated in Jenkins build Nutch-trunk #3589 (See [https://builds.apache.org/job/Nutch-trunk/3589/])
NUTCH-1842: crawl.gen.delay value is read incorrectly from (github: [https://github.com/apache/nutch/commit/8b7298da1f04ade38f986b225134345456f07c32])
* (edit) src/java/org/apache/nutch/crawl/Generator.java


> crawl.gen.delay has a wrong default value in nutch-default.xml or is being parsed incorrectly
> ----------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-1842
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1842
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.9
>            Reporter: kaveh minooie
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.16
>
>
> This is from nutch-default.xml:
> <property>
>   <name>crawl.gen.delay</name>
>   <value>604800000</value>
>   <description>
>    This value, expressed in milliseconds, defines how long we should keep the lock on records
>    in CrawlDb that were just selected for fetching. If these records are not updated
>    in the meantime, the lock is canceled, i.e. they become eligible for selecting.
>    Default value of this is 7 days (604800000 ms).
>   </description>
> </property>
> This is from o.a.n.crawl.Generator.configure(JobConf job):
> genDelay = job.getLong(GENERATOR_DELAY, 7L) * 3600L * 24L * 1000L;
> The value in the config file is in milliseconds, but the code expects it to be in days. I reported
> this a couple of years ago on the mailing list as well. I didn't post a patch because I am not
> sure which one needs to be fixed. Considering that all the other values in the config file are in
> milliseconds, it can be argued that consistency matters, but 'day' is a much more reasonable unit
> for this property.
> Also, this value is not being used in 2.x?
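The following self-contained sketch illustrates the unit mismatch described in the report. It uses `java.util.Properties` as a stand-in for Hadoop's `JobConf`, which is an assumption made to avoid a Hadoop dependency. The method names `genDelayAsDays` and `genDelayAsMillis` are hypothetical. The first method mirrors the quoted `Generator.configure()` line. The second shows one of the two fixes the reporter considers, reading the value as milliseconds. It is not necessarily what commit 8b7298d actually does.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

public class GenDelayDemo {
    // Property name as it appears in nutch-default.xml.
    static final String GENERATOR_DELAY = "crawl.gen.delay";

    // Mirrors the line quoted from Generator.configure(): the configured
    // number is treated as days and converted to milliseconds.
    static long genDelayAsDays(Properties conf) {
        long days = Long.parseLong(conf.getProperty(GENERATOR_DELAY, "7"));
        return days * 3600L * 24L * 1000L;
    }

    // Hypothetical alternative: treat the configured number as milliseconds,
    // as the <description> in nutch-default.xml says, with a 7-day default.
    static long genDelayAsMillis(Properties conf) {
        String fallback = String.valueOf(TimeUnit.DAYS.toMillis(7));
        return Long.parseLong(conf.getProperty(GENERATOR_DELAY, fallback));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // The value shipped in nutch-default.xml.
        conf.setProperty(GENERATOR_DELAY, "604800000");

        long asDays = genDelayAsDays(conf);
        long asMillis = genDelayAsMillis(conf);

        System.out.println("read as days:   " + asDays + " ms = "
                + TimeUnit.MILLISECONDS.toDays(asDays) + " days");
        System.out.println("read as millis: " + asMillis + " ms = "
                + TimeUnit.MILLISECONDS.toDays(asMillis) + " days");
    }
}
```

When the shipped value is read as days, the lock lasts 604800000 days, roughly 1.66 million years. Read as milliseconds, it lasts the documented 7 days. When the property is unset, both methods return 604800000 ms. The mismatch only surfaces because nutch-default.xml sets an explicit value in a different unit than the code's fallback.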



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
