nutch-dev mailing list archives

From "Wade Lau (JIRA)" <j...@apache.org>
Subject [jira] Updated: (NUTCH-957) timelimit.mins is invalid when depth is greater than 1
Date Sun, 16 Jan 2011 16:08:43 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wade Lau updated NUTCH-957:
---------------------------

    Summary: timelimit.mins is invalid when depth is greater than 1 (was: timelimit.mins is invalid when depth greater than 1)

> timelimit.mins is invalid when depth is greater than 1
> ------------------------------------------------------
>
>                 Key: NUTCH-957
>                 URL: https://issues.apache.org/jira/browse/NUTCH-957
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 1.2
>         Environment: openSUSE 11.3, jdk-1.6, ant-1.8, tomcat-6.0, nutch-1.2
>            Reporter: Wade Lau
>             Fix For: 1.2
>
>
> The configured value of fetcher.timelimit.mins becomes invalid when running ./bin/nutch crawl with depth=n (n>1).
> The reason is that the value of fetcher.timelimit.mins is overwritten by the following code (org.apache.nutch.fetcher.Fetcher.java):
> long timelimit = getConf().getLong("fetcher.timelimit.mins", -1);
> if (timelimit != -1) {
>   timelimit = System.currentTimeMillis() + (timelimit * 60 * 1000);
>   LOG.info("Fetcher Timelimit set for : " + timelimit);
>   getConf().setLong("fetcher.timelimit.mins", timelimit);
> }
> When the crawler moves on to the next depth, the value it reads is the one stored by the previous round, which is an absolute timestamp (currentTimeMillis + timelimit.mins * 60 * 1000) rather than a number of minutes.
> Some logs look like:
> depth=1 
> Fetcher: starting at 2011-01-16 20:58:53
> Fetcher: segment: crawl/segments/20110116205851
> Fetcher Timelimit set for : 1295182793540 now is:[1295182733540] timelimit:[1] new.sum:[1295182793540]
> depth=2
> Fetcher: starting at 2011-01-16 21:00:20
> Fetcher: segment: crawl/segments/20110116210018
> Fetcher Timelimit set for : 77712262795220167 now is:[1295182820167] timelimit:[1295182793540] new.sum:[77712262795220167]
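
The two log values above can be reproduced with plain arithmetic. The following is only a minimal sketch: the class name and the Map-based `conf` are hypothetical stand-ins for illustration (not Hadoop's Configuration class), and the two "now" timestamps are taken from the log lines quoted above.

```java
import java.util.HashMap;
import java.util.Map;

public class TimelimitOverwriteDemo {
    // Hypothetical stand-in for the job configuration (assumption, not the Nutch/Hadoop API).
    static final Map<String, Long> conf = new HashMap<>();

    // Mirrors the quoted Fetcher logic: read the value as minutes,
    // turn it into an absolute deadline, and write it back under the same key.
    static long setTimelimit(long nowMillis) {
        long timelimit = conf.getOrDefault("fetcher.timelimit.mins", -1L);
        if (timelimit != -1) {
            timelimit = nowMillis + (timelimit * 60 * 1000);
            conf.put("fetcher.timelimit.mins", timelimit);
        }
        return timelimit;
    }

    public static void main(String[] args) {
        conf.put("fetcher.timelimit.mins", 1L);       // user setting: 1 minute
        long depth1 = setTimelimit(1295182733540L);   // "now" at depth=1, from the log
        long depth2 = setTimelimit(1295182820167L);   // "now" at depth=2, from the log
        System.out.println("depth=1 deadline: " + depth1); // 1295182793540
        System.out.println("depth=2 deadline: " + depth2); // 77712262795220167
    }
}
```

At depth=2 the stored deadline from depth=1 is treated as minutes, so the new deadline lands far in the future and the limit effectively never triggers.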
> A simple fix is shown below:
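> // Prefer the saved original minutes value, if a previous round stored one.
> long timelimit = getConf().getLong("fetcher.timelimit.mins.init", -1);
> if (timelimit == -1) {
>   // First round: read the configured minutes and remember them under a separate key.
>   timelimit = getConf().getLong("fetcher.timelimit.mins", -1);
>   getConf().setLong("fetcher.timelimit.mins.init", timelimit);
> }
> if (timelimit != -1) {
>   // Convert minutes into an absolute deadline in milliseconds.
>   timelimit = System.currentTimeMillis() + (timelimit * 60 * 1000);
>   LOG.info("Fetcher Timelimit set for : " + timelimit);
>   getConf().setLong("fetcher.timelimit.mins", timelimit);
> }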
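
To check that the proposed ".init" key keeps the limit stable across depths, here is a rough sketch under the same assumptions as the report: one configuration object is reused between fetch rounds. The class name and the Map-based `conf` are hypothetical stand-ins, not the Nutch or Hadoop API, and the timestamps are the "now" values from the logs in the report.

```java
import java.util.HashMap;
import java.util.Map;

public class TimelimitInitKeyDemo {
    // Hypothetical stand-in for the shared job configuration (assumption).
    static final Map<String, Long> conf = new HashMap<>();

    // Same flow as the proposed fix: keep the original minutes under a
    // separate ".init" key so later rounds never read back a deadline as minutes.
    static long setTimelimit(long nowMillis) {
        long timelimit = conf.getOrDefault("fetcher.timelimit.mins.init", -1L);
        if (timelimit == -1) {
            timelimit = conf.getOrDefault("fetcher.timelimit.mins", -1L);
            conf.put("fetcher.timelimit.mins.init", timelimit);
        }
        if (timelimit != -1) {
            timelimit = nowMillis + (timelimit * 60 * 1000);
            conf.put("fetcher.timelimit.mins", timelimit);
        }
        return timelimit;
    }

    public static void main(String[] args) {
        conf.put("fetcher.timelimit.mins", 1L);       // user setting: 1 minute
        long depth1 = setTimelimit(1295182733540L);
        long depth2 = setTimelimit(1295182820167L);
        System.out.println("depth=1 deadline: " + depth1); // 1295182793540
        System.out.println("depth=2 deadline: " + depth2); // 1295182880167
    }
}
```

Each round now ends one minute after its own start time, which is the behaviour the setting is meant to have.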
> I hope this is helpful for the next release and saves time for others.
> Reference:
> http://ufqi.com/exp/x1183.html?title=apache.nutch.timelimit.bug

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

