nutch-dev mailing list archives

From "Wade Lau (JIRA)" <j...@apache.org>
Subject [jira] Created: (NUTCH-957) timelimit.mins is invalid when depth greater than 1
Date Sun, 16 Jan 2011 16:06:48 GMT
timelimit.mins is invalid when depth greater than 1
---------------------------------------------------

                 Key: NUTCH-957
                 URL: https://issues.apache.org/jira/browse/NUTCH-957
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 1.2
         Environment: openSUSE 11.3, jdk-1.6, ant-1.8, tomcat-6.0, nutch-1.2
            Reporter: Wade Lau
             Fix For: 1.2


The value of fetcher.timelimit.mins becomes invalid when running ./bin/nutch crawl
with depth=n (n>1).

The reason is that Fetcher overwrites fetcher.timelimit.mins in the following snippet
(org.apache.nutch.fetcher.Fetcher.java):

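long timelimit = getConf().getLong("fetcher.timelimit.mins", -1);
if (timelimit != -1) {
  // convert minutes into an absolute deadline in milliseconds
  timelimit = System.currentTimeMillis() + (timelimit * 60 * 1000);
  LOG.info("Fetcher Timelimit set for : " + timelimit);
  // overwrites the minutes value in the shared configuration;
  // the next depth reads this deadline back as if it were minutes
  getConf().setLong("fetcher.timelimit.mins", timelimit);
}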

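When the crawler moves on to the next depth, the property no longer holds a number of minutes.
It holds the absolute deadline computed in the previous round (currentTimeMillis +
timelimit.mins * 60 * 1000), which is then multiplied by 60 * 1000 again.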
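The compounding can be reproduced outside Nutch with a minimal sketch. It assumes, as the logs suggest, that the crawl reuses one configuration object across depths. A HashMap stands in for Hadoop's Configuration, the class and method names (TimelimitBugDemo, computeDeadlineBuggy) are hypothetical, and the clock is passed in so the run is deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical reproduction of the NUTCH-957 bug. A HashMap stands in for the
 * Hadoop Configuration shared across crawl depths (an assumption), and the
 * current time is injected so the result is deterministic.
 */
public class TimelimitBugDemo {

    // Mirrors the quoted Fetcher logic: read minutes, then overwrite the
    // same property with an absolute deadline in milliseconds.
    public static long computeDeadlineBuggy(Map<String, Long> conf, long nowMillis) {
        long timelimit = conf.getOrDefault("fetcher.timelimit.mins", -1L);
        if (timelimit != -1) {
            timelimit = nowMillis + (timelimit * 60 * 1000);
            conf.put("fetcher.timelimit.mins", timelimit); // the bug
        }
        return timelimit;
    }

    public static void main(String[] args) {
        Map<String, Long> conf = new HashMap<>();
        conf.put("fetcher.timelimit.mins", 1L);
        // Clock values taken from the depth=1 and depth=2 log lines
        System.out.println(computeDeadlineBuggy(conf, 1295182733540L)); // 1295182793540
        System.out.println(computeDeadlineBuggy(conf, 1295182820167L)); // 77712262795220167
    }
}
```

Both printed values match the deadlines in the report's logs.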

Sample log output:

depth=1 
Fetcher: starting at 2011-01-16 20:58:53
Fetcher: segment: crawl/segments/20110116205851
Fetcher Timelimit set for : 1295182793540 now is:[1295182733540] timelimit:[1] new.sum:[1295182793540]
depth=2
Fetcher: starting at 2011-01-16 21:00:20
Fetcher: segment: crawl/segments/20110116210018
Fetcher Timelimit set for : 77712262795220167 now is:[1295182820167] timelimit:[1295182793540] new.sum:[77712262795220167]
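Plugging the logged values into the formula from Fetcher.java reproduces both lines exactly. At depth=2 the previous deadline, a timestamp in milliseconds, is treated as a number of minutes:

```latex
\begin{aligned}
\text{depth}=1:&\quad 1295182733540 + 1 \times 60 \times 1000 = 1295182793540\\
\text{depth}=2:&\quad 1295182820167 + 1295182793540 \times 60 \times 1000 = 77712262795220167
\end{aligned}
```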

A simple fix is to keep the original minutes value in a separate property:

// keep the user's original setting (in minutes) in a separate property
long timelimit = getConf().getLong("fetcher.timelimit.mins.init", -1);
if (timelimit == -1) {
  // first round: save the configured minutes before they are overwritten
  timelimit = getConf().getLong("fetcher.timelimit.mins", -1);
  getConf().setLong("fetcher.timelimit.mins.init", timelimit);
}
if (timelimit != -1) {
  timelimit = System.currentTimeMillis() + (timelimit * 60 * 1000);
  LOG.info("Fetcher Timelimit set for : " + timelimit);
  getConf().setLong("fetcher.timelimit.mins", timelimit);
}
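The proposed fix can be checked with the same kind of stand-in: a HashMap in place of the shared Hadoop Configuration, an injected clock, and hypothetical names (TimelimitFixDemo, computeDeadlineFixed). This is a sketch of the reporter's logic under that shared-configuration assumption, not Nutch code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical check of the proposed NUTCH-957 fix. A HashMap stands in for the
 * shared Hadoop Configuration; the current time is injected for determinism.
 */
public class TimelimitFixDemo {

    public static long computeDeadlineFixed(Map<String, Long> conf, long nowMillis) {
        // Prefer the minutes value saved on the first round, if any.
        long timelimit = conf.getOrDefault("fetcher.timelimit.mins.init", -1L);
        if (timelimit == -1) {
            // First round: remember the user's original setting in minutes.
            timelimit = conf.getOrDefault("fetcher.timelimit.mins", -1L);
            conf.put("fetcher.timelimit.mins.init", timelimit);
        }
        if (timelimit != -1) {
            timelimit = nowMillis + (timelimit * 60 * 1000);
            // Overwriting this key is now harmless: later rounds read ".init".
            conf.put("fetcher.timelimit.mins", timelimit);
        }
        return timelimit;
    }

    public static void main(String[] args) {
        Map<String, Long> conf = new HashMap<>();
        conf.put("fetcher.timelimit.mins", 1L);
        System.out.println(computeDeadlineFixed(conf, 1295182733540L)); // 1295182793540
        System.out.println(computeDeadlineFixed(conf, 1295182820167L)); // 1295182880167
    }
}
```

Each depth now gets its own one-minute window, and a crawl that never sets fetcher.timelimit.mins keeps returning -1 on every round. A variant design would write the deadline under a different key than the one holding the minutes, so the user's setting is never overwritten; that is only an alternative, not what this report proposes.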


I hope this is helpful for the next release and saves others some time.

Reference:
http://ufqi.com/exp/x1183.html?title=apache.nutch.timelimit.bug





-- 
This message is automatically generated by JIRA.

