nutch-dev mailing list archives

From "Sebastian Nagel (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NUTCH-2297) CrawlDbReader -stats wrong values for earliest fetch time and shortest interval
Date Mon, 08 Aug 2016 10:06:20 GMT
Sebastian Nagel created NUTCH-2297:
--------------------------------------

             Summary: CrawlDbReader -stats wrong values for earliest fetch time and shortest interval
                 Key: NUTCH-2297
                 URL: https://issues.apache.org/jira/browse/NUTCH-2297
             Project: Nutch
          Issue Type: Bug
          Components: crawldb
    Affects Versions: 1.13
            Reporter: Sebastian Nagel
            Assignee: Sebastian Nagel
            Priority: Minor
             Fix For: 1.13


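NUTCH-2286 added min, max and average for fetch interval and fetch time to the CrawlDbReader -stats output.
When running in distributed mode (not reproducible in local mode), the minimum values
(earliest fetch time and shortest fetch interval) may be wrong and implausible. In the output
below, the shortest fetch interval is longer than the longest one, and the earliest fetch time
lies in the year 3106, long after the latest fetch time: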
{noformat}
TOTAL urls:                  7180518032
 shortest fetch interval:    175 days, 00:00:00             <<<<<< ????
 avg fetch interval:         10 days, 08:01:36
 longest fetch interval:     15 days, 18:00:00
 earliest fetch time:        Thu Dec 20 05:30:00 UTC 3106   <<<<<< ????
 avg of fetch times:         Fri Feb 19 00:07:00 UTC 2016
 latest fetch time:          Mon Jul 18 05:22:00 UTC 2016
 retry 0:                    6907984913
 retry 1:                    148125397
 retry 2:                    82761892
 retry 3:                    41645830
 min score:                  0.0
 avg score:                  0.014360981
 max score:                  9.25
 ...
{noformat}
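
The two flagged values can be confirmed mechanically: a minimum should never exceed the average or the maximum. The following is only a minimal sketch of that plausibility check, using the numbers from the output above. The class `StatsSanityCheck` and the helper `isOrdered` are hypothetical names for illustration and are not part of Nutch; the sketch does not attempt to show the cause of the bug.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration, not Nutch code: checks min <= avg <= max
// for the values reported by "CrawlDbReader -stats" in this issue.
public class StatsSanityCheck {

    // Returns true only if the three values are in non-decreasing order.
    static <T extends Comparable<? super T>> boolean isOrdered(T min, T avg, T max) {
        return min.compareTo(avg) <= 0 && avg.compareTo(max) <= 0;
    }

    public static void main(String[] args) {
        // Fetch intervals as printed in the report.
        Duration shortest = Duration.ofDays(175);
        Duration average = Duration.ofDays(10).plusHours(8).plusMinutes(1).plusSeconds(36);
        Duration longest = Duration.ofDays(15).plusHours(18);

        // Fetch times as printed in the report (all UTC).
        Instant earliest = Instant.parse("3106-12-20T05:30:00Z");
        Instant avgTime = Instant.parse("2016-02-19T00:07:00Z");
        Instant latest = Instant.parse("2016-07-18T05:22:00Z");

        // Both checks fail for the reported values.
        System.out.println("fetch interval plausible: " + isOrdered(shortest, average, longest));
        System.out.println("fetch time plausible:     " + isOrdered(earliest, avgTime, latest));
    }
}
```

With the reported values both checks print `false`, which matches the two lines marked with `<<<<<< ????`.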




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)