nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (NUTCH-2574) Generator: hostCount >= maxCount comparison wrong
Date Wed, 06 Jun 2018 12:21:00 GMT


ASF GitHub Bot commented on NUTCH-2574:

sebastian-nagel opened a new pull request #344: NUTCH-2574 Generator: hostCount >= maxCount comparison wrong
   - ensure that the last created segment also contains maxCount URLs per host
   - use a local variable to hold the host-specific maxCount set in HostDb (do not temporarily modify the instance variable)
   - fix Java compile warnings: add missing generic type parameters

> Generator: hostCount >= maxCount comparison wrong
> -------------------------------------------------
>                 Key: NUTCH-2574
>                 URL:
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.13
>            Reporter: Michael Coffey
>            Assignee: Sebastian Nagel
>            Priority: Minor
>             Fix For: 1.15
> In the Generator.Selector.reduce function, hostCount[1] is compared to maxCount to decide whether to push the current URL to the next segment. The purpose is to honor generate.max.count.
> Sebastian noticed that the test should be if (hostCount[1] > maxCount) rather than ">=". As it stands, the code sometimes puts one fewer URL into a segment than it should.
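
The off-by-one described above can be illustrated with a minimal sketch. This is not the Nutch source: the class HostLimitSketch, the method urlsKeptInSegment, and its parameters are hypothetical names. It assumes the per-host counter is incremented before the comparison, which is how the symptom in the report would arise.

```java
public class HostLimitSketch {

    /**
     * Simulates how many URLs of one host stay in the current segment.
     * Hypothetical helper for illustration only; it assumes the counter
     * is incremented before it is compared to maxCount.
     *
     * @param urlsForHost       number of candidate URLs for the host
     * @param maxCount          per-host limit (the role of generate.max.count)
     * @param useGreaterOrEqual true for the ">=" test, false for the ">" test
     */
    public static int urlsKeptInSegment(int urlsForHost, int maxCount,
                                        boolean useGreaterOrEqual) {
        int hostCount = 0; // URLs seen so far for this host
        int kept = 0;      // URLs that stay in the current segment
        for (int i = 0; i < urlsForHost; i++) {
            hostCount++;
            boolean overLimit = useGreaterOrEqual
                    ? hostCount >= maxCount
                    : hostCount > maxCount;
            if (overLimit) {
                break; // this URL and later ones would move to the next segment
            }
            kept++;
        }
        return kept;
    }

    public static void main(String[] args) {
        // Five candidate URLs for one host, limit of three per segment.
        System.out.println(urlsKeptInSegment(5, 3, true));  // ">=" comparison
        System.out.println(urlsKeptInSegment(5, 3, false)); // ">" comparison
    }
}
```

Under these assumptions, the ">=" comparison keeps maxCount - 1 URLs in the segment, while the ">" comparison keeps exactly maxCount.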
