nutch-dev mailing list archives

From "Sebastian Nagel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2334) Extension point for schedulers
Date Wed, 29 Mar 2017 09:59:41 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946865#comment-15946865 ]

Sebastian Nagel commented on NUTCH-2334:
----------------------------------------

Hi [~roannel],

see [scoring-adaptive|https://github.com/commoncrawl/nutch/blob/cc/src/plugin/scoring-adaptive/src/java/org/apache/nutch/scoring/adaptive/AdaptiveScoringFilter.java]
which tries to do fetch scheduling in a ScoringFilter in combination with -topN, generate.min.score,
and (per-host/per-queue) generate.max.count. The main difference is that configuration changes
immediately impact the fetch list generation while a FetchSchedule sets (re)fetch time and
intervals beforehand during CrawlDb update.

> having schedulers as plugins is an easier way to use and develop them and maybe you can use several at the same time
That's true. You could stack FetchSchedule implementations via inheritance and then call {{super.shouldFetch(...)}}.
But that's not really transparent and configurable.
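
A minimal sketch of the inheritance stacking described above. It does not use the real Nutch classes: {{Datum}}, {{BaseSchedule}}, {{MinScoreSchedule}} and {{HostBlocklistSchedule}} are hypothetical stand-ins, and the method signature is only loosely modelled on {{FetchSchedule.shouldFetch}}. Each subclass adds one rule and delegates the rest to {{super.shouldFetch(...)}}:

```java
import java.net.URI;
import java.util.Set;

// Hypothetical stand-in for a CrawlDb entry (NOT Nutch's CrawlDatum).
final class Datum {
    final long fetchTime; // earliest time the page is due for (re)fetch
    final float score;

    Datum(long fetchTime, float score) {
        this.fetchTime = fetchTime;
        this.score = score;
    }
}

// Hypothetical base schedule: a page is due once its fetch time has passed.
class BaseSchedule {
    boolean shouldFetch(String url, Datum datum, long curTime) {
        return datum.fetchTime <= curTime;
    }
}

// First stacked rule: skip low-scoring pages, then defer to the base rule.
class MinScoreSchedule extends BaseSchedule {
    private final float minScore;

    MinScoreSchedule(float minScore) {
        this.minScore = minScore;
    }

    @Override
    boolean shouldFetch(String url, Datum datum, long curTime) {
        return datum.score >= minScore && super.shouldFetch(url, datum, curTime);
    }
}

// Second stacked rule: skip blocked hosts, then defer to the rules below.
class HostBlocklistSchedule extends MinScoreSchedule {
    private final Set<String> blockedHosts;

    HostBlocklistSchedule(float minScore, Set<String> blockedHosts) {
        super(minScore);
        this.blockedHosts = blockedHosts;
    }

    @Override
    boolean shouldFetch(String url, Datum datum, long curTime) {
        String host = URI.create(url).getHost();
        if (host != null && blockedHosts.contains(host)) {
            return false;
        }
        return super.shouldFetch(url, datum, curTime);
    }
}

class StackedScheduleDemo {
    public static void main(String[] args) {
        HostBlocklistSchedule schedule =
            new HostBlocklistSchedule(0.5f, Set.of("blocked.example"));
        long now = 1000L;
        // Due, good score, host not blocked: all three rules pass.
        System.out.println(
            schedule.shouldFetch("http://ok.example/a", new Datum(500L, 0.9f), now));
        // Same page on a blocked host: rejected by the outermost rule.
        System.out.println(
            schedule.shouldFetch("http://blocked.example/a", new Datum(500L, 0.9f), now));
    }
}
```

The order and selection of rules is fixed by the class hierarchy at compile time, which is why this approach is neither transparent nor configurable.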

What is your suggestion for a schedule plugin interface?

> Extension point for schedulers
> ------------------------------
>
>                 Key: NUTCH-2334
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2334
>             Project: Nutch
>          Issue Type: New Feature
>          Components: generator
>    Affects Versions: 1.12
>            Reporter: Roannel Fernández Hernández
>            Priority: Minor
>             Fix For: 1.14
>
>
> With an extension point for schedulers, users should be able to create new schedulers that meet their own needs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
