nutch-dev mailing list archives

From "Sebastian Nagel (Jira)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2419) Domain blacklist URL filter does not respect command-line override for file
Date Tue, 12 May 2020 13:12:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105413#comment-17105413
] 

Sebastian Nagel commented on NUTCH-2419:
----------------------------------------

Working on a patch. It turned out that the situation is more confused: the configured rule file
does not take precedence over the attribute file for the URL filters "domain", "domainblacklist",
"prefix", "suffix" ("regex" and "automaton" are not affected), for the URL normalizers "host",
"slash" and "protocol", and for "parsefilter-regex".
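
To illustrate the precedence one would expect here, a minimal sketch follows. It is not the Nutch code or the patch: the class RuleFileResolver, its method resolve and all file names are hypothetical, and it only assumes that a plugin sees up to three candidates (a file name set in the configuration, for example overridden on the command line, a file name from the plugin's "file" attribute, and a built-in default).

```java
/** Hypothetical helper, not part of Nutch: decides which rule file a plugin loads. */
final class RuleFileResolver {

  private RuleFileResolver() {
    // static helper only
  }

  /**
   * Returns the rule file to load, in the expected order of precedence:
   * configured file first, then the plugin attribute file, then the default.
   *
   * @param configuredFile file name from the configuration, may be null or blank
   * @param attributeFile  file name from the plugin's "file" attribute, may be null or blank
   * @param defaultFile    fallback used when neither of the above is set
   */
  static String resolve(String configuredFile, String attributeFile, String defaultFile) {
    if (isSet(configuredFile)) {
      return configuredFile.trim();
    }
    if (isSet(attributeFile)) {
      return attributeFile.trim();
    }
    return defaultFile;
  }

  /** A value counts as set if it is non-null and not only whitespace. */
  private static boolean isSet(String value) {
    return value != null && !value.trim().isEmpty();
  }

  public static void main(String[] args) {
    // All file names below are made up for the example.
    System.out.println(resolve("override.txt", "plugin-attribute.txt", "default.txt"));
    System.out.println(resolve(null, "plugin-attribute.txt", "default.txt"));
    System.out.println(resolve(" ", null, "default.txt"));
  }
}
```

The behaviour described in the comment above corresponds to the first two checks being swapped, so that the attribute file wins whenever it is present.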

> Domain blacklist URL filter does not respect command-line override for file
> ---------------------------------------------------------------------------
>
>                 Key: NUTCH-2419
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2419
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.17
>
>         Attachments: NUTCH-2419.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
