nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2413) When fetching and parsing together, parameter "parse.filter.urls" is ignored
Date Sat, 26 Aug 2017 08:46:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142697#comment-16142697 ]

ASF GitHub Bot commented on NUTCH-2413:
---------------------------------------

sebastian-nagel commented on issue #216: fix for NUTCH-2413 contributed by maborec
URL: https://github.com/apache/nutch/pull/216#issuecomment-325103769
 
 
   Thanks, looks good to me and verified that nothing is filtered now with a parsing fetcher
and `-Dparse.filter.urls=false`.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> When fetching and parsing together, parameter "parse.filter.urls" is ignored
> ----------------------------------------------------------------------------
>
>                 Key: NUTCH-2413
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2413
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, parser
>    Affects Versions: 1.13
>         Environment: Apache Nutch release 1.13.
>            Reporter: Marcos Bori
>             Fix For: 1.14
>
>
> Consider a situation in which we want to:
> (1) Execute fetching and parsing together ("fetcher.parse" set to "true").
> (2) Avoid applying the URL filters during this phase.
> When parsing runs as a separate process, condition (2) can be configured by setting
> "parse.filter.urls" to "false".
> However, this setting ("parse.filter.urls") is ignored when the fetch and parse phases
> are executed together.
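
The behaviour the report expects can be sketched roughly as follows. This is only an
illustration in Python, not Nutch's actual code: the helpers `get_boolean`,
`select_outlinks` and `reject_pdf` are hypothetical, the configuration is modelled as
a plain dict, and the default of "true" for "parse.filter.urls" is an assumption. The
point is that the flag should gate URL filtering of outlinks no matter whether parsing
runs inside the fetcher or as a separate step.

```python
from typing import Callable, Iterable, List, Mapping


def get_boolean(conf: Mapping[str, str], key: str, default: bool) -> bool:
    """Read a boolean property from a dict-based stand-in for the job configuration."""
    value = conf.get(key)
    if value is None:
        return default
    return value.strip().lower() == "true"


def select_outlinks(
    conf: Mapping[str, str],
    outlinks: Iterable[str],
    url_filter: Callable[[str], bool],
) -> List[str]:
    """Return the outlinks to keep, honouring "parse.filter.urls".

    The default of True is an assumption made for this sketch.
    """
    if not get_boolean(conf, "parse.filter.urls", True):
        # Filtering disabled: every outlink passes through untouched.
        return list(outlinks)
    # Filtering enabled: keep only the URLs the filter accepts.
    return [url for url in outlinks if url_filter(url)]


def reject_pdf(url: str) -> bool:
    """Hypothetical URL filter: accept everything except PDF links."""
    return not url.endswith(".pdf")


if __name__ == "__main__":
    # Parsing fetcher with URL filtering switched off, as in the report.
    conf = {"fetcher.parse": "true", "parse.filter.urls": "false"}
    links = ["http://example.org/a.html", "http://example.org/b.pdf"]
    print(select_outlinks(conf, links, reject_pdf))
```

With "parse.filter.urls" set to "false" both links are kept; with "true", or with the
property absent under the assumed default, the hypothetical filter drops the PDF link.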



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
