nutch-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2413) Parsing fetcher to respect property "parse.filter.urls"
Date Sat, 26 Aug 2017 09:54:01 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16142732#comment-16142732 ]

Hudson commented on NUTCH-2413:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch-trunk #3452 (See [https://builds.apache.org/job/Nutch-trunk/3452/])
fix for NUTCH-2413 contributed by maborec (marcos: [https://github.com/apache/nutch/commit/6c648633cecc158f409e3a4ec45cf33bc68b4b1d])
* (edit) src/java/org/apache/nutch/fetcher/FetcherThread.java
fix for NUTCH-2413 contributed by maborec (marcos: [https://github.com/apache/nutch/commit/5dc48f2fc2f7a6f9d039251b9133df12bee99d52])
* (edit) src/java/org/apache/nutch/fetcher/FetcherThread.java
NUTCH-2413 - Fix some styling. Prepare filters and normalizers in (marcos: [https://github.com/apache/nutch/commit/60af77262726e8a09202a2319add512c54e7a2f4])
* (edit) src/java/org/apache/nutch/fetcher/FetcherThread.java


> Parsing fetcher to respect property "parse.filter.urls"
> -------------------------------------------------------
>
>                 Key: NUTCH-2413
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2413
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, parser
>    Affects Versions: 1.13
>         Environment: Apache Nutch release 1.13.
>            Reporter: Marcos Bori
>            Assignee: Sebastian Nagel
>             Fix For: 1.14
>
>
> Consider a situation in which we want to:
> (1) Execute the fetch and parse phases together ("fetcher.parse" set to "true").
> (2) Avoid applying the URL filters during this phase.
> When parsing runs as a separate process, condition (2) can be configured by setting "parse.filter.urls" to "false".
> However, this setting ("parse.filter.urls") is ignored when the fetch and parse phases are executed together.
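The behaviour requested in the description can be sketched roughly as follows. This is only an illustration of the intended decision, not the code committed to FetcherThread.java: the class name, the helper methods, the use of a plain Map in place of the Nutch configuration object, and the default values ("fetcher.parse" = false, "parse.filter.urls" = true) are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from the Nutch code base.
public class ParseFilterSketch {

    // Reads a boolean property from a plain map, falling back to a default
    // when the key is absent. Stands in for a real configuration object.
    static boolean getBoolean(Map<String, String> conf, String key, boolean defaultValue) {
        String value = conf.get(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    // URL filters should apply during in-fetcher parsing only when parsing is
    // enabled in the fetcher AND "parse.filter.urls" has not been switched off.
    static boolean shouldFilterWhileParsing(Map<String, String> conf) {
        boolean parsing = getBoolean(conf, "fetcher.parse", false);     // assumed default
        boolean filter = getBoolean(conf, "parse.filter.urls", true);   // assumed default
        return parsing && filter;
    }

    public static void main(String[] args) {
        // The configuration described in the issue: parse inside the fetcher,
        // but skip the URL filters.
        Map<String, String> conf = new HashMap<>();
        conf.put("fetcher.parse", "true");
        conf.put("parse.filter.urls", "false");
        System.out.println("apply URL filters: " + shouldFilterWhileParsing(conf));
    }
}
```

With the configuration from the issue, the sketch decides against filtering, which is the outcome the reporter expected from the parsing fetcher.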



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
