nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (NUTCH-1414) Date extraction parse filter
Date Mon, 18 Jul 2016 21:36:20 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383121#comment-15383121 ]

Markus Jelsma edited comment on NUTCH-1414 at 7/18/16 9:35 PM:
---------------------------------------------------------------

Java provides largely PCRE-compatible regular expressions out of the box (java.util.regex).
Online tools like regexplanet.com help you quickly verify your regexes. Make sure you have
some unit tests to verify your plugin. Again, see the referenced plugins and other Nutch
plugins for examples.
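
As a rough illustration of the advice above, here is a minimal sketch of checking a date regex with plain java.util.regex before wiring it into a plugin. The class name DateRegexSketch, the method firstDate, and the ISO-style pattern are all hypothetical and are not taken from the NUTCH-1414 patch; a real filter would likely need several patterns and locale handling.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateRegexSketch {
    // Hypothetical pattern (an assumption, not from the patch):
    // ISO-style dates such as 2016-07-18, with basic month/day range checks.
    private static final Pattern ISO_DATE =
        Pattern.compile("\\b(\\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])\\b");

    // Returns the first substring that looks like an ISO date, if any.
    public static Optional<String> firstDate(String text) {
        Matcher m = ISO_DATE.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstDate("Published 2016-07-18 by the editor"));
        System.out.println(firstDate("No date here"));
    }
}
```

The same inputs make natural unit test cases: one string that should match, one that should not, and one with an out-of-range month or day.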


was (Author: markus17):
Java provides proper PCRE compatible regular expressions by default. Using online tools like
regexplanet.com will help you to quickly verify your regexes. Make sure you have some unit
tests to verify your plugin.

> Date extraction parse filter
> ----------------------------
>
>                 Key: NUTCH-1414
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1414
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Markus Jelsma
>         Attachments: NUTCH-1414-1.6-1-testdata.patch, NUTCH-1414-1.6-1.patch
>
>
> Date extraction parse filter for Nutch to provide means to extract an arbitrary page
> date (article date) from the parse text.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
