nutch-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1414) Date extraction parse filter
Date Mon, 18 Jul 2016 11:09:20 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15382088#comment-15382088
] 

Markus Jelsma commented on NUTCH-1414:
--------------------------------------

Sure:

{code}
<property>
  <name>index.parse.md</name>
  <value>metatag.description,metatag.keywords</value>
  <description>
  Comma-separated list of keys to be taken from the parse metadata to generate fields.
  Can be used e.g. for 'description' or 'keywords' provided that these values are generated
  by a parser (see parse-metatags plugin)  
  </description>
</property>
{code}

> Date extraction parse filter
> ----------------------------
>
>                 Key: NUTCH-1414
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1414
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Markus Jelsma
>         Attachments: NUTCH-1414-1.6-1-testdata.patch, NUTCH-1414-1.6-1.patch
>
>
> Date extraction parse filter for Nutch to provide means to extract an arbitrary page
date (article date) from the parse text.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message