nutch-dev mailing list archives

From "Jonathan Cooper-Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Created] (NUTCH-1872) enables control over how injected metadata is propagated
Date Thu, 09 Oct 2014 20:39:35 GMT
Jonathan Cooper-Ellis created NUTCH-1872:
--------------------------------------------

             Summary: enables control over how injected metadata is propagated
                 Key: NUTCH-1872
                 URL: https://issues.apache.org/jira/browse/NUTCH-1872
             Project: Nutch
          Issue Type: New Feature
            Reporter: Jonathan Cooper-Ellis
            Priority: Minor


This builds on NUTCH-655 and NUTCH-855, allowing users some control over which outlinks receive
injected metadata. A new configuration property "urlmeta.rule" has been introduced, with a
default value of "all".

The value "all" indicates that "urlmeta.tags" should be propagated to all outlinks. The other
options are: "host" (propagated only to outlinks with the same host as the URL with which the
metadata was injected), "domain" (the same, but matching on domain instead of host), and
"prefix" (treats the injected URL as a prefix, so metadata is only propagated to URLs that
extend the injected URL).
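
To make the four rule values concrete, here is a minimal sketch of how such a check might look.
It is not taken from the patch: the class UrlMetaRuleSketch and its methods are hypothetical
names, the "domain" comparison is simplified to the last two host labels (a real
implementation would need a public-suffix-aware lookup), and falling back to "all" for an
unrecognized value is an assumption.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical illustration of the "urlmeta.rule" values; not the code from the patch.
public class UrlMetaRuleSketch {

    // Lower-cased host of a URL, or "" if the URL cannot be parsed or has no host.
    static String host(String url) {
        try {
            String h = URI.create(url).getHost();
            return h == null ? "" : h.toLowerCase(Locale.ROOT);
        } catch (IllegalArgumentException e) {
            return "";
        }
    }

    // Simplifying assumption: the domain is the last two labels of the host.
    // This is wrong for suffixes such as "co.uk".
    static String domain(String url) {
        String h = host(url);
        String[] labels = h.split("\\.");
        if (labels.length <= 2) {
            return h;
        }
        return labels[labels.length - 2] + "." + labels[labels.length - 1];
    }

    // Decides whether metadata injected with injectedUrl should be copied to outlink.
    static boolean shouldPropagate(String rule, String injectedUrl, String outlink) {
        switch (rule) {
            case "host":
                return !host(injectedUrl).isEmpty()
                        && host(injectedUrl).equals(host(outlink));
            case "domain":
                return !domain(injectedUrl).isEmpty()
                        && domain(injectedUrl).equals(domain(outlink));
            case "prefix":
                return outlink.startsWith(injectedUrl);
            case "all":
            default:
                // Assumption: unknown values behave like the default, "all".
                return true;
        }
    }
}
```

With an injected URL of http://www.example.com/a/, the outlink http://blog.example.com/ would
match under "domain" but not under "host" or "prefix" in this sketch.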

I would appreciate feedback on whether you think this is a useful feature, and whether it is
implemented properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
