nutch-dev mailing list archives

From "Daniel Kugel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1622) Create Outlinks with metadata
Date Sat, 10 May 2014 22:04:47 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994191#comment-13994191 ]

Daniel Kugel commented on NUTCH-1622:
-------------------------------------

I might have done something wrong, but from reading the Nutch 2.x code I was under the impression
that the only way to pass data between map/reduce jobs (here, the outlink data) is to attach it to something
that can be stored in an HBase table and/or mapped by Gora, which is why the patch uses a ByteBuffer.
There's a chance I misunderstood a key concept; that patch was a quick hack.
You seem to separate two concerns in a way I don't fully understand. I'd be happy if you could
elaborate.
We can continue this discussion on the mailing list if JIRA is not the right place for it.
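To make the ByteBuffer approach concrete, here is a minimal sketch (not taken from either attached patch) of how string key/value outlink metadata could be packed into a single ByteBuffer and unpacked by a later job. A ByteBuffer is the kind of value a byte-typed HBase column or Gora-mapped field can hold. The class name `OutlinkMetaCodec`, the length-prefixed layout, and the assumption that the metadata are plain strings are all hypothetical. Nutch's own data model may store this differently.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Nutch or the NUTCH-1622 patches): packs
 * outlink metadata into one ByteBuffer so it can be persisted as a byte value
 * between jobs and decoded again afterwards.
 */
public class OutlinkMetaCodec {

    // Assumed layout: [int count], then per entry [int keyLen][key][int valLen][value]
    public static ByteBuffer encode(Map<String, String> meta) {
        int size = Integer.BYTES;
        for (Map.Entry<String, String> e : meta.entrySet()) {
            size += 2 * Integer.BYTES
                + e.getKey().getBytes(StandardCharsets.UTF_8).length
                + e.getValue().getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(meta.size());
        for (Map.Entry<String, String> e : meta.entrySet()) {
            putString(buf, e.getKey());
            putString(buf, e.getValue());
        }
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    public static Map<String, String> decode(ByteBuffer in) {
        ByteBuffer buf = in.duplicate(); // leave the caller's position untouched
        int count = buf.getInt();
        Map<String, String> meta = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            String key = getString(buf);
            String value = getString(buf);
            meta.put(key, value);
        }
        return meta;
    }

    private static void putString(ByteBuffer buf, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        buf.putInt(bytes.length); // length prefix
        buf.put(bytes);
    }

    private static String getString(ByteBuffer buf) {
        byte[] bytes = new byte[buf.getInt()];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Map<String, String> meta = new LinkedHashMap<>();
        meta.put("source.category", "news");
        meta.put("depth", "2");
        ByteBuffer stored = encode(meta);          // what the parse job would write
        System.out.println(decode(stored));        // {source.category=news, depth=2}
    }
}
```

Length-prefixing each key and value keeps the encoding unambiguous even when values contain separator characters. The cost is four extra bytes per string.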

> Create Outlinks with metadata
> -----------------------------
>
>                 Key: NUTCH-1622
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1622
>             Project: Nutch
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.7, 2.2.1
>            Reporter: Julien Nioche
>            Assignee: Julien Nioche
>             Fix For: 1.8, 2.4
>
>         Attachments: NUTCH-1622-2.x.patch, NUTCH-1622.patch
>
>
> Having the possibility to specify metadata when creating an outlink is extremely useful
> as it makes it possible to pass information from a source page to the pages it links to. We use that
> routinely within our custom parsers in combination with the url-meta plugin.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
