[ https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994191#comment-13994191 ]
Daniel Kugel commented on NUTCH-1622:
-------------------------------------
I might have done something wrong, but from reading the Nutch 2.x code I was under the impression
that the only way to pass data (the outlink data) between map/reduce jobs is to append it in a form
that can be stored in an HBase table and/or mapped by Gora, which is why the ByteBuffer was
used.
There's a chance I misunderstood a key concept. That patch was a quick hack.
You seem to separate two concerns in a way that I don't fully understand. I would be happy if you could
elaborate.
We can continue this talk on the mailing list if this is not the platform for this sort of
discussion.
> Create Outlinks with metadata
> -----------------------------
>
> Key: NUTCH-1622
> URL: https://issues.apache.org/jira/browse/NUTCH-1622
> Project: Nutch
> Issue Type: New Feature
> Components: parser
> Affects Versions: 1.7, 2.2.1
> Reporter: Julien Nioche
> Assignee: Julien Nioche
> Fix For: 1.8, 2.4
>
> Attachments: NUTCH-1622-2.x.patch, NUTCH-1622.patch
>
>
> Having the possibility to specify metadata when creating an outlink is extremely useful
> as it allows to pass information from a source page to the pages it links to. We use that
> routinely within our custom parsers in combination with the url-meta plugin.
--
This message was sent by Atlassian JIRA
(v6.2#6252)