nutch-dev mailing list archives

From "Lewis John McGibbney (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2327) Seeds injected in REST workflow must be ingested into HDFS
Date Tue, 18 Oct 2016 07:04:59 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584681#comment-15584681 ]

Lewis John McGibbney commented on NUTCH-2327:
---------------------------------------------

OK, we can hang out and sort this one out. I've also asked Sachin to chime in with any
updates, as having this stable for 1.13 would be a real step forward.

> Seeds injected in REST workflow must be ingested into HDFS
> ----------------------------------------------------------
>
>                 Key: NUTCH-2327
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2327
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector, REST_api
>    Affects Versions: 1.12
>            Reporter: Lewis John McGibbney
>             Fix For: 1.13
>
>
> Right now when one uses the REST POST /seed/create API, a directory is created within
> /var/some/path/here, which is great if you are working locally with the Nutch server, e.g.
> on one machine. It is, however, not suitable for using the REST API in distributed deployments
> where seeds need to be present within HDFS. More documentation on this topic is available at
> https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI#Seed_List_creation
> There are also various mailing list threads regarding use of the REST API, and the injector
> URL issue described above needs to be addressed.
> [~sujenshah] CC for context.
> http://www.mail-archive.com/user%40nutch.apache.org/msg14922.html
> http://www.mail-archive.com/user%40nutch.apache.org/msg14921.html
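
For context, the problem described in the issue can be sketched roughly as follows. This is only an illustration, not the Nutch implementation: the server address, the payload shape (modelled loosely on the wiki page linked above) and the helper names build_seed_request and is_on_hdfs are assumptions made for the example. The request is built but never sent.

```python
import json
from urllib.parse import urlparse
from urllib.request import Request

# Assumption: a Nutch server running locally; host and port are illustrative.
NUTCH_SERVER = "http://localhost:8081"


def build_seed_request(name, urls, server=NUTCH_SERVER):
    """Build (but do not send) a POST /seed/create request.

    The payload shape is an assumption modelled on the wiki page
    referenced in the issue; check it against your Nutch version.
    """
    payload = {
        "name": name,
        "seedUrls": [{"url": u} for u in urls],
    }
    return Request(
        server + "/seed/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_on_hdfs(seed_dir):
    """Hypothetical check: True only if the seed directory is an hdfs:// URI."""
    return urlparse(seed_dir).scheme == "hdfs"


req = build_seed_request("nutch", ["http://nutch.apache.org/"])
print(req.get_method(), req.full_url)

# A plain local path, as returned today, versus a path a distributed
# deployment would need (the namenode address is made up).
print(is_on_hdfs("/var/some/path/here"))
print(is_on_hdfs("hdfs://namenode:8020/user/nutch/seeds"))
```

A client in a distributed deployment could use a check like is_on_hdfs on the directory returned by the server to decide whether the seed list is usable by the injector as-is or first has to be copied into HDFS.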



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
