[ https://issues.apache.org/jira/browse/NUTCH-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sujen Shah updated NUTCH-2152:
------------------------------
    Attachment: NUTCH-2152.git.patch

Here is the first iteration of the patch.

The commoncrawl dump via the service endpoint can be called in the following manner:

1. POST /services/commoncrawldump
   Request data - application/json
   {
     "confId":"default",
     "crawlId":"crawl01",
     "args":{"mimetypes":["text/html", "", .....], other params}
   }
   Response contains the path of the created resource (type: text/plain)

2. To get all the dump paths for a particular crawlId, you can call
   GET /services/commoncrawldump/{crawlId}
   Response: application/json
   {
     "dumpPaths":[......list of paths.....]
   }

> CommonCrawl dump via Service endpoint
> -------------------------------------
>
>                 Key: NUTCH-2152
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2152
>             Project: Nutch
>          Issue Type: Sub-task
>          Components: REST_api
>    Affects Versions: 1.12
>            Reporter: Sujen Shah
>            Assignee: Sujen Shah
>              Labels: memex
>             Fix For: 1.12
>
>         Attachments: NUTCH-2152.git.patch
>
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
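
For readers who want to try the two calls described in the comment above, here is a minimal Python sketch of how a client might build them. It is not part of the patch. The base URL `http://localhost:8081` is an assumption (use the address of your own Nutch REST server), the helper names `build_dump_request`, `build_list_request` and `parse_dump_paths` are hypothetical, and only the `mimetypes` argument shown in the comment is filled in. The requests are constructed but not sent.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address of a running Nutch REST server; adjust as needed.
BASE_URL = "http://localhost:8081"


def build_dump_request(conf_id, crawl_id, mimetypes, base_url=BASE_URL):
    """Build (but do not send) the POST request that starts a commoncrawl dump."""
    payload = {
        "confId": conf_id,
        "crawlId": crawl_id,
        # Only "mimetypes" appears in the issue comment; other args are left out.
        "args": {"mimetypes": list(mimetypes)},
    }
    return urllib.request.Request(
        url=f"{base_url}/services/commoncrawldump",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_list_request(crawl_id, base_url=BASE_URL):
    """Build (but do not send) the GET request that lists dump paths for a crawl."""
    # Percent-encode the crawl id so it is safe to place in the URL path.
    quoted_id = urllib.parse.quote(crawl_id, safe="")
    return urllib.request.Request(
        url=f"{base_url}/services/commoncrawldump/{quoted_id}",
        headers={"Accept": "application/json"},
        method="GET",
    )


def parse_dump_paths(response_body):
    """Extract the "dumpPaths" list from the JSON body of the GET response."""
    return json.loads(response_body)["dumpPaths"]
```

Passing either request object to `urllib.request.urlopen` would send it. According to the comment, the POST response body is the plain-text path of the created dump, and the GET response is a JSON object whose `dumpPaths` list can be read with `parse_dump_paths`.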