nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] Updated: (NUTCH-932) Bulk REST API to retrieve crawl results as JSON
Date Thu, 04 Nov 2010 18:46:41 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrzej Bialecki  updated NUTCH-932:
------------------------------------

    Attachment: NUTCH-932.patch

This patch adds bulk retrieval of crawl results. This is still very rough, e.g. there's no
way to select crawlId or limit the fields... but it returns proper JSON.

This patch also includes other enhancements and bugfixes - with this patch I was able to perform
a complete crawl cycle via REST.

> Bulk REST API to retrieve crawl results as JSON
> -----------------------------------------------
>
>                 Key: NUTCH-932
>                 URL: https://issues.apache.org/jira/browse/NUTCH-932
>             Project: Nutch
>          Issue Type: New Feature
>          Components: REST_api
>    Affects Versions: 2.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>         Attachments: NUTCH-932.patch
>
>
> It would be useful to be able to retrieve results of a crawl as JSON. There are a few
> things that need to be discussed:
> * how to return bulk results using Restlet (WritableRepresentation subclass?)
> * what should be the format of results?
> I think it would make sense to provide a single record retrieval (by primary key), all
> records, and records within a range. This incidentally matches well the capabilities of the
> Gora Query class :)
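
The three access patterns described in the issue (a single record by primary key, all
records, and records within a key range) can be sketched roughly as below. This is only an
illustration and not the code in NUTCH-932.patch: the class name CrawlResultSketch, the
in-memory TreeMap standing in for the crawl store, the "key" and "status" JSON fields, and
the inclusive range bounds are all assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: key, all, and range retrieval over a sorted store, returned as JSON. */
public class CrawlResultSketch {
    // Assumption: a sorted in-memory map stands in for the real crawl store.
    private final TreeMap<String, String> store = new TreeMap<>();

    /** Adds or overwrites a record (key -> status). */
    public void put(String key, String status) {
        store.put(key, status);
    }

    /** Single record by primary key; returns the JSON literal null if the key is absent. */
    public String get(String key) {
        String status = store.get(key);
        return status == null ? "null" : toJson(key, status);
    }

    /** All records, in key order, as a JSON array. */
    public String all() {
        return toJsonArray(store);
    }

    /** Records with startKey <= key <= endKey (inclusive bounds are an assumption). */
    public String range(String startKey, String endKey) {
        return toJsonArray(store.subMap(startKey, true, endKey, true));
    }

    private static String toJsonArray(Map<String, String> records) {
        StringBuilder sb = new StringBuilder("[");
        for (Map.Entry<String, String> e : records.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append(toJson(e.getKey(), e.getValue()));
        }
        return sb.append(']').toString();
    }

    private static String toJson(String key, String status) {
        return "{\"key\":\"" + escape(key) + "\",\"status\":\"" + escape(status) + "\"}";
    }

    // Minimal escaping for backslashes and double quotes only; not a full JSON encoder.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

A real implementation would presumably stream the records rather than build one string in
memory, which is what the question about returning bulk results via Restlet is getting at.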

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

