nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <>
Subject [jira] Updated: (NUTCH-880) REST API (and webapp) for Nutch
Date Tue, 26 Oct 2010 21:01:20 GMT


Andrzej Bialecki  updated NUTCH-880:

    Attachment: API-2.patch

An improved version, which actually works :) Configuration and job management are implemented,
and there is also a unit test that exercises this API.

If there are no objections, I'd like to commit this first version of the API and continue
improving it in other issues.

> REST API (and webapp) for Nutch
> -------------------------------
>                 Key: NUTCH-880
>                 URL:
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 2.0
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>         Attachments: API-2.patch, API.patch
> This issue is for discussing a REST-style API for accessing Nutch.
> Here's an initial idea:
> * I propose to use org.restlet for handling requests and returning JSON/XML/whatever
> * hook up all regular tools so that they can be driven via this API. This would have
> to be an async API, since all Nutch operations take a long time to execute. It follows then
> that we also need to be able to list running operations, retrieve their current status, and
> possibly abort/cancel/stop/suspend/resume/...? This also means that we would potentially have
> to create & manage many threads in a servlet - AFAIK this is frowned upon by J2EE purists...
> * package this in a webapp (that includes all deps, essentially nutch.job content), with
> the restlet servlet as an entry point.
> Open issues:
> * how to implement the reading of crawl results via this API
> * should we manage only crawls that use a single configuration per webapp, or should
> we have a notion of crawl contexts (sets of crawl configs) with CRUD ops on them? This would
> be nice, because it would allow managing several different crawls, with different configs,
> in a single webapp - but it complicates the implementation a lot.
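
The async requirement in the issue description (start an operation, get back
an id right away, then list, poll, or cancel it) could look roughly like the
sketch below. This is only an illustration of the idea, not the code in
API-2.patch: the class name JobManagerSketch, the State values, and every
method name are hypothetical, and a plain java.util.concurrent thread pool
stands in for whatever the patch actually uses. Suspend/resume is left out.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of async job handling; not taken from API-2.patch. */
class JobManagerSketch {
    public enum State { RUNNING, FINISHED, FAILED, CANCELLED, UNKNOWN }

    // Assumption: a small fixed pool; a real webapp would size and own this differently.
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final Map<String, Future<?>> jobs = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    /** Queues the work and returns immediately with an id the client can poll. */
    public String submit(String type, Callable<?> work) {
        String id = type + "-" + nextId.incrementAndGet();
        jobs.put(id, pool.submit(work));
        return id;
    }

    /** Reports the state of a job; queued-but-not-started jobs count as RUNNING. */
    public State status(String id) {
        Future<?> f = jobs.get(id);
        if (f == null) return State.UNKNOWN;
        if (f.isCancelled()) return State.CANCELLED;
        if (!f.isDone()) return State.RUNNING;
        try {
            f.get(); // already done, so this does not block
            return State.FINISHED;
        } catch (ExecutionException e) {
            return State.FAILED;
        } catch (CancellationException e) {
            return State.CANCELLED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return State.UNKNOWN;
        }
    }

    /** Blocks up to timeoutMillis for the job to end, then reports its state. */
    public State waitFor(String id, long timeoutMillis) {
        Future<?> f = jobs.get(id);
        if (f == null) return State.UNKNOWN;
        try {
            f.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException | TimeoutException | CancellationException e) {
            // status() below reports the outcome
        }
        return status(id);
    }

    /** Requests cancellation by interrupting the worker thread. */
    public boolean cancel(String id) {
        Future<?> f = jobs.get(id);
        return f != null && f.cancel(true);
    }

    /** Lists all known jobs with their current state, sorted by id. */
    public Map<String, State> list() {
        Map<String, State> out = new TreeMap<>();
        for (String id : jobs.keySet()) out.put(id, status(id));
        return out;
    }

    public void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) {
        JobManagerSketch manager = new JobManagerSketch();
        String quick = manager.submit("inject", () -> "done");
        String slow = manager.submit("fetch", () -> { Thread.sleep(60_000); return null; });
        System.out.println(quick + ": " + manager.waitFor(quick, 5_000));
        System.out.println(slow + ": " + manager.status(slow));
        manager.cancel(slow);
        System.out.println(manager.list());
        manager.shutdown();
    }
}
```

A REST layer would then map requests onto submit/status/cancel/list. Note
that this sketch creates its own threads, which is exactly the concern
raised in the description about doing so inside a servlet container.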

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
