nutch-dev mailing list archives

From Chris Mattmann
Subject Re: OpenSearch API (Re: Nutch / CGI)
Date Wed, 30 Mar 2005 19:44:08 GMT
Hi Doug et al.,

   I've been working with my partner at the University of Southern California,
Rami Al-Ghanmi, to develop such an RSS/XML service for Nutch, and I am happy to
report that we are nearly finished with it. Our project proposal is available

If you like, we can submit it as a JIRA issue, and then folks can vote on it
if they like it.

Thanks much.


On 3/30/05 10:21 AM, "Doug Cutting" wrote:

> Andrzej Bialecki wrote:
>> This is yet another case that speaks in favor of adding an
>> "out-of-the-box" XML API to Nutch.
> Yes, I agree.
>> * REST - HTTP GET or POST request, with query parameters contained in
>> GET or POST parameters. An XML data document with results is a response.
>> Lightweight, easy to implement and create, and relatively easy to
>> consume. Lack of high-level APIs in most programming languages could be
>> a problem, though.
> In particular, I would love to see a REST contribution.  It should not
> require more than a simple servlet or jsp page that uses NutchBean.
> This logic should be much the same as the current search.jsp, but the
> output would be xml instead of html.  Also this would need to provide
> documentation of both the url parameters and the xml result schema.
> Once this is implemented, search.jsp can be replaced with a filter that
> applies a stylesheet to XML search results.
>> * RSS - a special case of the above, where the response follows a
>> standard schema. A big advantage to use this is its popularity and a
>> large base of tools (libraries, readers, aggregators).
> This would also be very useful.  This could even be the primary API.  We
> can use namespaces to provide, e.g., non-standard item elements.
>> * SOAP - SOAP-encoded request and response. Well integrated into most
>> programming languages, but certainly less efficient (consumes more
>> bandwidth, CPU and memory to create and consume).
>> * XML-RPC - more lightweight than SOAP, but follows a similar RPC paradigm.
> These are a lower priority for me, but such contributions would be welcome.
>> AFAIK, there is a specification called OpenSearch, an extension to RSS,
>> created by Amazon/A9. However, I was unable to find the terms of use for
>> that specification, so it might be encumbered. As I wrote above, using
>> RSS gives strong advantages, so it would be nice to figure out if we can
>> use it.
> I have written to folks at A9 asking about this.  I will report back if
> I hear anything.  I agree that it would be great if Nutch spoke RSS out
> of the box.
>> I believe that the Nutch community is uniquely positioned to propose and
>> promote an open, unencumbered XML API for search results syndication.
>> Let's have a discussion about this - I already implemented a REST
>> interface, which I could clean up and contribute, and there were other
>> people on the list who planned to implement the SOAP interface.
> Do you think there is a need for a non-RSS REST interface?
> Doug
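
To make the REST/RSS idea in the quoted thread concrete, here is a minimal sketch of what such a response could look like: URL parameters in, an RSS 2.0 document out, with one non-standard item element carried in its own namespace. It is written in Python only to keep it short and self-contained; it is not the servlet itself and it does not touch NutchBean. The parameter names (query, start, hitsPerPage), the namespace URI, the "score" element, and the hit dictionaries are all assumptions made for illustration, not an agreed schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# Hypothetical namespace for non-standard item elements (assumption, not a real schema).
NUTCH_NS = "http://example.org/nutch-rss-ext"


def parse_query(query_string):
    """Read the assumed 'query', 'start' and 'hitsPerPage' URL parameters."""
    params = parse_qs(query_string)
    return {
        "query": params.get("query", [""])[0],
        "start": int(params.get("start", ["0"])[0]),
        "hitsPerPage": int(params.get("hitsPerPage", ["10"])[0]),
    }


def hits_to_rss(params, hits):
    """Render one page of hits as an RSS 2.0 document (returned as a string).

    'hits' is a list of dicts with 'title', 'url', 'summary' and 'score' keys;
    in a real service these would come from the search backend.
    """
    ET.register_namespace("nutch", NUTCH_NS)
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = "Search results: " + params["query"]
    ET.SubElement(channel, "link").text = "http://example.org/search"  # placeholder URL
    ET.SubElement(channel, "description").text = "Results for " + params["query"]

    start = params["start"]
    for hit in hits[start:start + params["hitsPerPage"]]:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = hit["title"]
        ET.SubElement(item, "link").text = hit["url"]
        ET.SubElement(item, "description").text = hit["summary"]
        # Non-standard element, kept apart from core RSS by its namespace.
        ET.SubElement(item, "{%s}score" % NUTCH_NS).text = str(hit["score"])

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    demo_hits = [
        {"title": "Apache Nutch", "url": "http://example.org/a",
         "summary": "Open source web search", "score": 2.5},
    ]
    print(hits_to_rss(parse_query("query=nutch"), demo_hits))
```

Because the extra element lives in its own namespace, an ordinary RSS reader can ignore it and still display the title, link and description of each hit, while a client that knows the namespace can read the score.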

Chris A. Mattmann
Staff Member
Modeling and Data Management Systems Section (387)
Data Management Systems and Technologies Group
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                        Mailstop:  171-246
Phone:  818-354-8810
Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.
