nutch-dev mailing list archives

From Doug Cutting
Subject Re: OpenSearch API (Re: Nutch / CGI)
Date Wed, 30 Mar 2005 18:21:38 GMT
Andrzej Bialecki wrote:
> This is yet another case that speaks in favor of adding an 
> "out-of-the-box" XML API to Nutch.

Yes, I agree.

> * REST - HTTP GET or POST request, with query parameters contained in 
> GET or POST parameters. An XML data document with results is a response. 
> Lightweight, easy to implement and create, and relatively easy to 
> consume. Lack of high-level APIs in most programming languages could be 
> a problem, though.

In particular, I would love to see a REST contribution.  It should not 
require more than a simple servlet or JSP page that uses NutchBean. 
The logic should be much the same as in the current search.jsp, but the 
output would be XML instead of HTML.  The contribution would also need 
to document both the URL parameters and the XML result schema.
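
As a rough illustration of the XML output described above, here is a 
minimal sketch using only the JDK's XML classes.  The Hit class, the 
element names (results, hit, title, url, summary) and the attributes 
(query, start, count) are all assumptions made for this example; they 
are not an existing Nutch schema, and the real code would take its hits 
from NutchBean inside a servlet or JSP page.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlResults {
    // Hypothetical stand-in for one search hit; not a Nutch class.
    public static final class Hit {
        final String title, url, summary;
        public Hit(String title, String url, String summary) {
            this.title = title;
            this.url = url;
            this.summary = summary;
        }
    }

    // Builds a <results> document; every element and attribute name is an assumption.
    public static String toXml(String query, int start, List<Hit> hits) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("results");
        root.setAttribute("query", query);
        root.setAttribute("start", Integer.toString(start));
        root.setAttribute("count", Integer.toString(hits.size()));
        doc.appendChild(root);
        for (Hit h : hits) {
            Element hit = doc.createElement("hit");
            appendText(doc, hit, "title", h.title);
            appendText(doc, hit, "url", h.url);
            appendText(doc, hit, "summary", h.summary);
            root.appendChild(hit);
        }
        // Serialize with an identity transform; special characters are escaped for us.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    private static void appendText(Document doc, Element parent, String name, String text) {
        Element e = doc.createElement(name);
        e.setTextContent(text);
        parent.appendChild(e);
    }

    public static void main(String[] args) throws Exception {
        List<Hit> hits = List.of(new Hit("Nutch & Lucene", "http://example.org/", "A sample hit."));
        System.out.println(toXml("nutch", 0, hits));
    }
}
```

Building the document through DOM rather than string concatenation 
means titles and summaries containing & or < cannot break the output.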

Once this is implemented, search.jsp can be replaced with a filter that 
applies a stylesheet to XML search results.
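
The stylesheet step could look roughly like the sketch below, which 
uses the JDK's javax.xml.transform API to turn XML results into an HTML 
list.  It shows only the transform itself, not the servlet filter 
around it.  The input format (results/hit/title/url) and the inline 
stylesheet are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class StylesheetDemo {
    // Hypothetical stylesheet: renders each <hit> as a linked list item.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='/results'><ul>"
      + "<xsl:for-each select='hit'>"
      + "<li><a href='{url}'><xsl:value-of select='title'/></a></li>"
      + "</xsl:for-each>"
      + "</ul></xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML results string and returns the HTML.
    public static String render(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<results><hit><title>Example</title>"
                   + "<url>http://example.org/</url></hit></results>";
        System.out.println(render(xml));
    }
}
```

In a real filter the compiled stylesheet would probably be cached (for 
example as a javax.xml.transform.Templates object) rather than parsed 
on every request.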

> * RSS - a special case of the above, where the response follows a 
> standard schema. A big advantage of using it is its popularity and the 
> large base of tools (libraries, readers, aggregators).

This would also be very useful.  This could even be the primary API.  We 
can use namespaces to provide, e.g., non-standard item elements.
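
To make the namespace idea concrete, here is a small sketch that 
writes an RSS 2.0 feed with one extra item element in its own 
namespace, using the JDK's StAX writer.  The rss, channel and item 
elements are standard RSS 2.0; the nutch prefix, the namespace URI 
http://example.org/nutch-ext and the score element are hypothetical 
and chosen only for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class RssResults {
    // Hypothetical namespace URI for non-standard item elements.
    static final String NS = "http://example.org/nutch-ext";

    // Writes a one-item RSS 2.0 feed with an extension element in the NS namespace.
    public static String toRss(String query, String title, String link, String score)
            throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("rss");
        w.writeAttribute("version", "2.0");
        w.writeNamespace("nutch", NS);           // declare the extension prefix once
        w.writeStartElement("channel");
        text(w, "title", "Search results: " + query);
        text(w, "link", "http://example.org/search");
        text(w, "description", "Results for the query " + query);
        w.writeStartElement("item");
        text(w, "title", title);
        text(w, "link", link);
        w.writeStartElement("nutch", "score", NS); // non-standard, namespaced element
        w.writeCharacters(score);
        w.writeEndElement();
        w.writeEndElement(); // item
        w.writeEndElement(); // channel
        w.writeEndElement(); // rss
        w.close();
        return out.toString();
    }

    private static void text(XMLStreamWriter w, String name, String value) throws Exception {
        w.writeStartElement(name);
        w.writeCharacters(value);
        w.writeEndElement();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toRss("nutch", "Example", "http://example.org/", "0.87"));
    }
}
```

RSS tools that do not know the extension namespace can simply skip 
the extra element, which is why this approach keeps the feed usable by 
existing readers and aggregators.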

> * SOAP - SOAP-encoded request and response. Well integrated into most 
> programming languages, but certainly less efficient (consumes more 
> bandwidth, CPU and memory to create and consume).
> * XML-RPC - more lightweight than SOAP, but follows a similar RPC paradigm.

These are a lower priority for me, but such contributions would be welcome.

> AFAIK, there is a specification called OpenSearch, an extension to RSS, 
> created by Amazon/A9. However, I was unable to find the terms of use for 
> that specification, so it might be encumbered. As I wrote above, using 
> RSS gives strong advantages, so it would be nice to figure out if we can 
> use it.

I have written to folks at A9 asking about this.  I will report back if 
I hear anything.  I agree that it would be great if Nutch spoke RSS out 
of the box.

> I believe that the Nutch community is uniquely positioned to propose and 
> promote an open, unencumbered XML API for search results syndication. 
> Let's have a discussion about this. I have already implemented a REST 
> interface, which I could clean up and contribute, and there were other 
> people on the list who planned to implement a SOAP interface.

Do you think there is a need for a non-RSS REST interface?