nutch-dev mailing list archives

From Andrzej Bialecki
Subject OpenSearch API (Re: Nutch / CGI)
Date Wed, 30 Mar 2005 11:38:47 GMT

The CGI use case could be treated as a special case of integrating Nutch 
with another largely incompatible environment, in a loosely-coupled 
system. A popular way to do this would be to use an XML-based API from a 
CGI script.

This is yet another case that speaks in favor of adding an 
"out-of-the-box" XML API to Nutch. There are only a few ways to do 
it that make sense, IMHO:

* REST - HTTP GET or POST request, with the query passed as GET or POST 
parameters. The response is an XML document containing the results. 
Lightweight, easy to implement, and relatively easy to consume. The 
lack of high-level APIs for it in most programming languages could be 
a problem, though.

* RSS - a special case of the above, where the response follows a 
standard schema. A big advantage of this approach is its popularity and 
the large base of existing tools (libraries, readers, aggregators).

* SOAP - SOAP-encoded request and response. Well integrated into most 
programming languages, but certainly less efficient (consumes more 
bandwidth, CPU and memory to create and consume).

* XML-RPC - more lightweight than SOAP, but follows a similar RPC paradigm.
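
To make the REST option concrete, here is a minimal client-side sketch in 
Python. The endpoint URL, the parameter names (query, start, hits) and the 
XML element names are all assumptions made up for illustration; they are 
not taken from any existing Nutch interface.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "http://localhost:8080/search"


def build_query_url(query, start=0, hits=10):
    """Encode the search parameters into the URL of a GET request."""
    params = {"query": query, "start": start, "hits": hits}
    return BASE_URL + "?" + urlencode(params)


# A made-up response document; the element names are assumptions.
SAMPLE_RESPONSE = """<results total="2">
  <hit><title>Nutch</title><url>http://example.org/nutch</url></hit>
  <hit><title>Lucene</title><url>http://example.org/lucene</url></hit>
</results>"""


def parse_results(xml_text):
    """Return (total, [(title, url), ...]) from the XML response."""
    root = ET.fromstring(xml_text)
    hits = [(h.findtext("title"), h.findtext("url"))
            for h in root.findall("hit")]
    return int(root.get("total")), hits


if __name__ == "__main__":
    print(build_query_url("open source"))
    print(parse_results(SAMPLE_RESPONSE))
```

The request is just a URL and the response is parsed with a generic XML 
library, which is roughly what "lightweight" means here: a CGI script in 
almost any language could do the same without special tooling.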

AFAIK, there is a specification called OpenSearch, an extension to RSS, 
created by Amazon/A9. However, I was unable to find the terms of use for 
that specification, so it might be encumbered. As I wrote above, using 
RSS gives strong advantages, so it would be nice to figure out if we can 
use it.
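
As a rough illustration of what "an extension to RSS" could look like for 
search results, the Python sketch below builds an RSS 2.0 feed and adds one 
extra element in a separate namespace. The namespace URI, the element name 
totalResults and the link URL are placeholders invented for this example; 
this is not the OpenSearch schema, whose details are not described here.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace, NOT the real OpenSearch one.
EXT_NS = "http://example.org/search-ext/1.0/"
ET.register_namespace("search", EXT_NS)


def results_to_rss(query, total, hits):
    """Serialize search hits as an RSS 2.0 feed with one extension element."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = "Search results for: " + query
    ET.SubElement(channel, "link").text = "http://localhost:8080/search"
    ET.SubElement(channel, "description").text = "Results for " + query
    # Extension element carrying metadata that plain RSS has no slot for.
    ET.SubElement(channel, "{%s}totalResults" % EXT_NS).text = str(total)
    for title, url in hits:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(results_to_rss("nutch", 42,
                         [("Nutch", "http://example.org/nutch")]))
```

An ordinary RSS reader would ignore the namespaced element and still show 
the items, while a search-aware client could read the extra metadata.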

Existing APIs from other search engines are unfortunately encumbered by 
their restrictive terms of use, so it is dangerous to re-use them.

I believe that the Nutch community is uniquely positioned to propose and 
promote an open, unencumbered XML API for search results syndication. 
Let's have a discussion about this. I have already implemented a REST 
interface, which I could clean up and contribute, and there were other 
people on the list who planned to implement a SOAP interface.

Best regards,
Andrzej Bialecki
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com
