nutch-dev mailing list archives

From Stefan Dlugolinsky <s.dlugolin...@gmail.com>
Subject Re: Plugins: when to perform web service requests, on fetch or on index?
Date Thu, 18 Jun 2009 12:56:04 GMT
Hello,

I don't know how v1.0 differs from v0.9, but in v0.9 I would make
those service requests at the indexing stage (extension point
IndexingFilter), where the data prepared by the earlier stages
(parsers, etc.) is already available, so you can use it in the requests.
It depends on what exactly you want, though: if you don't need the
parsed data in the requests, you can call the web services earlier,
from the parsing stage (extension point Parser or HtmlParseFilter).
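
To illustrate the point about having parsed data available at indexing time, here is a minimal sketch in plain Java. It does not use the Nutch API: IndexStageEnricher, the "title" and "service_result" field names, and the injected service function are all hypothetical stand-ins for an IndexingFilter implementation and a real web service client.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for an indexing-stage plugin; NOT a Nutch class.
// A real plugin would implement Nutch's IndexingFilter extension point.
class IndexStageEnricher {
    // Assumed web service client: takes a request value, returns the response.
    private final Function<String, String> service;

    IndexStageEnricher(Function<String, String> service) {
        this.service = service;
    }

    // parsedFields stands in for the data produced by the earlier parsing stage.
    // Returns a copy of the fields with the service response added, if any.
    Map<String, String> enrich(Map<String, String> parsedFields) {
        Map<String, String> doc = new HashMap<>(parsedFields);
        String title = parsedFields.get("title"); // hypothetical field name
        if (title != null) {
            // The request is built from parsed data, which is only
            // available once parsing has finished.
            String response = service.apply(title);
            if (response != null) {
                doc.put("service_result", response); // hypothetical field name
            }
        }
        return doc;
    }
}
```

If the request does not depend on any parsed field, the same call could instead be made from the parsing stage, as described above.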

Here is something about core extension points:
http://wiki.apache.org/nutch/AboutPlugins

Steve

2009/6/18 caezar <caezaris@gmail.com>:
>
> Hi All,
>
> I'm writing several Nutch plugins which will send requests to some
> web services for the pages being indexed and store the retrieved data in
> the index. The question is: at what stage of crawling is it better to
> perform these web service requests, in terms of performance: on fetching
> or on indexing (in HtmlParseFilter or in IndexingFilter)?
>
> Nutch version is 1.0, indexer is SolrIndexer.
>
> Thanks.
> --
> View this message in context: http://www.nabble.com/Plugins%3A-when-to-perform-web-service-requests%2C-on-fetch-or-on-index--tp24089858p24089858.html
> Sent from the Nutch - Dev mailing list archive at Nabble.com.
>
>
