nutch-dev mailing list archives

From David Spencer <dave-nu...@tropo.com>
Subject Re: [Nutch-dev] Re: Starting a non-profit organisation running Nutch with a thousand or more sponsored servers
Date Thu, 17 Mar 2005 23:06:57 GMT
Michael Wechner wrote:

> Stefan Groschupf wrote:
> 
>>> have you collected these offers somewhere?
>>
>>
>> Check the source-forge mail archive.
> 
> 
> 
> thanks, will do.
> 
> btw, is there an interface within Nutch, where a CMS (e.g. Apache Lenya) 
> can notify Nutch about content changes (or deletion of pages or renaming 
> of URLs)?


I'm new to Nutch and haven't tried this yet, but I've had the same kind 
of question.

I think the answer is: IWebDBWriter.addPageIfNotPresent()
or maybe just addPage() in your case, for updates.

http://nutch.sourceforge.net/docs/api/net/nutch/db/IWebDBWriter.html#addPageIfNotPresent(net.nutch.db.Page)

See WebDBInjector for sample usage, especially since you now have to 
create a Page object to pass to this call.
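
To make the idea concrete, here is a rough, untested-against-Nutch sketch of how a CMS hook might drive those two calls. Everything in it is a stand-in I made up so that it compiles on its own: Page and PageWriter are NOT the real net.nutch.db classes (they only borrow the two method names above), the revision field is hypothetical, and I am assuming addPage() overwrites an existing entry while addPageIfNotPresent() leaves it alone. Check the javadoc and WebDBInjector for the real constructors and semantics.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CmsNotifySketch {

    /** Hypothetical stand-in for net.nutch.db.Page; the real class differs. */
    static final class Page {
        final String url;
        final long revision; // made-up field, used only to show which write won

        Page(String url, long revision) {
            this.url = url;
            this.revision = revision;
        }
    }

    /** Hypothetical stand-in exposing only the two method names mentioned above. */
    interface PageWriter {
        void addPage(Page page);             // assumed: add or overwrite
        void addPageIfNotPresent(Page page); // assumed: add only if the URL is unknown
    }

    /** In-memory implementation so the sketch runs without a WebDB. */
    static final class InMemoryWriter implements PageWriter {
        final Map<String, Page> pages = new LinkedHashMap<>();

        public void addPage(Page page) {
            pages.put(page.url, page);
        }

        public void addPageIfNotPresent(Page page) {
            pages.putIfAbsent(page.url, page);
        }
    }

    /** The CMS published a new URL. */
    static void onCreated(PageWriter writer, String url, long revision) {
        writer.addPageIfNotPresent(new Page(url, revision));
    }

    /** The CMS changed the content behind an existing URL. */
    static void onChanged(PageWriter writer, String url, long revision) {
        writer.addPage(new Page(url, revision));
    }

    public static void main(String[] args) {
        InMemoryWriter writer = new InMemoryWriter();
        String url = "http://example.org/a.html";

        onCreated(writer, url, 1L);
        onCreated(writer, url, 2L); // no effect, the URL is already known
        onChanged(writer, url, 3L); // replaces the stored entry

        System.out.println(writer.pages.size() + " page(s), revision="
                + writer.pages.get(url).revision);
    }
}
```

This only covers creation and updates. Deleted pages and renamed URLs would need something more, and I don't know which call (if any) handles those.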



> 
> I guess this would make crawling "obsolete" to a certain point (at least 
> for pages
> being created by content management systems).
> 
> Thanks
> 
> Michi
> 
>>
>>
> 
> 

