lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: indexing/crawling HTML + solr
Date Wed, 03 Jun 2009 11:23:53 GMT

Gena,

Besides Droids (simpler, smaller components you can put together), there is also Nutch, a bigger
beast for large-scale crawling that indexes crawled pages into Solr: http://lucene.apache.org/nutch
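
If it helps to see the moving parts before picking a tool, here is a rough sketch of the last step of such a pipeline: taking one already-fetched HTML page and turning it into a Solr XML <add> message. It is only an illustration, not how Droids or Nutch do it. The field names (id, title, text) are assumptions and would have to match your own schema.xml, and the class and function names are made up for this example.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape


class PageText(HTMLParser):
    """Collects the <title> and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip and data.strip():
            self.chunks.append(data.strip())


def to_solr_add_xml(url, html):
    """Build a Solr <add> message for one page.

    The field names below are assumptions; use the ones from your schema.
    """
    parser = PageText()
    parser.feed(html)
    parser.close()
    fields = {
        "id": url,  # assumed unique key: the page URL
        "title": parser.title.strip(),
        "text": " ".join(parser.chunks),
    }
    body = "".join(
        '<field name="%s">%s</field>' % (name, escape(value))
        for name, value in fields.items()
    )
    return "<add><doc>%s</doc></add>" % body


if __name__ == "__main__":
    page = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"
    print(to_solr_add_xml("http://example.com/", page))
```

The resulting XML would then be POSTed to your Solr update handler and followed by a commit. The exact URL depends on your installation, so it is left out here. A real crawler also has to deal with fetching, robots.txt, link extraction and politeness, which is what Droids and Nutch are for.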

Otis


----- Original Message ----
> From: Gena Batsyan <gbatyan@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Wednesday, June 3, 2009 6:09:36 AM
> Subject: indexing/crawling HTML + solr
> 
> Hi!
> 
> to be short, where to start with the subject?
> 
> Any pointers to some [semi-]functional solutions that crawl the web like a normal
> crawler, take care of HTML parsing, etc., and feed the crawled content as
> Solr documents per   ?
> 
> regards!

