lucene-java-user mailing list archives

From Karl Marx <>
Subject Re: Indexing distant web sites
Date Mon, 04 Nov 2002 11:31:50 GMT
As stated in the official FAQ, Lucene doesn't implement a web crawler.
You can, however, use a self-made crawler or customize a crawler
framework like websphinx to retrieve HTML documents from a site and
then feed them to Lucene.
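
To make the "self-made crawler" option concrete, here is a minimal sketch
in plain Java (11 or later, JDK only). It is not websphinx and not part of
Lucene. The class name SiteCrawlerSketch, its methods, and the callback
shape are all made up for illustration. It walks same-host links
breadth-first, strips the tags, and hands each page's text to a callback.
That callback is where you would feed the text to Lucene.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.BiConsumer;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class SiteCrawlerSketch {
    // Naive href extractor (fragments are dropped); a real crawler
    // should use a proper HTML parser instead of a regex.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"'#]+)", Pattern.CASE_INSENSITIVE);

    /**
     * Breadth-first crawl restricted to the host of startUrl.
     * fetch maps a URL to its HTML, or null if it cannot be retrieved.
     * sink receives (url, plain text) for every page that was fetched.
     * Returns the URLs handed to sink, in crawl order.
     */
    public static List<String> crawl(String startUrl, int maxPages,
                                     Function<String, String> fetch,
                                     BiConsumer<String, String> sink) {
        String host = URI.create(startUrl).getHost();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        List<String> indexed = new ArrayList<>();
        seen.add(startUrl);
        queue.add(startUrl);
        while (!queue.isEmpty() && indexed.size() < maxPages) {
            String url = queue.poll();
            String html = fetch.apply(url);
            if (html == null) {
                continue; // unreachable page: skip it
            }
            sink.accept(url, toText(html));
            indexed.add(url);
            Matcher m = HREF.matcher(html);
            while (m.find()) {
                URI link;
                try {
                    link = URI.create(url).resolve(m.group(1).trim());
                } catch (IllegalArgumentException e) {
                    continue; // malformed link: ignore it
                }
                // Follow only links on the same host, each at most once.
                if (host != null && host.equals(link.getHost())
                        && seen.add(link.toString())) {
                    queue.add(link.toString());
                }
            }
        }
        return indexed;
    }

    // Crude tag stripper: replaces tags with spaces, collapses whitespace.
    static String toText(String html) {
        return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
    }

    // Simple HTTP fetcher; returns null on any error or non-200 status.
    public static String httpFetch(String url) {
        try {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200 ? response.body() : null;
        } catch (Exception e) {
            return null;
        }
    }
}
```

You would call it as crawl(startUrl, 100, SiteCrawlerSketch::httpFetch,
(url, text) -> { ... }) and, inside the callback, build a Lucene Document
from the URL and text and add it through your IndexWriter. Use a start
URL with a path (at least a trailing "/"). A real crawler should also
honour robots.txt, throttle its requests, and check content types, none
of which this sketch does.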

Regards, karl øie

On Monday, Nov 4, 2002, at 11:49 Europe/Oslo, Friaa Nafaa wrote:

> Hello, is there any way to index web sites with Lucene, assuming we
> know only the URL of the site? For local use we pass Lucene the full
> directory tree of our site (containing all the documents) and begin
> the indexing operation, but what do I do when I would like to index a
> remote site on the web? For example, I installed Lucene on my
> computer and I would like to index the site:
> ... Thanks
