lucene-solr-user mailing list archives

From François Schiettecatte <fschietteca...@gmail.com>
Subject Re: Solr and wikipedia for schools
Date Sun, 04 Sep 2011 14:31:50 GMT
I note that there is a full download option available, which might be easier than crawling.
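
A minimal sketch of the download route, assuming the archive has been unpacked to a local directory of HTML files and that the Solr schema has `id`, `title` and `text` fields (those field names, the directory path and the localhost URL are assumptions, not something taken from the Wikipedia for Schools package). It walks the directory, pulls the title and visible text out of each page, and builds a standard Solr XML `<add>` message that can be posted to the update handler:

```python
import os
import urllib.request
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the <title> and the visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def build_docs(root_dir):
    """Yield one dict per .htm/.html file found under root_dir."""
    for dirpath, _dirs, files in os.walk(root_dir):
        for name in sorted(files):
            if not name.lower().endswith((".htm", ".html")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                parser = TextExtractor()
                parser.feed(fh.read())
                parser.close()  # flush any text still buffered
            yield {
                # Assumed schema fields: id, title, text.
                "id": os.path.relpath(path, root_dir),
                "title": "".join(parser.title_parts).strip(),
                "text": " ".join(parser.text_parts),
            }


def to_update_xml(docs):
    """Serialize docs as a Solr XML update message: <add><doc>...</doc></add>."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for field, value in doc.items():
            ET.SubElement(doc_el, "field", name=field).text = value
    return ET.tostring(add, encoding="utf-8")


def post_to_solr(xml_bytes,
                 url="http://localhost:8983/solr/update?commit=true"):
    """POST the update message; the URL is an assumed local default."""
    req = urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Example use (hypothetical path, needs a running Solr):
# post_to_solr(to_update_xml(build_docs("/data/wikipedia-for-schools")))
```

For a large archive it would probably be better to post in batches of a few hundred documents rather than one big message, and to commit once at the end.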

François

On Sep 4, 2011, at 9:56 AM, Markus Jelsma wrote:

> Hi,
> 
> Solr is a search engine, not a crawler. You can use Apache Nutch to crawl your 
> site and have it indexed in Solr.
> 
> Cheers,
> 
>> Hi,
>> 
>> I am new to Solr/Lucene, and have some problems trying to figure out the
>> best way to perform indexing. I think I understand the general principles,
>> but have some trouble translating this to my specific goal, which is the
>> following:
>> 
>> I want to use Solr as a search engine based on general (English) keywords,
>> that has indexed Wikipedia for Schools
>> (http://www.soschildrensvillages.org.uk/charity-news/archive/2008/10/2008-wikipedia-for-schools).
>> 
>> I initially thought that it would be sufficient to add the root document
>> (index.html) to Solr, after which everything would be automagically
>> indexed, but this does not seem to work. I have also tried to use
>> URLDataSource in data-config.xml, but there I get a bit confused by the
>> settings.
>> 
>> Could anyone help me understand how I can achieve my goal?
>> 
>> Thanks
>> 
>> Kees

