lucene-solr-user mailing list archives

From Uri Boness <ubon...@gmail.com>
Subject Re: solr nutch url indexing
Date Mon, 24 Aug 2009 21:46:18 GMT
Hi,

Nutch comes with support for Solr out of the box. I suggest you follow 
the steps as described here: 
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
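
For the title problem discussed in the quoted reply below, here is a rough
sanity check, not anything shipped with Nutch. It assumes that the
plugin.includes property in nutch-site.xml is a regular expression matched
against each whole plugin id. The PLUGIN_INCLUDES value and the
missing_plugins helper are made up for illustration, so paste in your own
value:

```python
import re

# Assumed plugin.includes value, for illustration only.
# Replace it with the value from your own nutch-site.xml.
PLUGIN_INCLUDES = (
    r"protocol-http|urlfilter-regex|parse-(text|html)|"
    r"index-basic|scoring-opic|urlnormalizer-(pass|regex|basic)"
)

# Plugin ids named in this thread as needed for the title to be indexed.
REQUIRED = ["parse-html", "index-basic"]


def missing_plugins(includes, required=REQUIRED):
    """Return the required plugin ids that the regex does not fully match."""
    pattern = re.compile(includes)
    return [pid for pid in required if pattern.fullmatch(pid) is None]


if __name__ == "__main__":
    # An empty list means both plugins are covered by the expression.
    print(missing_plugins(PLUGIN_INCLUDES))
```

If the list is not empty, the named plugins are probably not being loaded,
which would explain a missing title.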

Cheers,
Uri

Fuad Efendi wrote:
> Is SolrIndex a plugin for Nutch?
> Thanks!
>
>
> -----Original Message-----
> From: Uri Boness [mailto:uboness@gmail.com] 
> Sent: August-24-09 4:42 PM
> To: solr-user@lucene.apache.org
> Subject: Re: solr nutch url indexing
>
> How did you configure nutch?
>
> Make sure you have the "parse-html" and "index-basic" plugins configured.
> By default, the HtmlParser extracts the page title and adds it to the
> parsed data, and the BasicIndexingFilter adds this title to the
> NutchDocument and stores it in the "title" field. All the SolrIndex
> (actually the SolrWriter) does is convert the NutchDocument to a
> SolrInputDocument. So having these plugins configured in Nutch and
> having a field named "title" in the schema should work. (I'm assuming
> you're using the "solrindex" tool.)
>
> Cheers,
> Uri
>
> Lassalle, Thibaut wrote:
>
>> Hi,
>>
>> I would like to crawl intranets with nutch and index them with solr.
>>
>> I would like to search mostly on the title of the pages (the one in
>> <title>This is a title</title>)
>>
>> I tried to tweak the schema.xml to do that but nothing is working. I
>> just have the content indexed.
>>
>> How do I index on title?
>>
>> Thanks
>>
>> t.
