lucene-solr-user mailing list archives

From Marc Sturlese <marc.sturl...@gmail.com>
Subject Re: anyone use hadoop+solr?
Date Tue, 22 Jun 2010 16:43:27 GMT

Well, the patch consumes its data from a CSV file. You have to modify the input to
use TableInputFormat (I don't remember if it's called exactly that) and
it will work.
Once you've done that, you have to specify as many reducers as the number of
shards you want.
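
To illustrate why the number of reducers determines the number of shards, here
is a minimal plain-Python sketch (no Hadoop involved) of hash partitioning.
Each document key is assigned to exactly one of num_reducers partitions, and
each partition would end up as one shard. The names partition and
shard_documents are hypothetical, and crc32 is only a stand-in assumption for
the hash; Hadoop's default HashPartitioner uses the key's Java hashCode modulo
the number of reduce tasks.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Assign a key to a reducer. crc32 is an assumption standing in for
    Java's hashCode; any deterministic hash shows the same idea."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shard_documents(doc_ids, num_reducers):
    """Group document ids by the reducer (and therefore shard) that would
    receive them."""
    shards = defaultdict(list)
    for doc_id in doc_ids:
        shards[partition(doc_id, num_reducers)].append(doc_id)
    return dict(shards)


if __name__ == "__main__":
    docs = [f"doc-{i}" for i in range(10)]
    # With 3 reducers, every document lands in one of at most 3 shards.
    for shard, ids in sorted(shard_documents(docs, 3).items()):
        print(shard, ids)
```

Since every key goes to exactly one reducer, asking for N reducers gives at
most N output indexes, one per shard.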

I know 2 ways to index using Hadoop:
Method 1 (SOLR-1301 & Nutch):
-Map: just gets the data from the source and creates key-value pairs
-Reduce: does the analysis and indexes the data
So, the index is built on the reducer side

Method 2 (Hadoop Lucene index contrib):
-Map: does the analysis and opens an IndexWriter to add docs
-Reduce: merges the small indexes built in the map
So, the indexes are built on the map side
Method 2 has no good integration with Solr at the moment.
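
The difference between the two methods can be sketched with a toy simulation
in plain Python. This is not Hadoop, Solr or Lucene code: the "index" is just
a dict from term to a set of document ids, the analyzer is a whitespace
split, there is a single reducer, and all function names are hypothetical.
It only shows where the analysis and indexing work happens in each method.

```python
from collections import defaultdict


def analyze(text):
    """Toy analyzer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Build a toy inverted index {term: set of doc ids} from
    (doc_id, text) pairs."""
    index = defaultdict(set)
    for doc_id, text in docs:
        for term in analyze(text):
            index[term].add(doc_id)
    return dict(index)


def merge_indexes(indexes):
    """Merge several toy indexes into one by unioning the posting sets."""
    merged = defaultdict(set)
    for index in indexes:
        for term, ids in index.items():
            merged[term] |= ids
    return dict(merged)


def index_on_reduce_side(splits):
    """Method 1: mappers only emit key-value pairs; the reducer does the
    analysis and builds the index."""
    mapped = [record for split in splits for record in split]  # map phase
    return build_index(mapped)                                 # reduce phase


def index_on_map_side(splits):
    """Method 2: each mapper analyzes its split and builds a small index;
    the reducer merges the small indexes."""
    small_indexes = [build_index(split) for split in splits]   # map phase
    return merge_indexes(small_indexes)                        # reduce phase


if __name__ == "__main__":
    splits = [
        [("1", "Hadoop and Solr")],
        [("2", "Solr shards"), ("3", "hadoop reducers")],
    ]
    # Both strategies produce the same toy index; only the place where
    # the work is done differs.
    print(index_on_reduce_side(splits) == index_on_map_side(splits))
```

In this toy setting both functions return the same index. The trade-offs
between them (how much data is shuffled, how expensive the merge is) do not
show up at this scale.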

In the JIRA issue (SOLR-1301) there's a good explanation of the advantages and
disadvantages of indexing on the map or reduce side. I recommend reading all
the comments on the issue in detail to understand exactly how it works.

