lucene-solr-user mailing list archives

From sol myr <>
Subject Lucene Grid question
Date Tue, 13 Sep 2011 19:58:41 GMT

I have a huge Lucene index, which I'd like to split between machines ("Grid").

E.g. say I have a chain of book-stores, in different countries, and I'm aiming for the following:
- Each country has its own index file, on its own machine (e.g. books from Japan are indexed
on machine "japan1")
- Most users search only within their own country (e.g. search only the "japan1" index)
- But sometimes, they might ask to search the entire chain (all countries), meaning some sort
of "map/reduce" (=collect data from all countries).
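To make the "collect data from all countries" step concrete, here is a minimal sketch of the gather side: each machine returns its own best hits, and the caller merges them into one list. It is plain Python rather than Lucene code; `merge_top_k`, the shard name "france1", and the document ids are hypothetical, and the merge is only meaningful if every machine produces scores on a comparable scale.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document id) as returned by one shard


def merge_top_k(per_shard_hits: Dict[str, List[Hit]], k: int) -> List[Tuple[float, str, str]]:
    """Merge each shard's local hits into one global list, best score first."""
    tagged = (
        (score, shard, doc_id)
        for shard, hits in per_shard_hits.items()
        for score, doc_id in hits
    )
    # Keep only the k highest-scoring hits across all shards.
    return heapq.nlargest(k, tagged, key=lambda t: t[0])


# Hypothetical local results from two country machines.
hits = {
    "japan1": [(2.0, "jp-7"), (1.1, "jp-3")],
    "france1": [(1.7, "fr-2"), (0.4, "fr-9")],
}
top = merge_top_k(hits, 3)
```

Each machine only needs to send its own top k hits, since no hit outside a shard's local top k can appear in the global top k.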

The main challenge is the "entire chain search", especially if I want reasonable ranking.

After some investigation (and great help from the Hibernate Search forum), I've found the following two approaches:

1) Implement a LuceneDirectory that transparently spreads across several machines.

I'm not sure how the Search would work - can I ask each index for *relevant* data only?
Or would I need to maintain one huge combined file, allowing "random access" for the Searcher?

2) Run an IndexReader on each machine.

They tell me each reader can report its relevant term-frequencies, and based on that I can
fetch relevant results from each machine.
Apparently the ranking won't be perfect (for the overall result), but bearable.
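As a rough illustration of why the term frequencies matter for ranking, the sketch below sums per-machine document frequencies into one chain-wide IDF, so that every machine could score with the same term weight. It is plain Python, not Lucene code; the `shard_stats` layout and the numbers are assumptions made up for the example, and the formula is the classic TF-IDF shape (1 + ln(N / (df + 1))), which may differ from what a given Lucene setup actually uses.

```python
import math
from typing import Dict, List


def idf(num_docs: int, doc_freq: int) -> float:
    """Classic TF-IDF style inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def global_idf(shard_stats: List[Dict], term: str) -> float:
    """Combine per-shard statistics into one chain-wide IDF for a term."""
    total_docs = sum(s["num_docs"] for s in shard_stats)
    total_df = sum(s["doc_freq"].get(term, 0) for s in shard_stats)
    return idf(total_docs, total_df)


# Hypothetical statistics reported by two country machines.
shard_stats = [
    {"num_docs": 1000, "doc_freq": {"tolstoy": 9}},    # term is rare here
    {"num_docs": 1000, "doc_freq": {"tolstoy": 189}},  # term is common here
]

local_rare = idf(1000, 9)
local_common = idf(1000, 189)
shared = global_idf(shard_stats, "tolstoy")
```

With purely local statistics the same term gets a much higher weight on the machine where it is rare, so scores from different machines are not directly comparable; using the shared value removes that skew.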

Now, I'm not familiar with Lucene internals, and would really appreciate your views on it.
- Any good articles on Lucene "Gridding"?
- Any idea whether approach #1 makes any sense? (IMHO it's not very sensible if I need to merge
everything into a single huge file.)
- Any good implementations (of either approach)? So far I found Hibernate Search 4, and

Thanks very much.
