lucene-solr-user mailing list archives

From Tim Patton
Subject Re[2]: Federated Search
Date Mon, 05 Mar 2007 20:07:09 GMT

Jack L wrote:
> This is a very interesting discussion. I have a few questions while
> reading Tim's and Venkatesh's emails:
> To Tim:
> 1. Is there any reason you don't want to use HTTP? Since Solr has
>    an HTTP interface already, I suppose using HTTP is the simplest
>    way to communicate with the Solr servers from the merger/search
>    broker. Hadoop and Ice would both require some additional work,
>    assuming you are using Solr and not Lucene directly.
> 2. "Do you broadcast to the slaves as to who owns a document?"
>    Do the searchers need to know who has what document?
> To Venkatesh:
> 1. I suppose Solr is OK handling 20 million documents. I hope I'm
>    right, because that's what I'm planning on doing :) Is storage
>    capacity the reason you chose to use multiple Solr servers?
> An open question: what's the best way to manage server addition?
> - If hash-value-based partitioning is used, re-indexing all
>   the documents will be needed.
> - Otherwise, a database seems to be required to track the documents.
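
To illustrate the first point of the open question, here is a minimal
sketch (not anything from an actual deployment) of why plain hash-modulo
partitioning forces re-indexing when a server is added. The function
names, the "doc-N" ids, and the choice of MD5 are all assumptions made
for the example.

```python
import hashlib


def shard_for(doc_id: str, num_servers: int) -> int:
    """Map a document id to a server index with hash-modulo partitioning."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers


def moved_fraction(doc_ids, old_n: int, new_n: int) -> float:
    """Fraction of documents whose owning server changes from old_n to new_n servers."""
    moved = sum(1 for d in doc_ids if shard_for(d, old_n) != shard_for(d, new_n))
    return moved / len(doc_ids)


# Hypothetical document ids, used only to exercise the functions.
doc_ids = [f"doc-{i}" for i in range(10000)]
fraction = moved_fraction(doc_ids, 4, 5)
print(f"{fraction:.0%} of documents change servers when going from 4 to 5")
```

With a uniform hash, a document keeps its server only when its hash
gives the same remainder modulo 4 and modulo 5, which happens for about
one document in five. The rest would have to be moved or re-indexed.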


My big stumbling blocks were with indexing more so than searching. I
did put together an RMI-based system to search multiple Lucene servers,
and the searchers don't need to know where everything is. With
indexing, however, at some point something needs to know where to send
documents for updating, or which server to tell to delete a document,
whether that something is the server that does the processing or some
sort of broker. The processing machines could do the DB lookup and talk
to Solr over HTTP with no problem, and this is part of what I am
considering doing. However, I have some extra code on the indexing
machines to handle DB updates and so on, though I might find a way to
move this elsewhere in the system so I can have pretty much a pure Solr
server with just a few custom items (like my own Similarity or
QueryParser).
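
As a rough sketch of the broker idea described above, something along
these lines could track which server owns each document and route
updates and deletes accordingly. Everything here is hypothetical: the
IndexBroker class, the server names, and the in-memory dict standing in
for the DB lookup. The "sent" list only records what would be sent; no
real Solr or HTTP call is made.

```python
class IndexBroker:
    """Routes index updates and deletes to the server that owns each document."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.owner = {}  # doc_id -> server name; stands in for the DB lookup
        self.sent = []   # (server, action, doc_id) records instead of real requests

    def _pick_server(self):
        # Assumption: new documents go to the server that owns the fewest documents.
        counts = {s: 0 for s in self.servers}
        for s in self.owner.values():
            counts[s] += 1
        return min(self.servers, key=lambda s: counts[s])

    def add_or_update(self, doc_id):
        # A known document goes back to its owner; a new one gets assigned.
        server = self.owner.get(doc_id)
        if server is None:
            server = self._pick_server()
            self.owner[doc_id] = server
        self.sent.append((server, "update", doc_id))
        return server

    def delete(self, doc_id):
        # Only the owning server needs to be told about the delete.
        server = self.owner.pop(doc_id, None)
        if server is not None:
            self.sent.append((server, "delete", doc_id))
        return server

    def add_server(self, server):
        # Existing documents stay where they are; no re-indexing is needed.
        self.servers.append(server)


broker = IndexBroker(["solr-a", "solr-b"])
first = broker.add_or_update("1")
second = broker.add_or_update("2")
broker.add_server("solr-c")
third = broker.add_or_update("3")
again = broker.add_or_update("1")
```

Because ownership is looked up rather than computed from a hash, adding
"solr-c" does not move documents "1" or "2". The cost is that the
lookup table itself has to be stored and kept consistent somewhere.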

I suppose the DB could be moved from SQL to Lucene in the future as well.
