lucene-dev mailing list archives

From "Upayavira" ...@odoko.co.uk>
Subject Re: Distributed Indexing
Date Sat, 29 Jan 2011 23:56:58 GMT
Lance,

Firstly, we're proposing a ShardDistributionPolicy interface with a
default implementation (the doc ID modulo the number of shards), though
other implementations are possible. Another easy one would be a
randomised or round-robin policy.
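
A minimal Java sketch of what such a policy could look like. The
interface shape, the shardFor(docId, numShards) method and all class
names are hypothetical, not existing Solr API. The default hashes a
string ID with String.hashCode() before taking it modulo the shard
count, which is an assumption about how "mod of the doc ID" would apply
to non-numeric IDs.

```java
import java.util.concurrent.atomic.AtomicLong;

// Demo entry point, declared first so single-file `java` launching finds main.
class ShardPolicyDemo {
    public static void main(String[] args) {
        ShardDistributionPolicy mod = new ModShardDistributionPolicy();
        ShardDistributionPolicy roundRobin = new RoundRobinShardDistributionPolicy();
        for (String id : new String[] {"doc-1", "doc-2", "doc-3", "doc-4"}) {
            System.out.println(id + " -> mod shard " + mod.shardFor(id, 3)
                    + ", round-robin shard " + roundRobin.shardFor(id, 3));
        }
    }
}

/** Hypothetical shape of the proposed policy: picks a shard index in [0, numShards). */
interface ShardDistributionPolicy {
    int shardFor(String docId, int numShards);
}

/** Default policy: the doc ID (via String.hashCode) modulo the number of shards. */
class ModShardDistributionPolicy implements ShardDistributionPolicy {
    @Override
    public int shardFor(String docId, int numShards) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(docId.hashCode(), numShards);
    }
}

/** Alternative policy: ignores the ID and cycles through the shards in turn. */
class RoundRobinShardDistributionPolicy implements ShardDistributionPolicy {
    // AtomicLong so concurrent callers still get distinct, evenly spread indexes.
    private final AtomicLong counter = new AtomicLong();

    @Override
    public int shardFor(String docId, int numShards) {
        return (int) Math.floorMod(counter.getAndIncrement(), (long) numShards);
    }
}
```

Because the round-robin policy ignores the ID, a later update to the
same document can land on a different shard than the original, so the
mod policy is the safer default whenever documents are re-indexed or
deleted by ID.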

As to threading, the first task would be to put all of the source
documents into "buckets", one bucket per shard, using the above
ShardDistributionPolicy to assign documents to buckets/shards. Then all
of the documents in a bucket could be sent to the relevant shard for
indexing, which would be nothing more than a normal HTTP POST (or
perhaps a SolrJ call).
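
The bucketing step might look roughly like this, assuming documents
arrive as simple field maps with an "id" key and the policy is passed
in as a function from ID to shard index. Both are assumptions for
illustration; a real implementation would more likely work with SolrJ's
SolrInputDocument.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToIntFunction;

class BucketingDemo {
    public static void main(String[] args) {
        int numShards = 2;
        // Default policy from the proposal: doc ID (via hashCode) modulo the shard count.
        ToIntFunction<String> modPolicy = id -> Math.floorMod(id.hashCode(), numShards);
        List<Map<String, String>> docs = List.of(
                Map.of("id", "a", "title", "first"),
                Map.of("id", "b", "title", "second"),
                Map.of("id", "c", "title", "third"));
        List<List<Map<String, String>>> buckets = bucketize(docs, numShards, modPolicy);
        for (int shard = 0; shard < buckets.size(); shard++) {
            // Each non-empty bucket would become one POST (or SolrJ call) to its shard;
            // this sketch only prints the IDs it would send.
            List<String> ids = new ArrayList<>();
            for (Map<String, String> doc : buckets.get(shard)) {
                ids.add(doc.get("id"));
            }
            System.out.println("shard " + shard + " <- " + ids);
        }
    }

    /** Puts every document into exactly one bucket, indexed by the shard the policy picks. */
    static List<List<Map<String, String>>> bucketize(
            List<Map<String, String>> docs, int numShards, ToIntFunction<String> policy) {
        List<List<Map<String, String>>> buckets = new ArrayList<>();
        for (int i = 0; i < numShards; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Map<String, String> doc : docs) {
            buckets.get(policy.applyAsInt(doc.get("id"))).add(doc);
        }
        return buckets;
    }
}
```

With two shards this prints "shard 0 <- [b]" and "shard 1 <- [a, c]",
since the one-character IDs "a", "b" and "c" hash to 97, 98 and 99.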

As to whether this would be single-threaded or multithreaded, I would
guess we would aim to do it the same way as the distributed search code
(which I have not yet reviewed). However, it could presumably be
single-threaded but use asynchronous HTTP.
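
One way to keep the sending loop single-threaded while still talking to
all shards at once is sketched below. It assumes Java 11+ and uses the
JDK's java.net.http.HttpClient, whose sendAsync returns a
CompletableFuture immediately: the calling thread issues every request
and then waits once, while the client's own executor handles the I/O.
The shard URLs and XML update bodies are placeholders, and the method
names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class AsyncShardPoster {
    public static void main(String[] args) {
        // Placeholder shard URLs and bodies; nothing is sent in this demo.
        List<HttpRequest> requests = List.of(
                buildRequest("http://shard1:8983/solr/update",
                        "<add><doc><field name=\"id\">b</field></doc></add>"),
                buildRequest("http://shard2:8983/solr/update",
                        "<add><doc><field name=\"id\">a</field></doc></add>"));
        for (HttpRequest r : requests) {
            System.out.println(r.method() + " " + r.uri());
        }
        // Against live shards: List<Integer> statuses = postAll(HttpClient.newHttpClient(), requests);
    }

    /** Builds one POST carrying a shard's bucket of documents. */
    static HttpRequest buildRequest(String shardUpdateUrl, String xmlBody) {
        return HttpRequest.newBuilder(URI.create(shardUpdateUrl))
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(xmlBody))
                .build();
    }

    /**
     * Starts every request from the calling thread without waiting on any single
     * response, then blocks once until all have completed. Returns status codes in
     * request order; a failed request surfaces as a CompletionException.
     */
    static List<Integer> postAll(HttpClient client, List<HttpRequest> requests) {
        List<CompletableFuture<HttpResponse<String>>> pending = new ArrayList<>();
        for (HttpRequest request : requests) {
            pending.add(client.sendAsync(request, HttpResponse.BodyHandlers.ofString()));
        }
        CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
        List<Integer> statuses = new ArrayList<>();
        for (CompletableFuture<HttpResponse<String>> response : pending) {
            statuses.add(response.join().statusCode());
        }
        return statuses;
    }
}
```

The trade-off versus a thread per shard is that one slow shard no
longer ties up a worker thread, at the cost of error handling moving
into the futures.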

Regards, Upayavira

On Sat, 29 Jan 2011 15:09 -0800, "Lance Norskog" <goksron@gmail.com>
wrote:
> I would suggest that a DistributedRequestUpdateHandler run
> single-threaded, doing only one document at a time. If I want more
> than one, I run it twice or N times with my own program.
> 
> Also, this should have a policy object which decides exactly how
> documents are distributed. There are different techniques for
> different use cases.
> 
> Lance
> 
> On Sat, Jan 29, 2011 at 12:34 PM, Soheb Mahmood <soheb.lucene@gmail.com>
> wrote:
> > Hello Yonik,
> >
> > On Thu, 2011-01-27 at 08:01 -0500, Yonik Seeley wrote:
> >> Making it easy for clients I think is key... one should be able to
> >> update any node in the solr cluster and have solr take care of the
> >> hard part about updating all relevant shards.  This will most likely
> >> involve an update processor.  This approach allows all existing update
> >> methods (including things like CSV file upload) to still work
> >> correctly.
> >>
> >> Also post.jar is really just for testing... a command-line replacement
> >> for "curl" for those who may not have it.  It's not really a
> >> recommended way for updating Solr servers in production.
> >
> > OK, I've abandoned the post.jar tool idea in favour of a
> > DistributedUpdateRequestProcessor class. I've been looking into other
> > classes, such as UpdateRequestProcessor, RunUpdateRequestProcessor,
> > SignatureUpdateProcessorFactory and SolrQueryRequest, to see how they
> > are used and what data they store, which is why I've taken some time
> > to respond.
> >
> > My big question now is whether it is necessary to have a Factory
> > class for DistributedUpdateRequestProcessor. I've seen this pattern
> > lots of times, from RunUpdateProcessorFactory (where the factory class
> > is only a few lines of code) to SignatureUpdateProcessorFactory. At
> > first I thought it would be a good design idea to include one (in a
> > generic sense), but on reflection I thought that the
> > DistributedUpdateRequestHandler would only be running once, taking in
> > all the requests, so it seems somewhat pointless to write one.
> >
> > That is my "burning" question for now. I have a few more questions,
> > but I'm sure that when I look further into the code, I'll either have
> > more or find all of them answered.
> >
> > Many thanks!
> >
> > Soheb Mahmood
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: dev-help@lucene.apache.org
> >
> >
> 
> 
> 
> -- 
> Lance Norskog
> goksron@gmail.com
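
On the Factory question quoted above: in Solr, an
UpdateRequestProcessorFactory is created once when the chain is loaded
from solrconfig.xml, and its getInstance(SolrQueryRequest,
SolrQueryResponse, UpdateRequestProcessor next) method is called for
every request, so each request gets a fresh processor. That gives a
distributing processor a safe place to keep per-request state such as
per-shard buckets, even if only one chain is configured. The sketch
below mimics that shape with stand-in types (Processor, ProcessorFactory
and the Distributing* classes are hypothetical, not Solr classes) so it
runs without Solr on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

class FactoryDemo {
    public static void main(String[] args) {
        List<String> sent = new ArrayList<>();
        // Created once, as Solr does when it reads the chain from solrconfig.xml.
        ProcessorFactory factory = new DistributingProcessorFactory(2, sent);

        Processor first = factory.getInstance(null);  // request 1
        first.processAdd("a");
        first.processAdd("b");
        first.finish();

        Processor second = factory.getInstance(null); // request 2
        second.processAdd("c");
        second.finish();

        sent.forEach(System.out::println);
    }
}

/** Stand-in for UpdateRequestProcessor: one instance per request, may hold per-request state. */
abstract class Processor {
    protected final Processor next;

    Processor(Processor next) {
        this.next = next;
    }

    void processAdd(String docId) {
        if (next != null) next.processAdd(docId);
    }

    void finish() {
        if (next != null) next.finish();
    }
}

/** Stand-in for UpdateRequestProcessorFactory: long-lived, hands out a fresh processor per request. */
interface ProcessorFactory {
    Processor getInstance(Processor next);
}

/** Hypothetical factory: holds configuration only (shard count, where sent batches are recorded). */
class DistributingProcessorFactory implements ProcessorFactory {
    private final int numShards;
    private final List<String> sent;

    DistributingProcessorFactory(int numShards, List<String> sent) {
        this.numShards = numShards;
        this.sent = sent;
    }

    @Override
    public Processor getInstance(Processor next) {
        return new DistributingProcessor(numShards, sent, next);
    }
}

/** Hypothetical processor: buckets this request's documents by shard, "sends" them in finish(). */
class DistributingProcessor extends Processor {
    private final List<List<String>> buckets = new ArrayList<>();
    private final List<String> sent;

    DistributingProcessor(int numShards, List<String> sent, Processor next) {
        super(next);
        this.sent = sent;
        for (int i = 0; i < numShards; i++) buckets.add(new ArrayList<>());
    }

    @Override
    void processAdd(String docId) {
        // Forwarded to a shard instead of indexed locally, so next.processAdd is not called.
        buckets.get(Math.floorMod(docId.hashCode(), buckets.size())).add(docId);
    }

    @Override
    void finish() {
        for (int shard = 0; shard < buckets.size(); shard++) {
            if (!buckets.get(shard).isEmpty()) sent.add("shard " + shard + " <- " + buckets.get(shard));
        }
        super.finish();
    }
}
```

This prints "shard 0 <- [b]", "shard 1 <- [a]" and "shard 1 <- [c]":
the factory is shared, but request 2 never sees the buckets from
request 1.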
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source

