lucene-solr-user mailing list archives

From Software Dev <static.void....@gmail.com>
Subject Re: SolrCloudServer questions
Date Sat, 01 Feb 2014 22:59:31 GMT
Our use case is that we have 3 indexing machines pulling off a Kafka queue,
and they are all sending individual updates.
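
If those individual updates were grouped before sending, as with the bulk add methods suggested in the quoted reply below, a small buffer in front of the client would be one way to do it. This is only a rough sketch under assumptions: `UpdateBatcher` and its `sink` callback are hypothetical names, not part of SolrJ, and the sink stands in for whatever bulk add call the indexer actually makes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not a SolrJ class): buffers single items and hands them off in batches. */
final class UpdateBatcher<T> {
    private final int batchSize;
    // Assumption: the sink wraps the real bulk call, e.g. a bulk add on the Solr client.
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    UpdateBatcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Buffer one item; flush automatically once batchSize items have accumulated. */
    synchronized void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Send whatever is buffered; call this on shutdown or from a timer so nothing lingers. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> batch = new ArrayList<>(buffer); // copy so the sink owns its own list
        buffer.clear();
        sink.accept(batch);
    }
}
```

Each consumer thread pulling off the queue could own one batcher, or several threads could share one since both methods are synchronized. The trade-off is that a failure is reported per batch rather than per document.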


On Fri, Jan 31, 2014 at 12:54 PM, Mark Miller <markrmiller@gmail.com> wrote:

> Just make sure parallel updates is set to true.
>
> If you want to load even faster, you can use the bulk add methods, or if
> you need more fine-grained responses, use the single add from multiple
> threads (though bulk add can also be done via multiple threads if you
> really want to try and push the max).
>
> - Mark
>
> http://about.me/markrmiller
>
> On Jan 31, 2014, at 3:50 PM, Software Dev <static.void.dev@gmail.com>
> wrote:
>
> > Which, if any, of these settings would be beneficial when bulk uploading?
> >
> >
> > On Fri, Jan 31, 2014 at 11:05 AM, Mark Miller <markrmiller@gmail.com>
> > wrote:
> >
> >>
> >>
> >> On Jan 31, 2014, at 1:56 PM, Greg Walters <greg.walters@answers.com>
> >> wrote:
> >>
> >>> I'm assuming you mean CloudSolrServer here. If I'm wrong please ignore
> >>> my response.
> >>>
> >>>> -updatesToLeaders
> >>>
> >>> Only send documents to shard leaders while indexing. This saves
> >>> cross-talk between slaves and leaders which results in more efficient
> >>> document routing.
> >>
> >> Right, but recently this has less of an effect because CloudSolrServer can
> >> now hash documents and send them directly to the right place. This option
> >> has become more historical. Just make sure you set the correct id field on
> >> the CloudSolrServer instance for this hashing to work (I think it defaults
> >> to "id").
> >>
> >>>
> >>>> shutdownLBHttpSolrServer
> >>>
> >>> CloudSolrServer uses a LBHttpSolrServer behind the scenes to distribute
> >>> requests (that aren't updates directly to leaders). Where did you find
> >>> this? I don't see this in the javadoc anywhere but it is a boolean in the
> >>> CloudSolrServer class. It looks like when you create a new CloudSolrServer
> >>> and pass it your own LBHttpSolrServer the boolean gets set to false and the
> >>> CloudSolrServer won't shut down the LBHttpSolrServer when it gets shut down.
> >>>
> >>>> parallelUpdates
> >>>
> >>> The javadocs don't have any description for this one, but I checked out
> >>> the code for CloudSolrServer and, if parallelUpdates is set, it looks like
> >>> it sends update requests to multiple shards at the same time.
> >>
> >> Right, we should def add some javadoc, but this sends updates to shards in
> >> parallel rather than with a single thread. It can really increase update
> >> speed. Still not as powerful as using CloudSolrServer from multiple
> >> threads, but a nice improvement nonetheless.
> >>
> >>
> >> - Mark
> >>
> >> http://about.me/markrmiller
> >>
> >>>
> >>> I'm no dev but I can read so please excuse any errors on my part.
> >>>
> >>> Thanks,
> >>> Greg
> >>>
> >>> On Jan 31, 2014, at 11:40 AM, Software Dev <static.void.dev@gmail.com>
> >>> wrote:
> >>>
> >>>> Can someone clarify what the following options are:
> >>>>
> >>>> - updatesToLeaders
> >>>> - shutdownLBHttpSolrServer
> >>>> - parallelUpdates
> >>>>
> >>>> Also, I remember in older versions of Solr there was an efficient format
> >>>> that was used between SolrJ and Solr that is more compact. Does this still
> >>>> exist in the latest version of Solr? If so, is it the default?
> >>>>
> >>>> Thanks
> >>>
> >>
> >>
>
>
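
For anyone reading the quoted thread above, the combination of client-side hashing and parallel updates can be pictured roughly as follows. This is an illustrative sketch only, not CloudSolrServer's actual code: `ParallelShardSketch`, `shardFor`, `groupByShard` and `sendInParallel` are hypothetical names, `String.hashCode()` is a stand-in for Solr's real document router, and the `sender` callback stands in for the request to a shard leader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

/** Hypothetical illustration of "hash by id, then send to each shard in parallel". */
final class ParallelShardSketch {

    /** Assumption: a simple modulo hash, NOT the routing Solr actually uses. */
    static int shardFor(String id, int numShards) {
        return Math.floorMod(id.hashCode(), numShards);
    }

    /** Group document ids by the shard they hash to. */
    static Map<Integer, List<String>> groupByShard(List<String> ids, int numShards) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (String id : ids) {
            groups.computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>()).add(id);
        }
        return groups;
    }

    /** Send each shard's group on its own thread and wait for all of them to finish. */
    static void sendInParallel(Map<Integer, List<String>> groups,
                               BiConsumer<Integer, List<String>> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, groups.size()));
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (Map.Entry<Integer, List<String>> e : groups.entrySet()) {
                pending.add(pool.submit(() -> sender.accept(e.getKey(), e.getValue())));
            }
            for (Future<?> f : pending) {
                f.get(); // block until done; rethrows if a shard's send failed
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(ex);
        } catch (ExecutionException ex) {
            throw new RuntimeException(ex.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With a single-threaded sender the per-shard requests would run one after another; the thread pool here is what the "in parallel rather than with a single thread" remark refers to.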
