lucene-solr-user mailing list archives

From Furkan KAMACI <furkankam...@gmail.com>
Subject Re: SolrCloud High Availability during indexing operation
Date Wed, 09 Oct 2013 08:15:45 GMT
Hi Saurabh,
Your link does not work.


2013/10/9 Saurabh Saxena <ssaxena@gopivotal.com>

> Pastebin link http://pastebin.com/cnkXhz7A
>
> I am doing a bulk request. I am uploading 100 files, each file having 100
> docs.
>
> -Saurabh
>
>
> On Tue, Oct 8, 2013 at 7:39 PM, Mark Miller <markrmiller@gmail.com> wrote:
>
> > The attachment did not go through - try using pastebin.com or something.
> >
> > Are you adding docs with curl one at a time or in bulk per request?
> >
> > - Mark
> >
> > On Oct 8, 2013, at 9:58 PM, Saurabh Saxena <ssaxena@gopivotal.com> wrote:
> >
> > > Repeated the experiment on a local system: a single-shard SolrCloud with
> > > one replica. Tried to index 10K docs. All the indexing operations were
> > > directed to the replica Solr node. While the documents were being indexed
> > > on the replica, I shut down the leader Solr node. Out of 10K docs, only
> > > 9900 docs got indexed. If I repeat the experiment without shutting down
> > > the leader instance, all 10K docs get indexed. I am using curl to upload
> > > the docs; there was no curl error while uploading documents.
> > >
> > > The following error was in the replica log file.
> > >
> > > ERROR - 2013-10-08 16:10:32.662; org.apache.solr.common.SolrException;
> > > org.apache.solr.common.SolrException: No registered leader was found,
> > > collection:test_collection slice:shard1
> > >
> > > Attached replica log file.
> > >
> > >
> > > On Thu, Sep 26, 2013 at 7:15 PM, Saurabh Saxena <ssaxena@gopivotal.com> wrote:
> > > Sorry for the late reply.
> > >
> > > All the documents have unique ids. If I repeat the experiment, the number
> > > of docs indexed changes (I guess it depends on when I shut down a
> > > particular shard). When I run the experiment without shutting down the
> > > leader shards, all 80k docs get indexed (which I think proves that all
> > > documents are valid).
> > >
> > > I need to dig through the logs to find the error message. Also, I am not
> > > tracking the curl return code; I will run again and reply.
> > >
> > > Regards,
> > > Saurabh
> > >
> > >
> > > On Wed, Sep 25, 2013 at 3:01 AM, Erick Erickson <erickerickson@gmail.com> wrote:
> > > And do any of the documents have the same <uniqueKey>, which
> > > is usually called "id"? Subsequent adds of docs with the same
> > > <uniqueKey> replace the earlier one.
> > >
> > > It's not definitive, because the value changes as merges happen (old
> > > copies of docs that have been deleted or updated get purged), but what
> > > does your admin page show for "maxDoc"? If it's more than "numDocs"
> > > then you have duplicate <uniqueKey>s. NOTE: if you optimize
> > > (which you usually shouldn't) then maxDoc and numDocs will be
> > > the same, so if you test this, don't optimize.
> > >
> > > Best,
> > > Erick
> > >
> > >
> > > On Tue, Sep 24, 2013 at 10:43 AM, Walter Underwood
> > > <wunder@wunderwood.org> wrote:
> > > > Did all of the curl update commands return success? Any errors in the
> > > > logs?
> > > >
> > > > wunder
> > > >
> > > > On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote:
> > > >
> > > >> Is it possible that some of those 80K docs were simply not valid? e.g.
> > > >> had a wrong field, had a missing required field, anything like that?
> > > >> What happens if you clear this collection and just re-run the same
> > > >> indexing process and do everything else the same?  Still some docs
> > > >> missing?  Same number?
> > > >>
> > > >> And what if you take 1 document that you know is valid and index it
> > > >> 80K times, with a different ID, of course?  Do you see 80K docs in the
> > > >> end?
> > > >>
> > > >> Otis
> > > >> --
> > > >> Solr & ElasticSearch Support -- http://sematext.com/
> > > >> Performance Monitoring -- http://sematext.com/spm
> > > >>
> > > >>
> > > >>
> > > >> On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena <
> > > >> ssaxena@gopivotal.com> wrote:
> > > >>> Doc count did not change after I restarted the nodes. I am doing a
> > > >>> single commit after all 80k docs. Using Solr 4.4.
> > > >>>
> > > >>> Regards,
> > > >>> Saurabh
> > > >>>
> > > >>>
> > > >>> On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic <
> > > >>> otis.gospodnetic@gmail.com> wrote:
> > > >>>
> > > >>>> Interesting. Did the doc count change after you started the nodes
> > > >>>> again?
> > > >>>> Can you tell us about commits?
> > > >>>> Which version? 4.5 will be out soon.
> > > >>>>
> > > >>>> Otis
> > > >>>> Solr & ElasticSearch Support
> > > >>>> http://sematext.com/
> > > >>>> On Sep 23, 2013 8:37 PM, "Saurabh Saxena" <ssaxena@gopivotal.com> wrote:
> > > >>>>
> > > >>>>> Hello,
> > > >>>>>
> > > >>>>> I am testing the High Availability feature of SolrCloud. I am using
> > > >>>>> the following setup:
> > > >>>>>
> > > >>>>> - 8 linux hosts
> > > >>>>> - 8 Shards
> > > >>>>> - 1 leader, 1 replica / host
> > > >>>>> - Using Curl for update operation
> > > >>>>>
> > > >>>>> I tried to index 80K documents through the replicas (10K per replica,
> > > >>>>> in parallel). During the indexing process, I stopped 4 leader nodes.
> > > >>>>> Once indexing was done, only 79808 of the 80K docs were indexed.
> > > >>>>>
> > > >>>>> Is this expected behaviour? In my opinion, a replica should take
> > > >>>>> care of indexing if the leader is down.
> > > >>>>>
> > > >>>>> If this is expected behaviour, are there any steps that can be taken
> > > >>>>> from the client side to avoid such a situation?
> > > >>>>>
> > > >>>>> Regards,
> > > >>>>> Saurabh Saxena
> > > >>>>>
> > > >>>>
> > > >
> > > > --
> > > > Walter Underwood
> > > > wunder@wunderwood.org
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
>
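
The client-side question in the thread (did every update request succeed, and what can a client do while a leader is down?) can be sketched roughly as follows. This is a minimal illustration, not anything posted by the participants: `index_batches` and the `send` callable are hypothetical names, and `send` is assumed to post one batch of docs to the collection's update handler and return the HTTP status code. Resending a batch is assumed to be safe only because, as noted in the thread, adding a doc with an existing <uniqueKey> replaces the earlier copy.

```python
import time
from typing import Callable, Iterable, List, Sequence


def index_batches(
    batches: Iterable[Sequence[dict]],
    send: Callable[[Sequence[dict]], int],  # hypothetical: posts one batch, returns HTTP status
    max_attempts: int = 5,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Send each batch, retrying on failure.

    Returns the positions of batches that never got an HTTP 200,
    so the caller knows exactly which uploads to investigate or resend.
    """
    failed: List[int] = []
    for position, batch in enumerate(batches):
        for attempt in range(1, max_attempts + 1):
            try:
                status = send(batch)
            except OSError:
                # Connection refused/reset, e.g. the node went away mid-request.
                status = None
            if status == 200:
                break  # this batch was accepted; move on to the next one
            if attempt < max_attempts:
                # Linear backoff gives the cluster time to elect a new leader.
                sleep(backoff_seconds * attempt)
        else:
            # Loop finished without a break: every attempt failed.
            failed.append(position)
    return failed
```

With curl-driven uploads the equivalent check is to record the HTTP status of every request instead of relying on the absence of a curl error, and to resend any file whose request did not return success.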
