lucene-solr-user mailing list archives

From Saurabh Saxena <ssax...@gopivotal.com>
Subject Re: SolrCloud High Availability during indexing operation
Date Wed, 09 Oct 2013 01:58:47 GMT
Repeated the experiments on a local system: single-shard SolrCloud with one
replica. Tried to index 10K docs. All the indexing operations were
directed to the replica Solr node. While the documents were being indexed
on the replica, I shut down the leader Solr node. Out of 10K docs, only 9900
docs got indexed. If I repeat the experiment without shutting down the
leader instance, all 10K docs get indexed. I am using curl to upload the
docs; there was no curl error while uploading documents.

The following error was in the replica log file.

ERROR - 2013-10-08 16:10:32.662; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: No registered leader was found,
collection:test_collection slice:shard1

Attached replica log file.
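
On the client-side question raised earlier in this thread: curl exits with status 0 on an HTTP error response unless it is run with -f/--fail, so the absence of a curl error does not by itself show that every update was accepted. Below is a minimal sketch of one possible client-side approach: check the HTTP status of each update request and retry after a pause. It assumes (not confirmed in this thread) that the node answers with a non-200 status while no leader is registered. The function names, retry count, and delay are hypothetical.

```python
import time
import urllib.error
import urllib.request


def post_update(url, body, timeout=30):
    """POST one XML update request; return the HTTP status, or None if no response."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/xml"}, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 503).
        return err.code
    except OSError:
        # Connection refused, timeout, and similar network failures.
        return None


def index_with_retry(batches, send, max_attempts=5, delay=2.0, sleep=time.sleep):
    """Send each batch, retrying until `send` returns 200.

    Returns the positions of the batches that never succeeded, so the
    caller knows exactly what still has to be re-sent.
    """
    failed = []
    for position, batch in enumerate(batches):
        for _ in range(max_attempts):
            if send(batch) == 200:
                break
            sleep(delay)  # give the cluster time to elect a new leader
        else:
            failed.append(position)
    return failed
```

A call might look like `index_with_retry(batches, lambda b: post_update("http://localhost:8983/solr/test_collection/update", b))`, where the host and port are assumptions for a local setup. Returning the failed positions instead of raising keeps the run going and leaves a list to re-send once the cluster has settled.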


On Thu, Sep 26, 2013 at 7:15 PM, Saurabh Saxena <ssaxena@gopivotal.com> wrote:

> Sorry for the late reply.
>
> All the documents have unique ids. If I repeat the experiment, the number
> of docs indexed changes (I guess it depends on when I shut down a
> particular shard). When I do the experiment without shutting down leader
> shards, all 80k docs get indexed (which I think proves that all documents
> are valid).
>
> I need to dig through the logs to find the error message. Also, I am not
> tracking the curl return code; I will run again and reply.
>
> Regards,
> Saurabh
>
>
> On Wed, Sep 25, 2013 at 3:01 AM, Erick Erickson <erickerickson@gmail.com> wrote:
>
>> And do any of the documents have the same <uniqueKey>, which
>> is usually called "id"? Subsequent adds of docs with the same
>> <uniqueKey> replace the earlier one.
>>
>> It's not definitive because it changes as merges happen, old copies
>> of docs that have been deleted or updated will be purged, but what
>> does your admin page show for "maxDoc"? If it's more than "numDocs"
>> then you have duplicate <uniqueKey>s. NOTE: if you optimize
>> (which you usually shouldn't) then maxDoc and numDocs will be
>> the same so if you test this don't optimize.
>>
>> Best,
>> Erick
>>
>>
>> On Tue, Sep 24, 2013 at 10:43 AM, Walter Underwood
>> <wunder@wunderwood.org> wrote:
>> > Did all of the curl update commands return success? Any errors in the
>> > logs?
>> >
>> > wunder
>> >
>> > On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote:
>> >
>> >> Is it possible that some of those 80K docs were simply not valid? e.g.
>> >> had a wrong field, had a missing required field, anything like that?
>> >> What happens if you clear this collection and just re-run the same
>> >> indexing process and do everything else the same?  Still some docs
>> >> missing?  Same number?
>> >>
>> >> And what if you take 1 document that you know is valid and index it
>> >> 80K times, with a different ID, of course?  Do you see 80K docs in the
>> >> end?
>> >>
>> >> Otis
>> >> --
>> >> Solr & ElasticSearch Support -- http://sematext.com/
>> >> Performance Monitoring -- http://sematext.com/spm
>> >>
>> >>
>> >>
>> >> On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena <ssaxena@gopivotal.com>
>> >> wrote:
>> >>> Doc count did not change after I restarted the nodes. I am doing a
>> >>> single commit after all 80k docs. Using Solr 4.4.
>> >>>
>> >>> Regards,
>> >>> Saurabh
>> >>>
>> >>>
>> >>> On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic <
>> >>> otis.gospodnetic@gmail.com> wrote:
>> >>>
>> >>>> Interesting. Did the doc count change after you started the nodes
>> >>>> again?
>> >>>> Can you tell us about commits?
>> >>>> Which version? 4.5 will be out soon.
>> >>>>
>> >>>> Otis
>> >>>> Solr & ElasticSearch Support
>> >>>> http://sematext.com/
>> >>>> On Sep 23, 2013 8:37 PM, "Saurabh Saxena" <ssaxena@gopivotal.com>
>> >>>> wrote:
>> >>>>
>> >>>>> Hello,
>> >>>>>
>> >>>>> I am testing the High Availability feature of SolrCloud. I am using
>> >>>>> the following setup:
>> >>>>>
>> >>>>> - 8 Linux hosts
>> >>>>> - 8 shards
>> >>>>> - 1 leader, 1 replica / host
>> >>>>> - Using curl for update operations
>> >>>>>
>> >>>>> I tried to index 80K documents on replicas (10K/replica in
>> >>>>> parallel). During the indexing process, I stopped 4 leader nodes.
>> >>>>> Once indexing was done, only 79808 of the 80K docs were indexed.
>> >>>>>
>> >>>>> Is this expected behaviour? In my opinion the replica should take
>> >>>>> care of indexing if the leader is down.
>> >>>>>
>> >>>>> If this is expected behaviour, are there any steps that can be
>> >>>>> taken from the client side to avoid such a situation?
>> >>>>>
>> >>>>> Regards,
>> >>>>> Saurabh Saxena
>> >>>>>
>> >>>>
>> >
>> > --
>> > Walter Underwood
>> > wunder@wunderwood.org
>> >
>> >
>> >
>>
>
>
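
The maxDoc/numDocs comparison suggested in the quoted thread can also be scripted instead of read off the admin page. The sketch below assumes the JSON shape returned by Solr's Luke request handler (for example `/solr/test_collection/admin/luke?numTerms=0&wt=json`), with an `index` object holding `numDocs` and `maxDoc`; that path and shape are assumptions here and should be checked against the Solr version in use. The function name and the sample numbers are hypothetical and are not results from this thread.

```python
import json


def deleted_doc_count(luke_response_text):
    """Return maxDoc - numDocs from a Luke handler JSON response.

    A value above zero means the index still holds deleted or replaced
    documents that have not been merged away, which is what repeated
    <uniqueKey> values would produce.
    """
    index = json.loads(luke_response_text)["index"]
    return index["maxDoc"] - index["numDocs"]


# Made-up sample response, for illustration only.
sample = '{"index": {"numDocs": 9900, "maxDoc": 9900}}'
print(deleted_doc_count(sample))
```

As noted in the quoted message, the difference shrinks as merges purge old copies, and an optimize makes the two numbers equal, so a result of zero does not rule out duplicates.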
