lucene-solr-user mailing list archives

From Amrit Sarkar <sarkaramr...@gmail.com>
Subject Re: Solr Document Routing
Date Thu, 01 Jun 2017 08:41:47 GMT
Sathyam,

It seems your interpretation is wrong: CloudSolrClient itself calculates
which shard an incoming document belongs to (it hashes the document id and
determines which shard's hash range contains it). Since you have 10 shards,
each document maps to exactly one of them; that is what gets calculated,
and the document is then pushed directly to the leader of that shard.
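As a rough illustration (not Solr's exact code), here is a self-contained sketch of how the default compositeId routing picks a shard: hash the id with MurmurHash3 (x86, 32-bit, seed 0) and find which shard's hash range contains the result. In a real cluster the per-shard ranges are read from state.json rather than recomputed, and the `shardFor` helper below is a hypothetical name; the equal-split assumption matches a freshly created 10-shard collection.

```java
import java.nio.charset.StandardCharsets;

public class ShardRouting {

    // MurmurHash3 x86 32-bit, the hash Solr applies to document ids
    // (equivalent to org.apache.solr.common.util.Hash.murmurhash3_x86_32).
    static int murmurhash3_x86_32(byte[] data, int offset, int len, int seed) {
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = offset + (len & 0xfffffffc); // whole 4-byte blocks

        for (int i = offset; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                   | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: remaining 1-3 bytes.
        int k1 = 0;
        switch (len & 0x03) {
            case 3: k1 = (data[roundedEnd + 2] & 0xff) << 16; // fall through
            case 2: k1 |= (data[roundedEnd + 1] & 0xff) << 8; // fall through
            case 1: k1 |= data[roundedEnd] & 0xff;
                    k1 *= c1;
                    k1 = Integer.rotateLeft(k1, 15);
                    k1 *= c2;
                    h1 ^= k1;
        }

        // Finalization mix.
        h1 ^= len;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Maps a document id to a shard index, assuming the 32-bit hash space
    // is split into numShards equal ranges (the default for a new collection).
    static int shardFor(String docId, int numShards) {
        byte[] bytes = docId.getBytes(StandardCharsets.UTF_8);
        int hash = murmurhash3_x86_32(bytes, 0, bytes.length, 0);
        long rangeSize = (1L << 32) / numShards;
        int idx = (int) (((long) hash - Integer.MIN_VALUE) / rangeSize);
        return Math.min(idx, numShards - 1); // guard rounding at the top end
    }

    public static void main(String[] args) {
        // One of the ids from your log, routed across 10 shards:
        System.out.println("BQECDwZGTCEBHZZBBiIP -> shard index "
                + shardFor("BQECDwZGTCEBHZZBBiIP", 10));
    }
}
```

Because the hash depends only on the id, every client computes the same shard for the same document, which is what lets CloudSolrClient send updates straight to the right leader.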

This article explains SolrCloud document routing in much more detail:
https://lucidworks.com/2013/06/13/solr-cloud-document-routing/

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Thu, Jun 1, 2017 at 11:52 AM, Sathyam <sathyam.doraswamy@gmail.com>
wrote:

> Hi,
>
> I am indexing documents to a 10-shard collection (testcollection, with no
> replicas) in a Solr 6 cluster using CloudSolrClient. Looking at the Solr
> logs, I saw that there is a lot of peer-to-peer document distribution
> going on.
>
> An example log statement is as follows:
> 2017-06-01 06:07:28.378 INFO  (qtp1358444045-3673692) [c:testcollection
> s:shard8 r:core_node7 x:testcollection_shard8_replica1]
> o.a.s.u.p.LogUpdateProcessorFactory [testcollection_shard8_replica1]
>  webapp=/solr path=/update params={update.distrib=TOLEADER&distrib.from=
> http://10.199.42.29:8983/solr/testcollection_shard7_
> replica1/&wt=javabin&version=2}{add=[BQECDwZGTCEBHZZBBiIP
> (1568981383488995328), BQEBBQZB2il3wGT/0/mB (1568981383490043904),
> BQEBBQZFnhOJRj+m9RJC (1568981383491092480), BQEGBgZIeBE1klHS4fxk
> (1568981383492141056), BQEBBQZFVTmRx2VuCgfV (1568981383493189632)]} 0 25
>
> When I went through the code of CloudSolrClient on grepcode, I saw that the
> client itself determines which server it needs to hit by hashing the
> document id and getting the shard range information from state.json.
> It is therefore quite confusing to me why data is being distributed
> between peers, as there is no replication and each shard is a leader.
>
> I would like to know why this is happening and how to avoid it, or whether
> the above log statement means something else and I am misinterpreting it.
>
> --
> Sathyam Doraswamy
>
