lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shamik Bandopadhyay <sham...@gmail.com>
Subject Uneven data distribution with composite router
Date Wed, 25 Mar 2015 18:51:17 GMT
Hi,

   I'm using a three level composite router in a solr cloud environment,
primarily for multi-tenant and field collapsing. The format is as follows.

*language!topic!url*.

An example would be :

ENU!12345!www.testurl.com/enu/doc1
GER!12345!www.testurl.com/ger/doc2
CHS!67890!www.testurl.com/chs/doc3

The Solr Cloud cluster contains 2 shard, each having 3 replicas. After
indexing around 10 million documents, I'm observing that the index size in
shard 1 is around 60gb while shard 2 is 15gb. So the bulk of the data is
getting indexed in shard 1. Since 60% of the document is english, I expect
the index size to be higher on one shard, but the difference seem little
too high.

The idea is to make sure that all ENU!12345 documents are routed to one
shard so that distributed field collapsing works. Is there something I can
do differently here to make a better distribution ?

Any pointers will be appreciated.

Regards,
Shamik

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message