lucene-solr-user mailing list archives

From "Buttler, David" <buttl...@llnl.gov>
Subject cores shards and disks in SolrCloud
Date Thu, 15 Nov 2012 23:04:05 GMT
Hi,
I have a question about the optimal way to distribute Solr indexes across a cloud.  I have
a small number of collections (fewer than 10) and a small cluster (6 nodes), but each node
has several disks, 5 of which I am using for my Solr indexes.  The cluster is also a Hadoop
cluster, so the disks are not RAIDed; they are JBOD.  So, on my 5 slave nodes, each with 5
disks, I was thinking of putting one shard per collection on each disk.  This means I end up
with 25 shards per collection.  If I had 10 collections, that would make 250 shards total.
Given that Solr 4 supports multi-core, my first thought was to try one JVM per node: with 10
collections and 5 disks per node, each JVM would contain 50 shards.
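
For reference, the shard arithmetic above can be sketched in a few lines of Python. This is
only an illustration of the counts quoted in the message; the function name `shard_layout`
and its parameter names are made up for this sketch and are not part of Solr.

```python
def shard_layout(nodes, disks_per_node, collections):
    """Return (shards per collection, total shards, cores per JVM).

    Assumes one shard per collection on every index disk, and a single
    JVM per node hosting every core on that node.
    """
    shards_per_collection = nodes * disks_per_node
    total_shards = shards_per_collection * collections
    cores_per_jvm = disks_per_node * collections
    return shards_per_collection, total_shards, cores_per_jvm


# 5 slave nodes, 5 index disks each, 10 collections
print(shard_layout(nodes=5, disks_per_node=5, collections=10))  # (25, 250, 50)
```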

So, I set up my first collection, with a modest 20M documents, and everything seems to work
fine.  But the collections I have added since then are having issues.  The first issue is
that every time I query for the document count (*:* with rows=0), a different number of
documents is returned.  The number can differ by as much as 10%.  If I query each shard
individually (setting distrib=false), the number returned is always consistent.
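
The check described above (one distributed count versus the per-shard counts) could be
scripted roughly as follows. This is a hedged sketch, not a tested diagnostic: the core URLs
are hypothetical placeholders, and it assumes each listed core is one replica of a distinct
shard, so that the per-core counts add up to the collection total. The `fetch` argument is a
hook added for this sketch so the logic can be exercised without a running Solr instance.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(core_url, distrib):
    """Build a *:* rows=0 count query against one core."""
    params = {
        "q": "*:*",
        "rows": 0,
        "wt": "json",
        "distrib": "true" if distrib else "false",
    }
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def num_found(url, fetch=None):
    """Return response.numFound; fetch(url) -> str can replace the HTTP call."""
    if fetch is None:
        with urlopen(url) as resp:
            body = resp.read().decode("utf-8")
    else:
        body = fetch(url)
    return json.loads(body)["response"]["numFound"]


def compare_counts(core_urls, fetch=None):
    """Return (sum of per-core counts, distributed count)."""
    per_core = [num_found(count_url(u, False), fetch) for u in core_urls]
    distributed = num_found(count_url(core_urls[0], True), fetch)
    return sum(per_core), distributed
```

Calling `compare_counts` with a list such as `["http://node1:8983/solr/coll2_shard1", ...]`
(hypothetical names) several times in a row would show whether only the distributed count
fluctuates while the per-core sum stays fixed.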

I am not entirely sure this is related to the shard layout, as I may have missed a step when
setting up the subsequent collections (bootstrapping the config).

But, more related to the architecture question: is it better to have one JVM per disk, one
JVM per shard, or one JVM per node?  Given that the indexes are memory-mapped (MMap), how
does memory play into the question?  There is a blog post
(http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html)
that recommends minimizing the amount of JVM memory and maximizing the amount of OS-level
file cache, but how does that impact sorting / boosting?

Sorry if I have missed some documentation: I have been through the cloud tutorial a couple
of times, and I didn't see any discussion of these issues.

Thanks,
Dave
