lucene-solr-user mailing list archives

From Tomás Fernández Löbbe <tomasflo...@gmail.com>
Subject Re: SolrCloud: general questions
Date Fri, 02 Nov 2012 10:52:13 GMT
>
> My questions are:
> 1) Is it true, that I may send data to any of shards [9080, 9081, 9082,
> 9083] and don't care about how SolrCloud will distribute data between
> shards? What algorithm is used: round robin?
>
It is true, the document is forwarded to the correct shard automatically.
It's not round robin, it's a hash function applied to the unique key of the
document.
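
To illustrate the difference from round robin, here is a minimal sketch of hash-based routing. It is not Solr's actual code: `zlib.crc32` is only a stand-in for the real hash function, and the shard names and the `shard_for` helper are hypothetical. The point is that the target shard depends only on the document's unique key, not on which node received the request.

```python
import zlib

# Hypothetical shard names, one per node in the example cluster.
SHARDS = ["shard1", "shard2", "shard3", "shard4"]

def shard_for(doc_id: str, shards=SHARDS) -> str:
    """Pick a shard from the document's unique key alone."""
    # Stand-in hash (assumption): any deterministic 32-bit hash works here.
    h = zlib.crc32(doc_id.encode("utf-8"))
    # Split the 32-bit hash space into equal contiguous ranges, one per shard.
    range_size = 2**32 // len(shards)
    index = min(h // range_size, len(shards) - 1)
    return shards[index]

# The same id always maps to the same shard, whichever node receives it.
print(shard_for("1") == shard_for("1"))
```

With round robin, by contrast, the target would depend on arrival order, so a later update to the same id could land on a different shard.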


>
> 2) For example, in SolrCloud there is a document:
> <doc><field name="id">1</field><field name="name">this is Solr
> 3.5</field></doc>
> I have no information about shard in which this doc is. I need to update
> information at field "name". The new doc is:
> <doc><field name="id">1</field><field name="name">this is
> SolrCloud</field></doc>
> Is it true, that I may send this doc to any of shards [9080, 9081, 9082,
> 9083] and after commit, when I run the query, I'll have "this is SolrCloud"
> instead of "this is Solr 3.5" in results? As I see old data is still at
> index until optimize done?
>
Yes, you'll only see the updated document. The hash function gives the same
result for the same "id" value, so the new document goes to the same shard as
before. There it is "updated": the old version is marked as deleted and the
new one is inserted. The old version remains in the index (not visible, as you
said) until the segment that contains it is merged, either by an optimize or
by background segment merging.
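
The update behaviour can be modelled with a toy in-memory index. This is only a sketch under simplifying assumptions: `ToyIndex` and its methods are hypothetical and do not correspond to any Lucene or Solr API. It shows the old version staying in storage, hidden from search, until a merge drops it.

```python
class ToyIndex:
    """Toy model of a segmented index (hypothetical, not a Lucene API)."""

    def __init__(self):
        self.segments = [[]]  # each segment is a list of entries

    def add(self, doc):
        # An "update" marks any live entry with the same id as deleted...
        for seg in self.segments:
            for entry in seg:
                if entry["doc"]["id"] == doc["id"] and entry["live"]:
                    entry["live"] = False
        # ...and appends the new version to the newest segment.
        self.segments[-1].append({"doc": doc, "live": True})

    def flush(self):
        # Start a new segment; older segments are no longer appended to.
        self.segments.append([])

    def search_all(self):
        # Only live entries are visible to queries.
        return [e["doc"] for seg in self.segments for e in seg if e["live"]]

    def stored_count(self):
        # Counts every entry still stored, including deleted ones.
        return sum(len(seg) for seg in self.segments)

    def merge(self):
        # Merging rewrites the segments and drops deleted entries.
        self.segments = [[e for seg in self.segments for e in seg if e["live"]]]

idx = ToyIndex()
idx.add({"id": "1", "name": "this is Solr 3.5"})
idx.flush()
idx.add({"id": "1", "name": "this is SolrCloud"})
print(idx.search_all())    # only the new version is visible
print(idx.stored_count())  # the old version is still stored
idx.merge()
print(idx.stored_count())  # the old version is gone after the merge
```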


>
> 3) Is it true, that delete by query works regardless of where to send the
> request?
>
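Yes, a delete by query is distributed to all shards, so it does not matter
which node receives the request.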

>
> 4) My DnumShards=4. If I need to expand SolrCloud, for example, to 6
> shards,
> I need to remove Zookeeper data directory, set DnumShards to 6 and restart
> Jetty. Can I set DnumShards=20 and only add new shards in the future without
> any removal or JVM restart?
>
I think you could remove the collection and create it again; see the new
Collections API. You need at least as many Solr instances (or Solr cores) as
shards in order to do anything with your collection. You won't be able to
index or search if the number of nodes is less than the number of shards. Any
change in the number of shards requires reindexing everything.
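
To see why a different shard count means reindexing, consider a toy placement model. This is an assumption for illustration only: it uses `zlib.crc32` as a stand-in hash and simple modulo placement, which is not how Solr lays out shards, and the helper names are hypothetical. The general effect is the same, though: when the number of shards changes, many ids map to a different shard than the one that currently holds them.

```python
import zlib

def shard_index(doc_id: str, num_shards: int) -> int:
    # Toy placement (assumption): stand-in hash modulo the shard count.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

def moved_fraction(doc_ids, old_n: int, new_n: int) -> float:
    """Fraction of ids whose shard changes when going from old_n to new_n shards."""
    moved = sum(
        1 for d in doc_ids if shard_index(d, old_n) != shard_index(d, new_n)
    )
    return moved / len(doc_ids)

ids = [str(i) for i in range(10_000)]
print(moved_fraction(ids, 4, 6))
```

In this toy model an id stays put only when its hash gives the same remainder modulo 4 and modulo 6, which for evenly spread hashes is about one id in three. The rest would be looked up on a shard that does not hold them.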


>
> 5) Currently we have 30 shards with 50M docs. What schema you advice:
> shards
> with ~15M docs, or more shards with less count of docs? What will be
> faster:
> search on shards with ~15M docs or search on more shards with less count of
> docs? Expected count of docs are ~1 500 000 000.
>

I think you'll have to test it, as it will depend heavily on your context (the
shape of your docs/index, your queries and other use cases). Shards with 15M
docs don't sound crazy, but I have never really tested with 100 shards.
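
For reference, the figure of 100 shards follows from the numbers in the question. A small sketch of the arithmetic (the `shards_needed` helper is hypothetical):

```python
import math

def shards_needed(total_docs: int, docs_per_shard: int) -> int:
    # Round up so that a partial remainder still gets its own shard.
    return math.ceil(total_docs / docs_per_shard)

print(shards_needed(1_500_000_000, 15_000_000))  # ~15M docs per shard
print(shards_needed(1_500_000_000, 50_000_000))  # ~50M docs per shard
```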

Tomás


>
> Thanks for your responses.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/SolrCloud-general-questions-tp4017769.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
