lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: shard splitting (solr 4.4.0)
Date Wed, 01 Apr 2015 15:25:06 GMT
Ashwin:

First, if at all possible I would simply set up my new SolrCloud
structure (2 shards, a leader and follower each) and re-index the
entire corpus. 24M docs isn't really very many, and you'll have to
have this capability sometime since someone, somewhere will want to
change the schema in ways that require it.

But to answer your questions:
1: Certainly. There's the SPLITSHARD command, see:
https://cwiki.apache.org/confluence/display/solr/Collections+API. That
said, Solr 4.4 used a relatively early version of SPLITSHARD and there
have been many improvements since, so be sure to back up your index first.
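
As a rough sketch of what that request looks like: SPLITSHARD is an HTTP
call to the Collections API, so the collection has to be running in
SolrCloud mode already. The collection name "collection1" and the shard
name "shard1" below are assumptions based on Solr's defaults; check the
Cloud view in the admin UI for the real names before running anything.

```python
from urllib.parse import urlencode


def splitshard_url(base_url, collection, shard):
    """Build a Collections API SPLITSHARD request URL.

    base_url is the Solr root, e.g. http://host:8983/solr.
    collection and shard are whatever your cluster state reports;
    the values used in the example below are assumed defaults.
    """
    params = urlencode({
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
    })
    return f"{base_url.rstrip('/')}/admin/collections?{params}"


if __name__ == "__main__":
    # Host taken from the original question; names are assumptions.
    print(splitshard_url("http://Unix_box_1:8983/solr", "collection1", "shard1"))
```

Issuing a GET against the printed URL (curl or a browser is enough)
starts the split. Only build and inspect the URL until the backup is in
place.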

2: Not quite sure how long it takes, but I wouldn't expect it to take
hours. A lot depends on what the docs are like.

3: Yes, sending a query (or update for that matter) to any node in the
cluster will "do the right thing". In a production environment, and
assuming you're not using SolrJ, I'd put a load balancer in front of
the cluster for queries. If you _are_ querying through SolrJ from the
application, you only need to use the CloudSolrServer class as it
includes a software load balancer by default. Otherwise, if you
hard-code a single machine that machine becomes a single point of
failure.
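
To illustrate the single-point-of-failure issue, here is a minimal
client-side failover sketch for an application that is not using SolrJ.
It is not what CloudSolrServer does (that class reads the cluster state
from ZooKeeper); it simply tries a fixed list of nodes in order. The
helper names and the collection name "collection1" are hypothetical,
and a real load balancer in front of the cluster is still the better
choice in production.

```python
import urllib.request
from urllib.parse import urlencode


def select_url(node, collection, query):
    """Build a /select URL for one node (JSON response format)."""
    params = urlencode({"q": query, "wt": "json"})
    return f"{node.rstrip('/')}/{collection}/select?{params}"


def query_any_node(nodes, collection, query, fetch=None, timeout=5):
    """Try each node in turn and return the first successful response body.

    fetch is an optional callable taking a URL and returning the body;
    it defaults to a plain HTTP GET and exists mainly to ease testing.
    """
    if fetch is None:
        def fetch(url):
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
    last_error = None
    for node in nodes:
        try:
            return fetch(select_url(node, collection, query))
        except OSError as exc:  # covers URLError, timeouts, refused connections
            last_error = exc
    raise RuntimeError("no Solr node reachable") from last_error
```

Because any node in the cluster distributes the query across the
shards, whichever node answers returns the combined result. For
example, query_any_node(["http://Unix_box_1:8983/solr",
"http://Unix_box_2:8983/solr"], "collection1", "*:*") falls through to
the second box if the first is down.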

Best,
Erick

On Wed, Apr 1, 2015 at 4:55 AM, Ashwin Kumar <ashwinsolr@outlook.de> wrote:
>  Hello Solr Community,
>
> Greetings ! This is my first post to this group.
>
> I am very new to solr, so please do not mind if some of my questions
> below sound dumb :)
>
> Let me explain my present setup:
>
> Solr version : Solr_4.4.0
> Zookeeper version: zookeeper-3.4.5
> -----------------------------
>
> Present Setup
> Unix_box_1
> One Solr instance (Collection 1 : contains around 24 million indexed
> documents) running on port 8983
>
> --------------------------------------------
>
> Target setup
>
> Now as the number of users is going to increase and also we are
> looking for high availability, I am thinking of setting up solr cloud
> with the following setup:
>
> Unix box 1
> zookeeper 1        (master)
> Solr instance 1    (Shard 1 - leader node)
> --------
>
> Unix_box_2
> zookeeper 2
> Solr instance 2  (Shard 2)
> --------
>
> Unix_box_3
> zookeeper 3
> Solr instance 3  (Replica for Shard 1)
> --------
>
> Unix_box_4
> Solr instance 4 (Replica for Shard 2)
> --------
>
> ========================================================================================
>
> Now following are my queries:
>
> 1) Is it possible for me to split the present solr running on one node
> with 24 million docs under Collection1 into 2 shards as shown above ?
> 2) If yes how can I achieve this, and approximately how long does it take ?
> 3) For my application to fetch the result from solr, I need to give
> one solr url meaning http://Unix_box_1:8983/solr   . In this case if I
> have some docs on shard2 (which is on Unix_box_2) and some on shard1
> (Unix_box_1), will my search result in the application fetch docs from
> both the shards and combine the result ?
>
> =========================================================================================
>
>
> Thank you for your patience and time.
>
> Regards,
> Ashwin
>
