lucene-solr-user mailing list archives

From: Tomas Fernandez Lobbe <tflo...@apple.com>
Subject: Re: Request routing / load-balancing TLOG & PULL replica types
Date: Mon, 12 Feb 2018 22:13:29 GMT


> On Feb 12, 2018, at 12:06 PM, Greg Roodt <groodt@gmail.com> wrote:
> 
> Thanks Ere. I've taken a look at the discussion here:
> http://lucene.472066.n3.nabble.com/Limit-search-queries-only-to-pull-replicas-td4367323.html
> This is how I was imagining TLOG & PULL replicas would work, so if this
> functionality does get developed, it would be useful to me.
> 
> I still have 2 questions at the moment:
> 1. I am running the single shard scenario. I'm thinking of using a
> dedicated HTTP load-balancer in front of the PULL replicas only, with
> read-only queries sent directly to the load-balancer. In this
> situation, the healthy PULL replicas *should* handle the queries on the
> node itself without a proxy hop (assuming state=active). New PULL replicas
> added to the load-balancer will internally proxy queries to the other PULL
> or TLOG replicas while in state=recovering until they switch to
> state=active. Is my understanding correct?

Yes
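
For reference, a minimal SolrJ sketch of how a load-balancer membership check could
discover the active PULL replicas from cluster state (the ZooKeeper addresses and the
collection name are hypothetical):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.cloud.DocCollection;
    import org.apache.solr.common.cloud.Replica;

    public class ActivePullReplicas {
      public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder()
            .withZkHost("zk1:2181,zk2:2181,zk3:2181").build()) {
          client.connect();
          DocCollection coll = client.getZkStateReader()
              .getClusterState().getCollection("mycollection");
          for (Replica r : coll.getReplicas()) {
            // Only replicas that are both PULL and active should be
            // load-balancer targets; recovering ones would proxy anyway.
            if (r.getType() == Replica.Type.PULL
                && r.getState() == Replica.State.ACTIVE) {
              System.out.println(r.getCoreUrl());
            }
          }
        }
      }
    }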

> 
> 2. Is it all worth it? Is there any advantage to running a cluster of 3
> TLOGs + 10 PULL replicas vs running 13 TLOG replicas?
> 

I don’t have a definitive answer; this will depend on your specific use case. As Erick said,
there is very little work that non-leader TLOG replicas do for each update, and having all
TLOG replicas means that with a single active replica you could, in theory, still handle
updates. It’s sometimes nice to separate query traffic from update traffic, but this can
still be done if you have all TLOG replicas and you just make sure you don’t query the
leader… One nice characteristic of PULL replicas is that they can’t go into Leader Initiated
Recovery (LIR) state: even if there is some sort of network partition, they’ll remain in the
active state as long as they can reach ZooKeeper, even if they can’t talk to the leader
(note that this means they may respond with outdated data for an undetermined amount of
time, until they can replicate from the leader again). Also, since updates are sent only to
the TLOG replicas rather than to all replicas, updates should be faster with 3 TLOG replicas
than with 13.
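
If you do go the all-TLOG route and want to keep query traffic off the leader, here is a
minimal sketch of the idea (single-shard case; the collection and shard names are
hypothetical, and "cloud" is an already-connected CloudSolrClient):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.cloud.DocCollection;
    import org.apache.solr.common.cloud.Replica;

    // Send a query to the first active replica that is not the shard leader.
    static QueryResponse queryNonLeader(CloudSolrClient cloud) throws Exception {
      DocCollection coll = cloud.getZkStateReader()
          .getClusterState().getCollection("mycollection");
      Replica leader = coll.getLeader("shard1");
      for (Replica r : coll.getReplicas()) {
        if (r.getState() == Replica.State.ACTIVE
            && (leader == null || !r.getName().equals(leader.getName()))) {
          try (HttpSolrClient direct =
              new HttpSolrClient.Builder(r.getCoreUrl()).build()) {
            return direct.query(new SolrQuery("*:*"));
          }
        }
      }
      throw new IllegalStateException("no active non-leader replica");
    }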


Tomás

> 
> 
> 
> On 12 February 2018 at 19:25, Ere Maijala <ere.maijala@helsinki.fi> wrote:
> 
>> Your question about directing queries to PULL replicas only has been
>> discussed on the list. Look for topic "Limit search queries only to pull
>> replicas". What I'd like to see is something similar to the
>> preferLocalShards parameter. It could be something like
>> "preferReplicaTypes=TLOG,PULL". Tomás mentioned previously that
>> SOLR-10880 could be used as a base for such functionality, and I'm
>> considering taking a stab at implementing it.
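>> 
>> For comparison, a minimal SolrJ sketch; preferLocalShards exists today, while the
>> second, commented-out parameter is only the proposal above (hypothetical):
>> 
>>     import org.apache.solr.client.solrj.SolrQuery;
>> 
>>     SolrQuery q = new SolrQuery("*:*");
>>     // Existing parameter: prefer replicas on the node serving the request.
>>     q.set("preferLocalShards", true);
>>     // Proposed, not yet in Solr:
>>     // q.set("preferReplicaTypes", "TLOG,PULL");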
>> 
>> --Ere
>> 
>> 
>> Greg Roodt wrote on 12.2.2018 at 6.55:
>> 
>>> Thank you both for your very detailed answers.
>>> 
>>> This is great to know. I knew that SolrJ had the cluster aware knowledge
>>> (via zookeeper), but I was wondering what something like curl would do.
>>> Great to know that internally the cluster will proxy queries to the
>>> appropriate place regardless.
>>> 
>>> I am running the single shard scenario. I'm thinking of using a dedicated
>>> HTTP load-balancer in front of the PULL replicas only, with read-only
>>> queries sent directly to the load-balancer. In this situation, the
>>> healthy PULL replicas *should* handle the queries on the node itself
>>> without a proxy hop (assuming state=active). New PULL replicas added to
>>> the load-balancer will internally proxy queries to the other PULL or TLOG
>>> replicas while in state=recovering until they switch to state=active.
>>> 
>>> Is my understanding correct?
>>> 
>>> Is this sensible to do, or is it not worth it due to the smart proxying
>>> that SolrCloud can do anyway?
>>> 
>>> If the TLOG and PULL replicas are so similar, is there any real advantage
>>> to having a mixed cluster? I assume a bit less work is required across the
>>> cluster to propagate writes if you only have 3 TLOG nodes plus 10 PULL
>>> nodes? Or would it be better to just have 13 TLOG nodes?
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On 12 February 2018 at 15:24, Tomas Fernandez Lobbe <tflobbe@apple.com>
>>> wrote:
>>> 
>>>> On the last question:
>>>> For Writes: Yes. Writes are going to be sent to the shard leader, and
>>>> since PULL replicas can’t be leaders, it’s going to be a TLOG replica.
>>>> If you are using CloudSolrClient, then this routing will be done directly
>>>> from the client (since it will send the update to the leader), and if you
>>>> are using some other HTTP client, then yes, the PULL replica will forward
>>>> the update, the same way any non-leader node would.
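>>>> 
>>>> As a minimal sketch of the CloudSolrClient case (the collection name is
>>>> hypothetical):
>>>> 
>>>>     import org.apache.solr.client.solrj.impl.CloudSolrClient;
>>>>     import org.apache.solr.common.SolrInputDocument;
>>>> 
>>>>     static void indexOne(CloudSolrClient cloudClient) throws Exception {
>>>>       SolrInputDocument doc = new SolrInputDocument();
>>>>       doc.addField("id", "1");
>>>>       // The client resolves the shard leader from ZooKeeper and sends the
>>>>       // update straight to it (a TLOG replica in a TLOG+PULL collection).
>>>>       cloudClient.add("mycollection", doc);
>>>>       cloudClient.commit("mycollection");
>>>>     }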
>>>> 
>>>> For reads: this won’t happen today; any replica can respond to queries.
>>>> I do believe there is value in this kind of routing logic; sometimes you
>>>> simply don’t want the leader to handle any queries, especially when
>>>> queries can be expensive. You could do this today if you want, by putting
>>>> some load balancer in front and directing your queries only to the nodes
>>>> you know are PULL, but keep in mind that this would only work in the
>>>> single shard scenario, and only if you hit an active replica (otherwise,
>>>> as you said, the query will be routed to any other node of the shard,
>>>> regardless of the type). If you have multiple shards, then you need to
>>>> use the “shards” parameter and tell Solr exactly which nodes you want to
>>>> hit for each shard (the “shards” approach can also be done in the single
>>>> shard case, although you would be adding an extra hop, I believe).
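>>>> 
>>>> A minimal sketch of that “shards” approach (host and core names are
>>>> hypothetical; commas separate shards, and “|” separates alternative
>>>> replicas for the same shard):
>>>> 
>>>>     import org.apache.solr.client.solrj.SolrQuery;
>>>> 
>>>>     SolrQuery q = new SolrQuery("*:*");
>>>>     // Route each shard's sub-request only to the listed (e.g. PULL) cores.
>>>>     q.set("shards",
>>>>         "host1:8983/solr/mycollection_shard1_replica_p1"
>>>>         + "|host2:8983/solr/mycollection_shard1_replica_p2");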
>>>> 
>>>> Tomás
>>>> Sent from my iPhone
>>>> 
>>>> On Feb 11, 2018, at 6:35 PM, Greg Roodt <groodt@gmail.com> wrote:
>>>>> 
>>>>> Hi
>>>>> 
>>>>> I have a question around how queries are routed and load-balanced in a
>>>>> cluster of mixed TLOG and PULL replicas.
>>>>> 
>>>>> I thought that I might have to put a load-balancer in front of the PULL
>>>>> replicas and direct queries at them manually as nodes are added and
>>>>> removed as PULL replicas. However, it seems that SolrCloud handles this
>>>>> automatically?
>>>>> 
>>>>> If I add a new PULL replica node, it goes into state="recovering" while
>>>>> it pulls the core. As expected. What happens if queries are directed at
>>>>> this node while in this state? From what I am observing, the query gets
>>>>> directed to another node?
>>>>> 
>>>>> If SolrCloud is handling the routing of requests to active nodes, will it
>>>>> automatically favour PULL replicas for read queries and TLOG replicas for
>>>>> writes?
>>>>> 
>>>>> Thanks
>>>>> Greg
>>>>> 
>>>> 
>>>> 
>>> 
>> --
>> Ere Maijala
>> Kansalliskirjasto / The National Library of Finland
>> 

