lucene-solr-user mailing list archives

From Ere Maijala <ere.maij...@helsinki.fi>
Subject Re: Request routing / load-balancing TLOG & PULL replica types
Date Tue, 13 Feb 2018 07:53:28 GMT
2. In my experience using PULL replicas can have a significant positive 
effect on the server load. It depends of course on your analysis chain, 
but we do some fairly expensive analysis, and not having to do the same 
work X times does have a benefit. Unfortunately we need multiple shards 
so we can't currently isolate the query traffic from the indexing work.

I took a quick look at the shard selection code yesterday, and it seems 
it might be quite simple to add replica selection to the same place 
where preferLocalShards parameter is handled.
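
To make the idea concrete, here is a minimal sketch of what such a replica preference could look like. It is not Solr's actual code: the Replica class and the order_by_preferred_types function are hypothetical names used only for illustration, and the real shard selection logic works on Solr's own cluster state objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Hypothetical stand-in for a replica entry in the cluster state."""
    name: str
    type: str   # "NRT", "TLOG" or "PULL"
    state: str  # "active", "recovering", ...


def order_by_preferred_types(replicas, preferred_types):
    """Return the active replicas, preferred types first.

    Replicas whose type is not listed stay available as a fallback and
    are placed after all listed types. The sort is stable, so replicas
    of the same type keep their original relative order.
    """
    rank = {t: i for i, t in enumerate(preferred_types)}
    active = [r for r in replicas if r.state == "active"]
    return sorted(active, key=lambda r: rank.get(r.type, len(rank)))
```

With a preference list of PULL,TLOG, an active PULL replica would be tried first, then TLOG, and any other type would remain only as a last resort.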

--Ere

Greg Roodt wrote on 12.2.2018 at 22.06:
> Thanks Ere. I've taken a look at the discussion here:
> http://lucene.472066.n3.nabble.com/Limit-search-queries-only-to-pull-replicas-td4367323.html
> This is how I was imagining TLOG & PULL replicas would work, so if this
> functionality does get developed, it would be useful to me.
> 
> I still have 2 questions at the moment:
> 1. I am running the single shard scenario. I'm thinking of using a
> dedicated HTTP load-balancer in front of the PULL replicas only with
> read-only queries directed directly at the load-balancer. In this
> situation, the healthy PULL replicas *should* handle the queries on the
> node itself without a proxy hop (assuming state=active). New PULL replicas
> added to the load-balancer will internally proxy queries to the other PULL
> or TLOG replicas while in state=recovering until the switch to
> state=active. Is my understanding correct?
> 
> 2. Is it all worth it? Is there any advantage to running a cluster of 3
> TLOGs + 10 PULL replicas vs running 13 TLOG replicas?
> 
> 
> 
> 
> On 12 February 2018 at 19:25, Ere Maijala <ere.maijala@helsinki.fi> wrote:
> 
>> Your question about directing queries to PULL replicas only has been
>> discussed on the list. Look for topic "Limit search queries only to pull
>> replicas". What I'd like to see is something similar to the
>> preferLocalShards parameter. It could be something like
>> "preferReplicaTypes=TLOG,PULL". Tomás mentioned previously that
>> SOLR-10880 could be used as a base for such functionality, and I'm
>> considering taking a stab at implementing it.
>>
>> --Ere
>>
>>
>> Greg Roodt wrote on 12.2.2018 at 6.55:
>>
>>> Thank you both for your very detailed answers.
>>>
>>> This is great to know. I knew that SolrJ had the cluster aware knowledge
>>> (via zookeeper), but I was wondering what something like curl would do.
>>> Great to know that internally the cluster will proxy queries to the
>>> appropriate place regardless.
>>>
>>> I am running the single shard scenario. I'm thinking of using a dedicated
>>> HTTP load-balancer in front of the PULL replicas only with read-only
>>> queries directed directly at the load-balancer. In this situation, the
>>> healthy PULL replicas *should* handle the queries on the node itself
>>> without a proxy hop (assuming state=active). New PULL replicas added to
>>> the load-balancer will internally proxy queries to the other PULL or TLOG
>>> replicas while in state=recovering until the switch to state=active.
>>>
>>> Is my understanding correct?
>>>
>>> Is this sensible to do, or is it not worth it due to the smart proxying
>>> that SolrCloud can do anyway?
>>>
>>> If the TLOG and PULL replicas are so similar, is there any real advantage
>>> to having a mixed cluster? I assume a bit less work is required across the
>>> cluster to propagate writes if you only have 3 TLOG nodes vs 10+ PULL
>>> nodes? Or would it be better to just have 13 TLOG nodes?
>>>
>>>
>>>
>>>
>>>
>>> On 12 February 2018 at 15:24, Tomas Fernandez Lobbe <tflobbe@apple.com>
>>> wrote:
>>>
>>>> On the last question:
>>>> For Writes: Yes. Writes are going to be sent to the shard leader, and
>>>> since PULL replicas can’t be leaders, it’s going to be a TLOG replica.
>>>> If you are using CloudSolrClient, then this routing will be done
>>>> directly from the client (since it will send the update to the leader),
>>>> and if you are using some other HTTP client, then yes, the PULL replica
>>>> will forward the update, the same way any non-leader node would.
>>>>
>>>> For reads: this won’t happen today, and any replica can respond to
>>>> queries. I do believe there is value in this kind of routing logic;
>>>> sometimes you simply don’t want the leader to handle any queries,
>>>> especially when queries can be expensive. You could do this today if you
>>>> want, by putting some load balancer in front and just directing your
>>>> queries to the nodes you know are PULL. Keep in mind that this would
>>>> only work in the single shard scenario, and only if you hit an active
>>>> replica (otherwise, as you said, the query will be routed to any other
>>>> node of the shard, regardless of the type). If you have multiple shards,
>>>> then you need to use the “shards” parameter and tell Solr exactly which
>>>> nodes you want to hit for each shard (the “shards” approach can also be
>>>> used in the single shard case, although you would be adding an extra hop,
>>>> I believe).
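
As a rough illustration of the "shards" approach described above, the sketch below builds a query URL that lists the replicas to use for each shard. The host names, core names and both helper functions are made up for the example. The only assumption about Solr itself is the documented syntax of the shards parameter: shards are separated by commas, and alternative replicas of the same shard are separated by "|".

```python
from urllib.parse import urlencode


def build_shards_param(shard_replicas):
    """Join replica URLs into a value for the "shards" parameter.

    shard_replicas is a list with one entry per shard; each entry is a
    list of replica URLs (without scheme) that may serve that shard.
    """
    return ",".join("|".join(urls) for urls in shard_replicas)


def build_query_url(base_url, query, shard_replicas):
    """Build a /select URL that restricts the query to the given replicas."""
    params = {"q": query, "shards": build_shards_param(shard_replicas)}
    return base_url + "/select?" + urlencode(params)


# Hypothetical two-shard collection where only PULL replicas are listed.
pull_replicas = [
    ["pull1.example:8983/solr/mycoll_shard1_replica_p1",
     "pull2.example:8983/solr/mycoll_shard1_replica_p2"],
    ["pull3.example:8983/solr/mycoll_shard2_replica_p1"],
]
url = build_query_url("http://pull1.example:8983/solr/mycoll", "*:*", pull_replicas)
```

The list of PULL replicas has to be maintained by the client here, which is the main drawback compared to a preference parameter handled inside Solr.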
>>>>
>>>> Tomás
>>>> Sent from my iPhone
>>>>
>>>> On Feb 11, 2018, at 6:35 PM, Greg Roodt <groodt@gmail.com> wrote:
>>>>>
>>>>> Hi
>>>>>
>>>>> I have a question around how queries are routed and load-balanced in a
>>>>> cluster of mixed TLOG and PULL replicas.
>>>>>
>>>>> I thought that I might have to put a load-balancer in front of the PULL
>>>>> replicas and direct queries at them manually as nodes are added and
>>>>> removed as PULL replicas. However, it seems that SolrCloud handles this
>>>>> automatically?
>>>>>
>>>>> If I add a new PULL replica node, it goes into state="recovering" while
>>>>> it pulls the core. As expected. What happens if queries are directed at
>>>>> this node while in this state? From what I am observing, the query gets
>>>>> directed to another node?
>>>>>
>>>>> If SolrCloud is handling the routing of requests to active nodes, will
>>>>> it automatically favour PULL replicas for read queries and TLOG replicas
>>>>> for writes?
>>>>>
>>>>> Thanks
>>>>> Greg
>>>>>
>>>>
>>>>
>>>
>> --
>> Ere Maijala
>> Kansalliskirjasto / The National Library of Finland
>>
> 

-- 
Ere Maijala
Kansalliskirjasto / The National Library of Finland
