lucene-solr-user mailing list archives

From Jan Høydahl <>
Subject Re: Distributed search cross cluster
Date Thu, 01 Feb 2018 00:24:52 GMT

> for each cluster and just merged the docs when it got them back

This would be the logical way. I'm afraid that "just merged the docs" is the crux here, which
makes this an expensive task. You'd have to merge docs, facets, highlights etc., and handle the
different search phases (ID fetch, doc fetch, potentially global IDF fetch?).
It may be that the code necessary to do the merge already exists in the project, haven't looked...
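To illustrate why the doc-merge part at least is tractable, here is a minimal, hypothetical sketch (not existing Solr code) that merges the top docs returned by independent clusters into one globally ranked list. It assumes each cluster's response is reduced to a list of dicts with Solr's standard "id" and "score" fields, and it only covers docs — facets, highlighting and the multi-phase protocol are the genuinely hard parts:

```python
def merge_cluster_docs(responses, rows=10):
    """Merge per-cluster top-doc lists into one ranked list.

    responses: list of doc lists, one per cluster; each doc is a dict
    with at least "id" and "score". If the same id appears in several
    clusters, the highest-scoring copy wins.
    """
    best = {}
    for docs in responses:
        for doc in docs:
            doc_id = doc["id"]
            if doc_id not in best or doc["score"] > best[doc_id]["score"]:
                best[doc_id] = doc
    # Re-rank globally by score and cut to the requested page size.
    return sorted(best.values(), key=lambda d: d["score"], reverse=True)[:rows]
```

Note this naively assumes scores are comparable across clusters, which is exactly where global IDF would come in.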


Yes, it should "just" work. Until someone upgrades the schema in one cloud and not the others,
of course :) And we still need to handle failure cases such as high latency or one cluster being down.

Besides, we'll have SSL certs with client auth and probably some sort of authn & authz in place in
all clouds, and we'd of course need to make sure that the user exists in all clusters and that
cross-cluster traffic is allowed in everywhere. PKI auth is not really intended for accepting
requests from a foreign node that is not in its ZK etc.
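For what it's worth, building the big shards string itself is mechanical once each cluster's ZK has been read. A hypothetical sketch (host names and core paths invented) that produces the shards=A1|A2,B1|B2… form, where replicas of one shard are joined with "|" (Solr load-balances among them) and shards are joined with ",":

```python
def build_shards_param(cluster_replicas):
    """Build the value for the &shards= parameter.

    cluster_replicas: list of clusters, each a list of shards,
    each shard a list of replica core URLs (host:port/solr/core).
    Replicas within a shard are joined with "|", shards with ",".
    """
    return ",".join(
        "|".join(replicas)
        for shards in cluster_replicas
        for replicas in shards
    )
```

As noted above, the resulting string gets quite big with 10 clusters, but it is just string assembly.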

Jan Høydahl, search solution architect
Cominvent AS -

> 31. jan. 2018 kl. 10:06 skrev Charlie Hull <>:
> On 30/01/2018 16:09, Jan Høydahl wrote:
>> Hi,
>> A customer has 10 separate SolrCloud clusters, with the same schema across all, but different content.
>> Now they want users in each location to be able to federate a search across all locations.
>> Each location is 100% independent, with separate ZK etc. Bandwidth and latency between
>> clusters is not an issue, they are actually in the same physical datacenter.
>> Now my first thought was using a custom &shards parameter, and let the receiving node fan
>> out to all shards of all clusters. We’d need to contact the ZK for each environment and find
>> all shards and replicas participating in the collection and then construct the shards=A1|A2,B1|B2…
>> string which would be quite big, but if we get it right, it should “just work".
>> Now, my question is whether there are other smarter ways that would leave it up to existing Solr
>> logic to select shards and load balance, that would also take into account any shard.keys/_route_
>> info etc. I thought of these:
>>   * &collection=collA,collB  — but it only supports collections local to one cluster
>>   * Create a collection ALIAS to point to all 10 — but same here, only local to one cluster
>>   * Streaming expression top(merge(search(q=,zkHost=blabla))) — but we want it with pure search API
>>   * Write a custom ShardHandler plugin that knows about all clusters — but this is complex stuff :)
>>   * Write a custom SearchComponent plugin that knows about all clusters and adds the &shards= param
>> Another approach would be for the originating cluster to fan out just ONE request to each of the other
>> clusters and then write some SearchComponent to merge those responses. That would let us query
>> the other clusters using one LB IP address instead of requiring full visibility to all solr nodes
>> of all clusters, but if we don’t need that isolation, that extra merge code seems fairly complex.
>> So far I opt for the custom SearchComponent and &shards= param approach. Any useful input from
>> someone who tried a similar approach would be priceless!
> Hi Jan,
> We actually looked at this for the BioSolr project - a SolrCloud of SolrClouds. Unfortunately
> the funding didn't appear for the project so we didn't take it any further than some rough
> ideas - as you say, if you get it right it should 'just work'. We had some extra complications
> in terms of shared partial schemas...
> Cheers
> Charlie
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - <>
> -- 
> Charlie Hull
> Flax - Open Source Enterprise Search
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: <>
