lucene-solr-user mailing list archives

From Jan Høydahl <jan....@cominvent.com>
Subject Re: Distributed search cross cluster
Date Thu, 01 Feb 2018 00:15:35 GMT
Hi,

I am an ex-FAST employee and actually used Unity a lot myself, even hacking the code
and writing custom mixers etc. :)

That is all cool, if you want to write a generic federation layer. In our case we only ever
need to talk to Solr instances with exactly the same schema and document types,
compatible scores etc. So that’s why I figure it is out of scope to write custom merge
code. It would also be less efficient, since you’d get, say, 10 hits from each of 10 clusters
= 100 hits, while if you just let the original node talk to all the shards then you only
fetch the top docs across all clusters.
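
To make the arithmetic concrete, here is a rough sketch of what the federated variant would have to do: pull a full top 10 from each of 10 clusters and merge them down to one top 10. The data, the `merge_top_k` helper and the (score, doc_id) layout are all made up for illustration, and it assumes scores are directly comparable across clusters, as they are in our case. It is not how Solr merges internally.

```python
import heapq
from itertools import islice


def merge_top_k(per_cluster_hits, k):
    """Merge per-cluster result lists into one global top-k.

    Each input list holds (score, doc_id) tuples already sorted by
    descending score. Hypothetical helper, not a Solr API.
    """
    merged = heapq.merge(*per_cluster_hits, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))


# Hypothetical data: 10 clusters, each returning its own top 10 hits.
clusters = [
    [(100 - c - 10 * rank, f"cluster{c}-doc{rank}") for rank in range(10)]
    for c in range(10)
]

fetched = sum(len(hits) for hits in clusters)  # hits pulled over the wire
top10 = merge_top_k(clusters, 10)              # hits actually shown to the user
```

Here `fetched` is 100 while only the 10 entries in `top10` are ever used; the other 90 documents were retrieved for nothing, which is the overhead I mean.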

I see many many open OLD JIRAs for federated features, which never got anywhere,
so I take that also as a hint that this is either not needed or very complex :)

Talking about FAST ESP, the "fsearch" process responsible for merging results from
underlying indices was actually used at multiple levels, so to federate two FAST clusters
all you had to do was put a top level fsearch process above all of them and point it to
the right host:port list, then a QRServer on top of that fsearch again. Those were the days.

If there was some class that would delegate an incoming search request to sub shards
in a generic way, without writing all the merge and two-phase stuff over again, then
that would be ideal.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 31 Jan 2018, at 10:41, Bernd Fehling <bernd.fehling@uni-bielefeld.de> wrote:
> 
> Many years ago, in a different universe, when Federated Search was a buzzword we
> used Unity from FAST FDS (which is now MS ESP). It worked pretty well across
> many systems like FAST FDS, Google, Gigablast, ...
> Very flexible with different mixers, parsers, query transformers.
> Was written in Python and used pylib.medusa.
> Search for "unity federated search", there is a book at Google about this, just
> to get an idea.
> 
> Regards, Bernd
> 
> 
> On 30.01.2018 at 17:09, Jan Høydahl wrote:
>> Hi,
>> 
>> A customer has 10 separate SolrCloud clusters, with same schema across all, but different content.
>> Now they want users in each location to be able to federate a search across all locations.
>> Each location is 100% independent, with separate ZK etc. Bandwidth and latency between the
>> clusters is not an issue, they are actually in the same physical datacenter.
>> 
>> Now my first thought was using a custom &shards parameter, and let the receiving node fan
>> out to all shards of all clusters. We’d need to contact the ZK for each environment and find
>> all shards and replicas participating in the collection and then construct the shards=A1|A2,B1|B2…
>> string which would be quite big, but if we get it right, it should “just work”.
>> 
>> Now, my question is whether there are other smarter ways that would leave it up to existing Solr
>> logic to select shards and load balance, that would also take into account any shard.keys/_route_
>> info etc. I thought of these
>>  * &collection=collA,collB — but it only supports collections local to one cloud
>>  * Create a collection ALIAS to point to all 10 — but same here, only local to one cluster
>>  * Streaming expression top(merge(search(q=,zkHost=blabla))) — but we want it with pure search API
>>  * Write a custom ShardHandler plugin that knows about all clusters — but this is complex stuff :)
>>  * Write a custom SearchComponent plugin that knows about all clusters and adds the &shards= param
>> 
>> Another approach would be for the originating cluster to fan out just ONE request to each of the other
>> clusters and then write some SearchComponent to merge those responses. That would let us query
>> the other clusters using one LB IP address instead of requiring full visibility to all solr nodes
>> of all clusters, but if we don’t need that isolation, that extra merge code seems fairly complex.
>> 
>> So far I opt for the custom SearchComponent and &shards= param approach. Any useful input from
>> someone who tried a similar approach would be priceless!
>> 
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> 
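For reference, the shards=A1|A2,B1|B2… string from the original message could be assembled roughly as below. This is only a sketch: it assumes the replica addresses have already been looked up from each cluster's ZK (not shown), and the function name, host names and core paths are hypothetical. The one thing taken from Solr itself is the syntax of the shards parameter, where commas separate shards and `|` separates equivalent replicas of one shard for load balancing.

```python
def build_shards_param(clusters):
    """Build a value for Solr's `shards` request parameter.

    `clusters` is a list of clusters; each cluster is a list of shards;
    each shard is a list of replica addresses (host:port/solr/core).
    Hypothetical helper; ZK discovery of the replicas is assumed done.
    """
    shard_groups = []
    for cluster in clusters:
        for replicas in cluster:
            if not replicas:
                raise ValueError("every shard needs at least one replica")
            # Replicas of the same shard are alternatives: join with '|'.
            shard_groups.append("|".join(replicas))
    # Distinct shards must all be queried: join with ','.
    return ",".join(shard_groups)


# Hypothetical layout: two clusters, one shard each, two replicas per shard.
clusters = [
    [["a1:8983/solr/coll_shard1", "a2:8983/solr/coll_shard1"]],
    [["b1:8983/solr/coll_shard1", "b2:8983/solr/coll_shard1"]],
]
shards = build_shards_param(clusters)
```

With 10 clusters and several shards and replicas each, the resulting value gets long quickly, which is the size concern raised in the original message.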

