lucene-solr-user mailing list archives

From Lance Norskog <>
Subject Re: federated / meta search
Date Sat, 19 Jun 2010 00:16:46 GMT
Yes, you can do this. You need a common scheme for generating document
ids, so that every id is unique across all of the indexes being searched.
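
A minimal sketch of the two pieces, under stated assumptions: ids are
made unique by prefixing each record's local id with a name for the
instance that owns it, and the search is sent with Solr's `shards`
request parameter. The helper names, the "source:local_id" convention,
and the host names are hypothetical, not anything the instances in
question are known to use.

```python
from urllib.parse import urlencode


def make_global_id(source: str, local_id: str) -> str:
    """Build an id that stays unique across instances.

    Assumed convention: "<source>:<local_id>", where `source` is a short
    name agreed on for each participating Solr instance.
    """
    if not source or ":" in source:
        raise ValueError("source must be non-empty and must not contain ':'")
    return f"{source}:{local_id}"


def distributed_query_url(base_url, shards, query, fields):
    """Build a select URL that fans the query out to every shard.

    `shards` is a list of "host:port/path" strings (no scheme), joined
    with commas as the `shards` parameter expects. `fields` should only
    name fields that exist in every schema.
    """
    params = {
        "q": query,
        "shards": ",".join(shards),
        "fl": ",".join(fields),
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical hosts and field names, for illustration only.
url = distributed_query_url(
    "http://host-a.example.org:8983/solr",
    ["host-a.example.org:8983/solr", "host-b.example.org:8983/solr"],
    "title:federated",
    ["id", "title", "score"],
)
print(make_global_id("host-a", "12345"))
print(url)
```

The prefix has to be applied at index time on each instance, since the
merge step relies on the unique key field being unique across shards.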

Also, there's an odd problem around relevance. Relevance scoring is
based on statistics for all of the terms in a field across the whole
index, and each index has its own "statistical fingerprint". With two
indexes from two sources, the terms in the documents will not have the
same "fingerprint", so a score from one shard does not mean the same
thing as the same score from the other shard.
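
To make the "fingerprint" problem concrete, here is a small sketch of
how the same term gets a different weight in two indexes. It uses the
inverse document frequency formula from Lucene's classic TF-IDF
similarity, idf = 1 + ln(numDocs / (docFreq + 1)). The shard sizes and
document frequencies are invented numbers for illustration.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency as in Lucene's classic similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Invented statistics: "library" is very common in shard A's collection
# and rare in shard B's.
shard_a = {"num_docs": 1_000_000, "doc_freq": {"library": 400_000}}
shard_b = {"num_docs": 10_000, "doc_freq": {"library": 50}}


def term_idf(shard: dict, term: str) -> float:
    """Look up the term's local document frequency and compute its idf."""
    return idf(shard["doc_freq"].get(term, 0), shard["num_docs"])


for name, shard in (("A", shard_a), ("B", shard_b)):
    print(f"shard {name}: idf('library') = {term_idf(shard, 'library'):.3f}")
```

Because each shard scores with its own local statistics, a match on
"library" is weighted much more heavily in shard B than in shard A, so
sorting the merged hits by raw score favors one source for reasons
unrelated to how well the documents match.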

There is a project to make this work in Solr, but it is not nearly finished.

Lance Norskog

On Fri, Jun 18, 2010 at 4:28 AM, Sascha Szott <> wrote:
> Hi Joe & Markus,
> sounds good! Maybe I should add a note to the Wiki page on federated
> search [1].
> Thanks,
> Sascha
> [1]
> Joe Calderon wrote:
>> yes, you can use distributed search across shards with different
>> schemas as long as the query only references overlapping fields. I
>> usually test new fields or tokenizers on one shard and deploy
>> only after I've verified it's working properly
>> On Thu, Jun 17, 2010 at 1:10 PM, Markus Jelsma <> wrote:
>>> Hi,
>>> Check out Solr's sharding [1] capabilities. I never tested it with
>>> different schemas, but if each node is queried only with fields that it
>>> supports, it should return useful results.
>>> [1]:
>>> Cheers.
>>> -----Original message-----
>>> From: Sascha Szott<>
>>> Sent: Thu 17-06-2010 19:44
>>> To:;
>>> Subject: federated / meta search
>>> Hi folks,
>>> if I'm seeing it right Solr currently does not provide any support for
>>> federated / meta searching. Therefore, I'd like to know if anyone has
>>> already put efforts into this direction? Moreover, is federated / meta
>>> search considered a scenario Solr should be able to deal with at all or
>>> is it (far) beyond the scope of Solr?
>>> To be more precise, I'll give you a short explanation of my
>>> requirements. Assume there are a couple of Solr instances running at
>>> different places. The documents stored within those instances are all
>>> from the same domain (bibliographic records), but it cannot be ensured
>>> that the schema definitions match 100%. But let's say there are at
>>> least some index fields that are present in all instances (fields with
>>> the same name and type definition). Now, I'd like to perform a search on
>>> all instances at the same time (with the restriction that the query
>>> contains only those fields that overlap among the different schemas) and
>>> combine the results in a reasonable way by utilizing the score
>>> information associated with each hit. Please note that due to legal
>>> issues it is not feasible to build a single index that integrates the
>>> documents of all Solr instances under consideration.
>>> Thanks in advance,
>>> Sascha
