lucene-solr-user mailing list archives

From "Mike Klaas" <mike.kl...@gmail.com>
Subject Re: Does solr support Multi index and return by score and datetime
Date Thu, 05 Apr 2007 02:07:01 GMT
On 4/4/07, James liu <liuping.james@gmail.com> wrote:

> > > I think it is part of full-text search.
>
> I think query slavers and combin result by score should be the part of solr.
>
> I find it http://dev.lucene-ws.net/wiki/MultiIndexOperations
> but i wanna use solr and i like it.
>
> Now i wanna find a good method to solve it by using solr and less
> coding.(More code will cost more time to write and test.)

I agree that it would be an excellent addition to Solr, but it is a
major undertaking, and so I wouldn't wait around for it if it is
important to you.  Solr devs have code to write and test too :).

> > >  If you document
> > > > distribution is uniform random, then the norms converge to
> > > > approximately equal values anyway.
> > >
> > > I don't know it.
>
> I don't know why u say "document distribution". Does it mean if i write code
> independently, i will consider it?

One of the complexities of querying multiple remote Solr/Lucene
instances is that the scores are not directly comparable, because each
index computes term idf from its own document counts.  However, in
practical situations, this can be glossed over.
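
To make the idf point concrete, here is a small sketch.  It assumes the
classic Lucene idf formula, 1 + ln(numDocs / (docFreq + 1)); the function
name and all document counts are made up for illustration, and the
similarity configured on a given index may differ.

```python
import math


def classic_idf(doc_freq, num_docs):
    """idf as computed by Lucene's classic TF-IDF similarity (assumed here)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Skewed split: the term is rare on one index and common on the other,
# so the same term gets very different idf values on the two servers.
skewed = (classic_idf(10, 1000), classic_idf(200, 1000))

# Uniform random split: the term shows up in about 1% of the documents on
# both indexes, so the idf values nearly agree even though the index
# sizes differ by a factor of ten.
uniform = (classic_idf(10, 1000), classic_idf(100, 10000))

print("skewed split:  %.2f vs %.2f" % skewed)
print("uniform split: %.2f vs %.2f" % uniform)
```

With a skewed split the gap between the two values is large; with a
uniform random split it nearly disappears, which is why the difference
can often be ignored.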

This is the basic algorithm for single-pass querying of multiple Solr
slaves.  Say you want results N to N + M (e.g. 10 to 20).

 1. Query each Solr instance independently for the top N+M documents
    for the given query.  This should be done asynchronously (or you
    could spawn a thread per server).
 2. Wait for all responses (or for a certain timeout).
 3. Put all returned documents into an array, and sort it by score in
    descending order.
 4. Select documents [N, N+M) from this array.
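
The four steps can be sketched in Python as follows.  This is only a
sketch under stated assumptions: merge_page and fetch are hypothetical
names, and fetch(shard, query, n) stands in for whatever HTTP call you
use to get the top n (score, doc_id) pairs from one Solr instance.  It is
not an existing Solr API.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def merge_page(shards, query, start, rows, fetch, timeout=5.0):
    """Return merged hits [start, start + rows) across all shards.

    fetch(shard, query, n) is a hypothetical callable supplied by the
    caller; it must return up to n (score, doc_id) pairs for one shard.
    """
    want = start + rows
    # Step 1: ask every shard for its top N+M hits, one thread per shard.
    pool = ThreadPoolExecutor(max_workers=max(1, len(shards)))
    futures = [pool.submit(fetch, shard, query, want) for shard in shards]
    # Step 2: wait for all responses, or give up after the timeout.
    done, _late = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)
    # Step 3: pool the hits that arrived and sort by score, highest first.
    hits = []
    for future in done:
        if future.exception() is None:  # skip shards that failed
            hits.extend(future.result())
    hits.sort(key=lambda hit: hit[0], reverse=True)
    # Step 4: cut out the requested window.
    return hits[start:want]


# Usage with in-memory stand-ins for two Solr instances (made-up data).
FAKE_INDEXES = {
    "a": [(0.9, "a1"), (0.5, "a2"), (0.1, "a3")],
    "b": [(0.8, "b1"), (0.7, "b2"), (0.2, "b3")],
}


def fake_fetch(shard, query, n):
    return FAKE_INDEXES[shard][:n]


page = merge_page(["a", "b"], "solr", 1, 3, fake_fetch)
print(page)  # [(0.8, 'b1'), (0.7, 'b2'), (0.5, 'a2')]
```

Shards that time out or raise are simply left out of the merge here,
which matches the "or for a certain timeout" option in step 2; whether
partial results are acceptable is up to the application.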

This is a relatively simple task.  It gets more complicated once
multiple passes, idf compensation, deduplication, etc. are added.

-Mike
