lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Merging documents from a distributed search
Date Thu, 03 Sep 2015 08:26:58 GMT
Hello - We're doing something similar and ended up overriding QueryComponent (https://issues.apache.org/jira/browse/SOLR-7968),
which first requires QueryComponent's private members to be made protected. We could use a RankQuery and
its cool MergeStrategy, but we would also need RankQuery to provide an entry point for
QueryComponent.createMainQuery(). That would be ideal, because we could then use the Collector
there for local deduplication, and a combination of createMainQuery and mergeIds to do the
distributed deduplication.
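A minimal, framework-free sketch of the distributed half of that idea: merging shard results so that each run-time duplicate key appears once, with the per-shard "count" stats summed. It assumes each shard has already collapsed its local duplicates and reports one entry per dedup key. `ShardDedupMerge`, `Group`, and all field names are hypothetical; this is not Solr's MergeStrategy or mergeIds API, only the kind of logic one might place inside such a hook.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain Java, not a Solr API.
class ShardDedupMerge {

    /** One collapsed group as a shard might report it (hypothetical shape). */
    static final class Group {
        public final String dedupKey; // key computed at query time from request-specific fields
        public final String docId;    // representative document kept for the group
        public final float score;     // best score seen for the group
        public final long count;      // number of duplicates collapsed into the group

        Group(String dedupKey, String docId, float score, long count) {
            this.dedupKey = dedupKey;
            this.docId = docId;
            this.score = score;
            this.count = count;
        }
    }

    /**
     * Merges per-shard group lists so that each dedup key appears only once.
     * Counts are summed across shards; the highest-scoring representative is kept.
     * The merged list is sorted by score, highest first.
     */
    static List<Group> merge(List<List<Group>> shardResults) {
        Map<String, Group> byKey = new LinkedHashMap<>();
        for (List<Group> shard : shardResults) {
            for (Group g : shard) {
                // First sighting stores g; later sightings are combined with the stored group.
                byKey.merge(g.dedupKey, g, (seen, next) -> {
                    Group best = seen.score >= next.score ? seen : next;
                    return new Group(best.dedupKey, best.docId, best.score, seen.count + next.count);
                });
            }
        }
        List<Group> merged = new ArrayList<>(byKey.values());
        merged.sort((x, y) -> Float.compare(y.score, x.score));
        return merged;
    }
}
```

With shard1 reporting groups A and B and shard2 reporting B and C, `merge` returns A, B and C, and B's count is the sum of both shards' counts. The summed counts are only exact if every shard reports every group for the query; if each shard returns only its top N groups, a group that falls outside one shard's top N will be undercounted.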

Markus
 
-----Original message-----
> From:Joel Bernstein <joelsolr@gmail.com>
> Sent: Wednesday 2nd September 2015 23:46
> To: solr-user@lucene.apache.org
> Subject: Re: Merging documents from a distributed search
> 
> The merge strategy probably won't work for the type of distributed collapse
> you're describing.
> 
> You may want to begin exploring the Streaming API, which supports real-time
> map/reduce operations:
> 
> http://joelsolr.blogspot.com/2015/03/parallel-computing-with-solrcloud.html
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> On Wed, Sep 2, 2015 at 5:12 PM, tedsolr <tsmith@sciquest.com> wrote:
> 
> > I've read at http://heliosearch.org/solrs-mergestrategy/ that the
> > AnalyticsQuery component only works for a single instance of Solr. I'm
> > planning to "migrate" to SolrCloud soon, and I have a custom AnalyticsQuery
> > module that collapses what I consider to be duplicate documents, keeping
> > stats like a "count" of the dupes. For my purposes, "dupes" are determined
> > at run time and vary by search request. Once a collection has multiple
> > shards, I will not be able to prevent "dupes" from appearing across those
> > shards. A custom merge strategy should allow me to merge my stats, but I
> > don't see how I can drop duplicate docs at that point.
> >
> > If shard1 returns docs A & B and shard2 returns docs B & C (letters
> > denoting what I consider to be unique docs), can my implementation of a
> > merge strategy return only docs A, B, & C, rather than A, B, B, & C?
> >
> > thanks!
> > solr 5.2.1
> >
> >
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Merging-documents-from-a-distributed-search-tp4226802.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
> 
