lucene-dev mailing list archives

From "" <>
Subject Re: Re: Question about grouping in distribute mode
Date Thu, 06 Apr 2017 09:52:26 GMT
Thanks for your help.
When I use the compositeId router, I find that group.ngroups returns a wrong number. I would like to know
which implementation mechanism causes this. Why must we use the implicit router when
we want grouping to work correctly?
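A likely explanation, offered tentatively: the Solr Reference Guide's caveats for distributed grouping state that group.ngroups is only accurate when all documents of a group are co-located on the same shard. The sketch below assumes the federator gets the total by adding up each shard's own group count. This is an assumption about the merge, not code taken from Solr, and the class and method names are hypothetical. When one group's documents are spread over two shards, that group is counted twice.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical illustration: why a distributed group count can be too high
 * when documents of the same group live on different shards.
 * ASSUMPTION: the federator simply adds up the per-shard group counts.
 */
public class NgroupsCountSketch {

    /** Number of distinct groups seen by a single shard. */
    static int shardGroupCount(List<String> groupValuesOnShard) {
        return new HashSet<>(groupValuesOnShard).size();
    }

    /** Assumed merge strategy: sum the per-shard counts. */
    static int summedGroupCount(Map<String, List<String>> shards) {
        int total = 0;
        for (List<String> groupValues : shards.values()) {
            total += shardGroupCount(groupValues);
        }
        return total;
    }

    /** The true number of distinct groups across the whole collection. */
    static int trueGroupCount(Map<String, List<String>> shards) {
        Set<String> all = new HashSet<>();
        for (List<String> groupValues : shards.values()) {
            all.addAll(groupValues);
        }
        return all.size();
    }

    public static void main(String[] args) {
        // Each list holds the group value of every document on that shard.
        // Group "b" is split across both shards (e.g. hash routing on the doc id).
        Map<String, List<String>> split = Map.of(
                "shard1", List.of("a", "b"),
                "shard2", List.of("b", "c"));
        // Every group lives on exactly one shard.
        Map<String, List<String>> colocated = Map.of(
                "shard1", List.of("a", "b", "b"),
                "shard2", List.of("c"));
        System.out.println(summedGroupCount(split) + " vs " + trueGroupCount(split));
        System.out.println(summedGroupCount(colocated) + " vs " + trueGroupCount(colocated));
    }
}
```

Running `main` prints `4 vs 3` for the split layout and `3 vs 3` for the co-located one. Co-location does not strictly require the implicit router: with compositeId, prefixing each document id with its group value (e.g. `groupValue!docId`) should also send every document of a group to the same shard.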
From: Diego Ceccarelli (BLOOMBERG/ LONDON)
Date: 2017-04-06 17:16
To: 380382856
Subject: Re: Re: Question about grouping in distribute mode
Dear 380382856, 
I would be happy to help if you can provide more information. Do you want to know why
grouping requires a specific routing strategy? My point was that grouping usually involves
3 round trips between the federator and the shards, but in the case of ngroup=1 it would be
possible to obtain the same result with 2.

Could you please post your question on the Solr user mailing list [1]? That way my answer
will be useful to all Solr users, and people with more expertise than me can also answer (or correct
me if I say something wrong :))

Have a good day! 


From: At: 04/06/17 08:38:20
Subject: Re: Re: Question about grouping in distribute mode
Hello, can you help me?
There is a problem that has been bothering me: why does SolrCloud require the implicit
routing strategy when using group.ngroups?
From: Diego Ceccarelli (BLOOMBERG/ LONDON)
Date: 2017-03-30 22:09
To: dev
Subject: Re: Question about grouping in distribute mode
Yes, I agree. And if there are no problems with the logic, it would improve performance
in both cases.

From: At: 03/30/17 14:59:31
Subject: Re: Question about grouping in distribute mode
This is also the case for non-distributed search, isn’t it? The Lucene-level FirstPassGroupingCollector
doesn’t actually record the docid of the top doc for each group at the moment, but I don’t
think there’s any reason it couldn’t: it’s stored in the relevant FieldComparator.
And it would be a nice shortcut in GroupingSearch more generally.
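Alan's suggestion can be illustrated with a plain-Java sketch of a first pass that remembers, for each group, the docid that produced its best score. The class, record and method names are hypothetical; this is not the API of Lucene's FirstPassGroupingCollector. It assumes groups are sorted by relevance score, with ties broken in favor of the lower docid.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not the Lucene API): a first-pass collector that,
 * besides the best score of each group, remembers which docid produced it.
 */
public class TopDocPerGroupSketch {

    /** Best hit seen so far for one group. */
    record GroupHead(String groupValue, int docId, float score) {}

    private final Map<String, GroupHead> heads = new HashMap<>();

    /** Called once per matching document, like a collector's collect(). */
    void collect(int docId, String groupValue, float score) {
        GroupHead current = heads.get(groupValue);
        // Keep the higher-scoring doc; on equal scores keep the lower docid (assumed tie-break).
        if (current == null || score > current.score()
                || (score == current.score() && docId < current.docId())) {
            heads.put(groupValue, new GroupHead(groupValue, docId, score));
        }
    }

    /** Top-k groups ordered by their best score, each carrying its top docid. */
    List<GroupHead> topGroups(int k) {
        List<GroupHead> sorted = new ArrayList<>(heads.values());
        sorted.sort(Comparator.comparingDouble(GroupHead::score).reversed()
                .thenComparingInt(GroupHead::docId));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }
}
```

Since the best doc is already tracked to compute the group's sort value, keeping its docid costs one extra field per group.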

Alan Woodward

On 30 Mar 2017, at 14:26, Diego Ceccarelli <> wrote:

Hello, I'm currently working on Solr grouping in order to support reranking [1].
I have a working patch for non-distributed search, and I'm now working on the
distributed setting.

Looking at the code, distributed grouping (top-k groups, top-n documents for each group)
consists of two phases:

First phase (GROUPING_DISTRIBUTED_FIRST):
1. given the grouping query, each shard returns its top-k groups;
2. the federator merges these and produces the top-k groups for the query.

Second phase (GROUPING_DISTRIBUTED_SECOND):
1. given the top-k groups, each shard returns its top-n documents for each group;
2. the federator computes the top-n documents for each group by merging all the shard responses.

This is followed by GET_FIELDS, as usual.
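The two phases can be sketched from the federator's point of view. This is a minimal sketch with hypothetical types (ShardGroup, ShardDoc), not Solr's actual classes. It assumes groups are sorted by score (the default group sort) and breaks score ties on the group value, an arbitrary choice made here for determinism.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the federator side of two-phase distributed grouping.
 * Types and method names are illustrative, not Solr's actual classes.
 */
public class TwoPhaseGroupingSketch {

    record ShardGroup(String groupValue, float bestScore) {}
    record ShardDoc(String shard, int docId, float score) {}

    /** Phase 1: merge each shard's top-k groups into the global top-k. */
    static List<String> mergeTopGroups(List<List<ShardGroup>> perShard, int k) {
        Map<String, Float> best = new HashMap<>();
        for (List<ShardGroup> groups : perShard) {
            for (ShardGroup g : groups) {
                // A group's global sort value is its best score on any shard.
                best.merge(g.groupValue(), g.bestScore(), Math::max);
            }
        }
        List<String> order = new ArrayList<>(best.keySet());
        order.sort((a, b) -> {
            int byScore = Float.compare(best.get(b), best.get(a)); // higher score first
            return byScore != 0 ? byScore : a.compareTo(b);        // assumed tie-break
        });
        return order.subList(0, Math.min(k, order.size()));
    }

    /** Phase 2: for each selected group, merge the shards' top-n documents. */
    static Map<String, List<ShardDoc>> mergeTopDocs(
            List<String> groups, List<Map<String, List<ShardDoc>>> perShard, int n) {
        Map<String, List<ShardDoc>> result = new LinkedHashMap<>();
        for (String group : groups) {
            List<ShardDoc> docs = new ArrayList<>();
            for (Map<String, List<ShardDoc>> shardResponse : perShard) {
                docs.addAll(shardResponse.getOrDefault(group, List.of()));
            }
            docs.sort(Comparator.comparingDouble(ShardDoc::score).reversed());
            result.put(group, docs.subList(0, Math.min(n, docs.size())));
        }
        return result;
    }
}
```

Each phase is one request/response round trip to the shards, and GET_FIELDS adds a third.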

My plan was to change the collector in GROUPING_DISTRIBUTED_SECOND to return
the top documents for each group with a new score given by the reranking function
(affecting maxScore for each group, and therefore also the order of the groups).
Looking at the code, I then realized that TopGroups asserts that the order of the groups is not changed,
and that indeed _if the ranking function is the same, group order can't change
after the first stage_.

My question is: if the user is interested only in the top document for each group (i.e., the
default, group.limit=1), do we really need GROUPING_DISTRIBUTED_SECOND? Is there any reason
to perform the second distributed grouping phase in this case, or could we just
return the top docid together with the top groups in GROUPING_DISTRIBUTED_FIRST and then go
directly to GET_FIELDS?
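Under the assumption that each shard reports, for every group in its local top-k, the group's best docid and score (a hypothetical extension of the first-phase response), the federator could resolve group.limit=1 without a second phase. All names here are hypothetical. The shortcut relies on the same argument that makes the first phase correct: a group in the global top-k must also be in the local top-k of the shard holding its best document, so that document is always reported (ties aside).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed shortcut for group.limit=1:
 * shards return their top docid per group in the first phase, and the
 * federator picks the overall winner per group without a second phase.
 */
public class SingleDocGroupShortcut {

    /** ASSUMED first-phase payload: a group plus the shard-local best doc. */
    record GroupHit(String groupValue, String shard, int docId, float score) {}

    static List<GroupHit> topGroupsWithDocs(List<List<GroupHit>> perShard, int k) {
        Map<String, GroupHit> winners = new HashMap<>();
        for (List<GroupHit> hits : perShard) {
            for (GroupHit hit : hits) {
                // The best doc of a group across shards is the one with the highest score.
                winners.merge(hit.groupValue(), hit,
                        (a, b) -> b.score() > a.score() ? b : a);
            }
        }
        List<GroupHit> result = new ArrayList<>(winners.values());
        // Groups are ordered by their best document, so this order is already final;
        // ties between equal scores are left in arbitrary order in this sketch.
        result.sort((a, b) -> Float.compare(b.score(), a.score()));
        return result.subList(0, Math.min(k, result.size()));
    }
}
```

The (shard, docId) pairs in the result are exactly what GET_FIELDS would need, which is why the second phase looks redundant in this case.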


