lucene-dev mailing list archives

From "380382856@qq.com" <380382...@qq.com>
Subject Re: Re: Question about grouping in distribute mode
Date Thu, 06 Apr 2017 09:52:26 GMT
Thanks for your help.
When I use compositeId routing, I find that group.ngroups returns a wrong number. I would like
to know what implementation mechanism leads to this. Why must we use implicit routing when
we want grouping to work correctly?
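
One way to picture the question above is a toy model, not Solr's code. It assumes that the distributed group count is obtained by adding up each shard's count of distinct group values, which would explain why the Solr reference guide says group.ngroups is only accurate when all documents of a group are co-located on one shard. The shard layouts and function names below are made up for illustration.

```python
def summed_ngroups(shards):
    """Add up each shard's number of distinct group values (assumed merge rule)."""
    return sum(len({doc["group"] for doc in shard}) for shard in shards)


def true_ngroups(shards):
    """Number of distinct group values across the whole collection."""
    return len({doc["group"] for shard in shards for doc in shard})


# Group "a" is split across both shards, e.g. because documents were
# routed by a key that is unrelated to the group field.
scattered = [
    [{"id": 1, "group": "a"}, {"id": 2, "group": "b"}],
    [{"id": 3, "group": "a"}, {"id": 4, "group": "c"}],
]

# Same documents, but every group lives on exactly one shard.
colocated = [
    [{"id": 1, "group": "a"}, {"id": 3, "group": "a"}],
    [{"id": 2, "group": "b"}, {"id": 4, "group": "c"}],
]

print(summed_ngroups(scattered), true_ngroups(scattered))
print(summed_ngroups(colocated), true_ngroups(colocated))
```

Under that assumption, what matters is that each group sits on a single shard, however the router achieves it. The reference guide mentions routing via composite keys as one way to get there, so the implicit router may not be strictly required.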



380382856@qq.com
 
From: Diego Ceccarelli (BLOOMBERG/ LONDON)
Date: 2017-04-06 17:16
To: 380382856
Subject: Re: Re: Question about grouping in distribute mode
Dear 380382856, 
I would be happy to help if you can provide more information. Do you want to know why
grouping requires a specific routing strategy? My point is that grouping usually involves
3 round trips between the federator and the shards, but in the case of group.limit=1 it would
be possible to obtain the same result with 2 round trips.

Could I ask you to post your question on the Solr user mailing list [1]? That way my answer
will be useful to all Solr users, and people more expert than me can also answer (or correct
me if I say something wrong :))

Have a good day! 
Diego

[1] http://lucene.apache.org/solr/community.html#mailing-lists-irc


From: 380382856@qq.com At: 04/06/17 08:38:20
To: DIEGO CECCARELLI (BLOOMBERG/ LONDON)
Subject: Re: Re: Question about grouping in distribute mode
Hello, can you help me?
There is a problem that has been bothering me: why does using group.ngroups in SolrCloud
require the implicit routing strategy?
380382856@qq.com
 
From: Diego Ceccarelli (BLOOMBERG/ LONDON)
Date: 2017-03-30 22:09
To: dev
Subject: Re: Question about grouping in distribute mode
Yes, I agree. And if there are no problems with the logic, it would improve performance
in both cases.

From: dev@lucene.apache.org At: 03/30/17 14:59:31
To: dev@lucene.apache.org
Subject: Re: Question about grouping in distribute mode
This is also the case for non-distributed, isn’t it?  The lucene-level FirstPassGroupingCollector
doesn’t actually record the docid of the top doc for each group at the moment, but I don’t
think there’s any reason it couldn’t - it’s stored in the relevant FieldComparator.
 And it would be a nice shortcut in GroupingSearch more generally.
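
A minimal sketch of the idea above, in Python rather than Lucene's Java: a first pass that remembers the id of each group's best document alongside its score. This is not the real FirstPassGroupingCollector, which keeps only a bounded set of top groups and uses FieldComparator; the function name and the (doc_id, group, score) tuple layout are assumptions for illustration.

```python
def first_pass(docs, top_k):
    """Return the top_k groups by best score, with each group's best doc id.

    docs is an iterable of (doc_id, group, score) tuples.
    """
    best = {}  # group value -> (score, doc_id) of the best doc seen so far
    for doc_id, group, score in docs:
        if group not in best or score > best[group][0]:
            best[group] = (score, doc_id)
    # Rank groups by the score of their best document, highest first.
    ranked = sorted(best.items(), key=lambda kv: kv[1][0], reverse=True)
    return [(group, score, doc_id) for group, (score, doc_id) in ranked[:top_k]]


hits = [(0, "x", 1.0), (1, "y", 3.0), (2, "x", 2.5), (3, "z", 0.5)]
top_groups = first_pass(hits, top_k=2)
print(top_groups)
```

With the doc id already in hand, a caller that only wants one document per group would not need a second pass over the index.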

Alan Woodward
www.flax.co.uk


On 30 Mar 2017, at 14:26, Diego Ceccarelli <diego.ceccarelli@gmail.com> wrote:

Hello, I'm currently working on Solr grouping in order to support reranking [1].
I have a working patch for non-distributed search, and I'm now working on the
distributed setting.

Looking at the code, distributed grouping search (top-k groups, top-n documents for each
group) consists of:

GROUPING_DISTRIBUTED_FIRST 
1. given the grouping query, each shard will return the top-k groups
2. federator will merge the top-k groups and will produce the top-k groups for the query

GROUPING_DISTRIBUTED_SECOND
1. given the top-k groups, each shard will return its top-n documents for each group.
2. federator will then compute the top-n documents for each group by merging all the shard responses.


GET_FIELDS
as usual 
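
The two grouping phases above can be sketched roughly as follows. This is a simplified in-memory model in Python, not Solr's implementation: shards are plain lists of dicts, groups are ranked by the score of their best document, and all function names are hypothetical. GET_FIELDS is left out.

```python
def shard_top_groups(shard, k):
    """Phase 1, shard side: top-k (group, best score) pairs on this shard."""
    best = {}
    for doc in shard:
        g = doc["group"]
        best[g] = max(best.get(g, float("-inf")), doc["score"])
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:k]


def merge_top_groups(per_shard, k):
    """Phase 1, federator side: merge shard lists into the global top-k groups."""
    best = {}
    for groups in per_shard:
        for g, score in groups:
            best[g] = max(best.get(g, float("-inf")), score)
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return [g for g, _ in ranked[:k]]


def shard_top_docs(shard, groups, n):
    """Phase 2, shard side: top-n docs on this shard for each requested group."""
    return {
        g: sorted((d for d in shard if d["group"] == g),
                  key=lambda d: d["score"], reverse=True)[:n]
        for g in groups
    }


def merge_top_docs(per_shard, groups, n):
    """Phase 2, federator side: merge shard responses into top-n docs per group."""
    return {
        g: sorted((d for resp in per_shard for d in resp[g]),
                  key=lambda d: d["score"], reverse=True)[:n]
        for g in groups
    }


def grouped_search(shards, k, n):
    first = [shard_top_groups(s, k) for s in shards]
    groups = merge_top_groups(first, k)
    second = [shard_top_docs(s, groups, n) for s in shards]
    return merge_top_docs(second, groups, n)


shards = [
    [{"id": "a1", "group": "a", "score": 0.9}, {"id": "b1", "group": "b", "score": 0.4}],
    [{"id": "a2", "group": "a", "score": 0.7}, {"id": "c1", "group": "c", "score": 0.8}],
]
result = grouped_search(shards, k=2, n=2)
print({g: [d["id"] for d in docs] for g, docs in result.items()})
```

Note that the group order is fixed by the first merge; the second phase only fills in documents for groups that were already chosen.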

My plan was to change the collector in GROUPING_DISTRIBUTED_SECOND and return
the top documents for each group with a new score given by the function used to rerank
(affecting maxScore for each group, and therefore also the order of the groups).
Looking at the code, I then realized that TopGroups asserts that the order of the groups does
not change, and that indeed, _if the ranking function is the same, group order can't change
after the first stage_.

My question is: if the user is interested only in the top document for each group (i.e., the
default: group.limit = 1), do we really need GROUPING_DISTRIBUTED_SECOND, or could we skip
it?
Is there any reason to perform GROUPING_DISTRIBUTED_SECOND in this case? Or could we just
return the top docid together with the top groups in GROUPING_DISTRIBUTED_FIRST and then go
directly to GET_FIELDS?
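
A rough sketch of the proposed shortcut, as a toy Python model rather than Solr code. It assumes groups are ranked by the score of their best document (the same sort within and across groups) and that each shard can attach the id of that document to every group it returns in the first phase. All names are hypothetical.

```python
def shard_first_phase(shard, k):
    """Shard side: top-k groups, each represented by its best document."""
    best = {}
    for doc in shard:
        g = doc["group"]
        if g not in best or doc["score"] > best[g]["score"]:
            best[g] = doc
    return sorted(best.values(), key=lambda d: d["score"], reverse=True)[:k]


def merge_first_phase(per_shard, k):
    """Federator side: global top-k groups with the id of each group's top doc."""
    best = {}
    for docs in per_shard:
        for doc in docs:
            g = doc["group"]
            if g not in best or doc["score"] > best[g]["score"]:
                best[g] = doc
    top = sorted(best.values(), key=lambda d: d["score"], reverse=True)[:k]
    return [(d["group"], d["id"]) for d in top]


shards = [
    [{"id": "a1", "group": "a", "score": 0.9}, {"id": "b1", "group": "b", "score": 0.4}],
    [{"id": "a2", "group": "a", "score": 0.7}, {"id": "c1", "group": "c", "score": 0.8}],
]
top = merge_first_phase([shard_first_phase(s, 2) for s in shards], 2)
print(top)  # ids that could be handed straight to GET_FIELDS
```

Under the stated assumption, a group that makes the global top-k must also be in the top-k of the shard holding its best document, so the doc id picked here should match what a second phase with group.limit = 1 would return.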

Cheers,
Diego

[1] https://issues.apache.org/jira/browse/SOLR-8542


