lucene-dev mailing list archives

From "380382856@qq.com" <380382...@qq.com>
Subject Re: RE: Question about grouping in distribute mode
Date Fri, 07 Apr 2017 01:49:47 GMT
Thank you.
I think it simply adds shard1.groupNumber to shard2.groupNumber. But groupA may be in both
shard1 and shard2, so is group.ngroups always larger than the real number?
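
To make the concern concrete, here is a minimal Python sketch (not Solr code; the shard contents and the function names are hypothetical). It compares a per-shard sum of group counts, which is the behaviour suspected above, with the number of distinct groups across all shards.

```python
# Hypothetical shard contents: the group value of every document on each shard.
shard1 = ["groupA", "groupA", "groupB", "groupC"]
shard2 = ["groupA", "groupC", "groupD"]

def summed_ngroups(shards):
    """Add up each shard's local distinct-group count (the suspected behaviour)."""
    return sum(len(set(groups)) for groups in shards)

def true_ngroups(shards):
    """Count distinct groups across all shards."""
    return len(set().union(*shards))

print(summed_ngroups([shard1, shard2]))  # 3 + 3 = 6
print(true_ngroups([shard1, shard2]))    # 4
```

Under that assumption the summed value is never smaller than the true one, and it is larger whenever some group has documents on more than one shard.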


380382856@qq.com
 
From: Ian Caldwell
Date: 2017-04-07 09:32
To: 'dev@lucene.apache.org'
Subject: RE: Re: Question about grouping in distribute mode
I think this happens because the first pass gets the top nGroups and records the shards
that they came from;
the second pass then searches only the shards that contributed to that list instead
of searching all shards.
 
So when searching for the top 10 groups, a shard may hold data for a group that it ranks
11th (outside its top 10); that shard is then left off the list for the second pass.
 
A search for the top 3 groups could return:
From GROUPING_DISTRIBUTED_FIRST
shard1: groupA, groupB & groupC      (groupD ranked 4th so not returned in the list)
shard2: groupA, groupC & groupD
 
After merging, the top groups would be groupA, groupC & groupD
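
The merge step in this example can be sketched in Python as follows. This is only an illustration, not Solr's implementation: the scores are invented, groups are assumed to be ranked by their best score, and `merge_top_groups` is a hypothetical helper.

```python
import heapq

# Hypothetical (group, top score) pairs returned by each shard for k = 3.
shard1_top = [("groupA", 9.0), ("groupB", 4.0), ("groupC", 7.0)]
shard2_top = [("groupA", 8.5), ("groupC", 7.5), ("groupD", 6.0)]

def merge_top_groups(per_shard, k):
    """Keep the best score seen for each group, then return the k best groups."""
    best = {}
    for shard_groups in per_shard:
        for group, score in shard_groups:
            if score > best.get(group, float("-inf")):
                best[group] = score
    return heapq.nlargest(k, best.items(), key=lambda item: item[1])

merged = merge_top_groups([shard1_top, shard2_top], k=3)
print([group for group, _ in merged])  # ['groupA', 'groupC', 'groupD']
```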
 
From GROUPING_DISTRIBUTED_SECOND
Shard1: 
groupA: doc1, doc3 & doc5
groupC: doc11, doc13 & doc15
groupD: doc111, doc113 & doc115
Shard2: 
groupA: doc2, doc4 & doc6
groupC: doc12, doc14 & doc16
groupD: doc112, doc114 & doc116.
 
So you need to do the second pass against all shards for the top docs so that you don’t
miss the docs from groupD in shard1.
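
A small Python sketch of the second pass shows the difference. It is an illustration under assumptions, not Solr code: the document scores are invented and `second_pass` is a hypothetical helper. It compares querying every shard with querying only the shard that contributed groupD in the first pass.

```python
# Hypothetical per-shard index: group -> list of (doc id, score).
shard1_docs = {
    "groupA": [("doc1", 9.0), ("doc3", 5.0)],
    "groupC": [("doc11", 7.0)],
    "groupD": [("doc111", 3.0)],  # groupD ranked 4th on shard1 in the first pass
}
shard2_docs = {
    "groupA": [("doc2", 8.5)],
    "groupC": [("doc12", 7.5)],
    "groupD": [("doc112", 6.0)],
}

def second_pass(shards, top_groups, n):
    """Merge each shard's docs for the chosen groups and keep the n best per group."""
    result = {}
    for group in top_groups:
        docs = [doc for shard in shards for doc in shard.get(group, [])]
        docs.sort(key=lambda d: d[1], reverse=True)
        result[group] = [doc_id for doc_id, _ in docs[:n]]
    return result

top_groups = ["groupA", "groupC", "groupD"]
all_shards = second_pass([shard1_docs, shard2_docs], top_groups, n=2)
only_shard2 = second_pass([shard2_docs], ["groupD"], n=2)
print(all_shards["groupD"])   # ['doc112', 'doc111']
print(only_shard2["groupD"])  # ['doc112']
```

With only shard2 queried for groupD, doc111 never reaches the merged result.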
 
 
 
Ian
NLA
 
 
-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Friday, 7 April 2017 1:16 AM
To: dev@lucene.apache.org
Subject: Re: Re: Question about grouping in distribute mode
 
from the reference guide:
 
group.ngroups and group.facet require that all documents in each group must be co-located
on the same shard in order for accurate counts to be returned.
 
Can't give you a technical reason, but there's no expectation it is supported with composite
ID routing.
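
As an illustration of why co-location matters for the count, here is a minimal Python sketch. It is not Solr's routing code: `shard_for` is a hypothetical router that hashes the group value, and the documents are invented. When every document of a group lands on the same shard, adding up the per-shard group counts gives the true number of groups.

```python
import zlib

NUM_SHARDS = 2  # hypothetical cluster size

def shard_for(group_value):
    """Pick a shard from a stable hash of the group value
    (illustrative only, not Solr's actual routing hash)."""
    return zlib.crc32(group_value.encode("utf-8")) % NUM_SHARDS

docs = [("doc1", "groupA"), ("doc2", "groupA"), ("doc3", "groupB"),
        ("doc4", "groupC"), ("doc5", "groupC"), ("doc6", "groupD")]

# Record which groups end up on which shard.
shards = [set() for _ in range(NUM_SHARDS)]
for _doc_id, group in docs:
    shards[shard_for(group)].add(group)

summed = sum(len(groups) for groups in shards)
distinct = len({group for _, group in docs})
print(summed == distinct)  # True: no group is split across shards
```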
 
Best,
Erick
 
On Thu, Apr 6, 2017 at 2:52 AM, 380382856@qq.com <380382856@qq.com> wrote:
> Thanks for your help.
> When I use compositeId routing, I find that group.ngroups returns a wrong number.
> I would like to know what implementation mechanism leads to this.
> Why must we use implicit routing when we want grouping to work
> correctly?
>
> ________________________________
> 380382856@qq.com
>
>
> From: Diego Ceccarelli (BLOOMBERG/ LONDON)
> Date: 2017-04-06 17:16
> To: 380382856
> Subject: Re: Re: Question about grouping in distribute mode
>
> Dear 380382856, I would be happy to help you if you can provide more
> information. Do you want to know why grouping implements a specific
> routing strategy? My point is that grouping usually involves 3
> communications between the federator and the shards, but in the case of
> ngroup=1 it would be possible to obtain the same result with 2
> communications.
>
> Can I please ask you to post your question on the Solr user mailing list
> [1]? That way my answer will be useful to all Solr users, and people
> more expert than me can also answer (or correct me if I say something
> wrong :))
>
> Have a good day!
> Diego
>
> [1] http://lucene.apache.org/solr/community.html#mailing-lists-irc
>
>
> From: 380382856@qq.com At: 04/06/17 08:38:20
> To: DIEGO CECCARELLI (BLOOMBERG/ LONDON)
> Subject: Re: Re: Question about grouping in distribute mode
>
> Hello, can you help me?
> There is a problem that has been bothering me: why does SolrCloud require
> the implicit routing strategy in order to use group.ngroups?
> 380382856@qq.com
>
>
> From: Diego Ceccarelli (BLOOMBERG/ LONDON)
> Date: 2017-03-30 22:09
> To: dev
> Subject: Re: Question about grouping in distribute mode
>
> Yes, I agree.
> And if there are no problems with the logic, it would improve the
> performance in both cases.
>
> From: dev@lucene.apache.org At: 03/30/17 14:59:31
> To: dev@lucene.apache.org
> Subject: Re: Question about grouping in distribute mode
>
> This is also the case for non-distributed, isn’t it?  The lucene-level 
> FirstPassGroupingCollector doesn’t actually record the docid of the 
> top doc for each group at the moment, but I don’t think there’s any 
> reason it couldn’t - it’s stored in the relevant FieldComparator.  And 
> it would be a nice shortcut in GroupingSearch more generally.
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 30 Mar 2017, at 14:26, Diego Ceccarelli 
> <diego.ceccarelli@gmail.com>
> wrote:
>
> Hello, I'm currently working on Solr grouping in order to support
> reranking [1].
> I have a working patch for non-distributed search, and I'm now working on
> the distributed setting.
>
> Looking at the code, distributed grouping search (top-k groups, top-n
> documents for each group) consists of:
>
> GROUPING_DISTRIBUTED_FIRST
> 1. given the grouping query, each shard will return the top-k groups
> 2. federator will merge the top-k groups and will produce the top-k
> groups for the query
>
> GROUPING_DISTRIBUTED_SECOND
> 1. given the top-k groups, each shard will return its top-n documents
> for each group.
> 2. federator will then compute top-n documents for each group merging 
> all the shards responses.
>
> GET_FIELDS
> as usual
>
> My plan was to change the collector in GROUPING_DISTRIBUTED_SECOND, 
> and return the top documents for each group with a new score given by 
> the function used to rerank (affecting maxScore for each group and 
> then also the order of the groups).
> Looking at the code, I then realized that TopGroups asserts that the order
> of the groups does not change, and that indeed, _if the
> ranking function is the same, group order can't change after the first
> stage_.
>
> My question is: if the user is interested only in the top document for 
> each group (i.e., the default: group.limit = 1) do we really need 
> GROUPING_DISTRIBUTED_SECOND, or could we skip it?
> Is there any reason to perform GROUPING_DISTRIBUTED_SECOND in this
> case? Or could we just return the top docid together with the
> top groups in GROUPING_DISTRIBUTED_FIRST and then go directly to GET_FIELDS?
>
> Cheers,
> Diego
>
> [1] https://issues.apache.org/jira/browse/SOLR-8542
>
>
>
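
The shortcut asked about in the quoted message (returning the top docid together with the top groups in the first pass when group.limit = 1) can be sketched in Python as follows. This is a hedged illustration, not Solr's implementation: the per-shard responses are invented, `first_pass_with_top_doc` is a hypothetical helper, and it assumes that groups and the documents inside a group are ranked by the same score.

```python
# Hypothetical first-pass responses: group -> (top doc id, score) on each shard.
shard1_first = {"groupA": ("doc1", 9.0), "groupB": ("doc7", 4.0), "groupC": ("doc11", 7.0)}
shard2_first = {"groupA": ("doc2", 8.5), "groupC": ("doc12", 7.5), "groupD": ("doc112", 6.0)}

def first_pass_with_top_doc(per_shard, k):
    """Merge per-shard top groups, remembering the best doc seen for each group."""
    best = {}
    for response in per_shard:
        for group, (doc_id, score) in response.items():
            if group not in best or score > best[group][1]:
                best[group] = (doc_id, score)
    ranked = sorted(best.items(), key=lambda item: item[1][1], reverse=True)
    return [(group, doc_id) for group, (doc_id, _) in ranked[:k]]

print(first_pass_with_top_doc([shard1_first, shard2_first], k=3))
# [('groupA', 'doc1'), ('groupC', 'doc12'), ('groupD', 'doc112')]
```

In this sketch the merged first-pass result already names one document per group, so a federator could go straight to fetching fields for those doc ids.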
 
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
 
 
 