lucene-solr-user mailing list archives

From Chris Hostetter
Subject Re: Finding distinct unique IDs in documents returned by fq -- Urgent Help Req
Date Thu, 22 Jul 2010 22:16:43 GMT
: > being returned (consider the case where we are sorting in term order - once
: > we have collected counts for ${facet.limit} constraints, we can stop
: > iterating over terms -- but to compute the total number of constraints (ie:
: > terms) we would have to keep going and test every one of them against
: > ${facet.mincount})
: >   
: I've been told this before, but it still doesn't really make sense to me.  How
: can you possibly find the top N constraints, without having at least examined
: all the constraints?  How do you know which are the top N if there are some you

that's exactly my point: in the scenario where you've asked for 
facet.mincount=N&facet.limit=M&facet.sort=index you don't have to find the 
"top" constraints, you just have to find the first M terms in index order 
whose count is at least N.
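
to make the early exit concrete, here is a rough sketch in Python. it is not Solr's actual code: the function name facet_index_order and the sample terms are made up for illustration, and it assumes the (term, count) pairs already arrive in index order.

```python
def facet_index_order(term_counts, mincount, limit):
    """Return the first `limit` terms (in index order) with count >= mincount.

    `term_counts` is assumed to be an iterable of (term, count) pairs that
    is already sorted in index order. Also returns how many terms were
    examined, to show that iteration stops early.
    """
    picked = []
    examined = 0
    for term, count in term_counts:
        examined += 1
        if count >= mincount:
            picked.append((term, count))
            if len(picked) == limit:
                break  # enough constraints collected; skip the remaining terms
    return picked, examined


# Hypothetical sample data, already in index (lexical) order.
terms = [("apple", 3), ("banana", 1), ("cherry", 5),
         ("date", 2), ("elder", 7), ("fig", 4)]

picked, examined = facet_index_order(terms, mincount=2, limit=2)

# Counting *all* qualifying constraints forces a full pass over the terms.
total_constraints = sum(1 for _, count in terms if count >= 2)
```

with mincount=2 and limit=2 the loop stops after the third term, while computing total_constraints has to visit all six.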

: But I may be missing something. I've examined only one of the code
: paths/methods for faceting in source code, the one (if my reading was correct)
: that ends up used for high-cardinality multi-valued fields -- in that method,
: it looked like it should add no work at all to give you a facet unique value
: (result set value cardinality) count. (with facet.mincount of 1 anyway).  But
: I may have been mis-reading, or it may be that other methods are more
: troublesome.

in any case where you are sorting by *counts* then yes, all of the 
constraints have to be checked, so you can count them as you go -- but 
that doesn't scale in distributed faceting, you can't just add up the 
number of constraints from each shard because you don't know what the 
overlap is -- hence my comment about how to dedup them.
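
a small illustration of the overlap problem, again as a hedged Python sketch rather than anything from Solr itself. the shard contents and the function names are invented for the example, and it assumes every term on a shard already meets facet.mincount there.

```python
def naive_constraint_total(shards):
    """Sum the per-shard constraint counts; overcounts terms shared by shards."""
    return sum(len(shard) for shard in shards)


def deduped_constraint_total(shards):
    """Count distinct terms across shards; needs every term from every shard."""
    distinct_terms = set()
    for shard in shards:
        distinct_terms.update(shard.keys())
    return len(distinct_terms)


# Hypothetical per-shard facet results: term -> count on that shard.
shard_a = {"red": 4, "green": 2, "blue": 1}
shard_b = {"green": 3, "blue": 5, "yellow": 1}

naive = naive_constraint_total([shard_a, shard_b])
deduped = deduped_constraint_total([shard_a, shard_b])
```

here the naive sum gives 6 but only 4 distinct terms exist. getting the right answer means shipping every term from every shard to the coordinator, which is the part that gets expensive for high-cardinality fields.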

there are some simple use cases where it's feasible, but in general it's a 
very hard problem.

