lucene-solr-user mailing list archives

From Jonathan Rochkind <rochk...@jhu.edu>
Subject Re: Finding distinct unique IDs in documents returned by fq -- Urgent Help Req
Date Thu, 22 Jul 2010 21:15:08 GMT
Chris Hostetter wrote:
> computing the number:  in some algorithms it's relatively cheap (on a 
> single server) but in others it's more expensive than computing the facet 
> counts being returned (consider the case where we are sorting in term 
> order - once we have collected counts for ${facet.limit} constraints, we 
> can stop iterating over terms -- but to compute the total number of 
> constraints (ie: terms) we would have to keep going and test every one of 
> them against ${facet.mincount})
>   
I've been told this before, but it still doesn't really make sense to 
me.  How can you possibly find the top N constraints without having at 
least examined all the constraints?  How do you know which are the top N 
if there are some you haven't looked at? And if you've looked at them 
all, it's no problem to increment a counter as you look at each one.  
Although I guess the facet.mincount test does possibly put a crimp in 
things; I never set that param to anything other than 1 myself, so I 
hadn't considered it.
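
To make the two cases concrete, here is a rough sketch in Java. It is not Solr's faceting code: the class, the method names and the in-memory map of term counts are all made up for illustration. It only shows why term-order faceting can stop early, while count-order faceting has to visit every term and so could keep a distinct-value counter at no extra cost.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Solr.
public class FacetCountSketch {

    // Term-order faceting: walk terms in sorted order and stop once
    // `limit` terms have passed the minCount test.
    // Returns how many terms had to be examined.
    public static int termOrderExamined(SortedMap<String, Integer> counts,
                                        int limit, int minCount) {
        int examined = 0;
        int collected = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (collected >= limit) {
                break; // early exit: the remaining terms are never looked at
            }
            examined++;
            if (e.getValue() >= minCount) {
                collected++;
            }
        }
        return examined;
    }

    // Count-order faceting: every term must be examined to know the top
    // `limit`, so counting the terms that pass minCount is a side effect.
    // Returns {terms examined, distinct terms passing minCount}.
    public static int[] countOrder(SortedMap<String, Integer> counts,
                                   int limit, int minCount) {
        PriorityQueue<Integer> top = new PriorityQueue<>(); // min-heap of best counts
        int examined = 0;
        int distinct = 0;
        for (int count : counts.values()) {
            examined++;
            if (count < minCount) {
                continue;
            }
            distinct++;
            top.add(count);
            if (top.size() > limit) {
                top.poll(); // drop the smallest so only the top `limit` remain
            }
        }
        return new int[] { examined, distinct };
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        counts.put("a", 5);
        counts.put("b", 0);
        counts.put("c", 3);
        counts.put("d", 7);
        counts.put("e", 1);
        System.out.println("term order examined: " + termOrderExamined(counts, 2, 1));
        int[] r = countOrder(counts, 2, 1);
        System.out.println("count order examined: " + r[0] + ", distinct: " + r[1]);
    }
}
```

In the term-order case the loop exits before reaching the last terms, so a total would mean extra iteration; in the count-order case the total falls out of the loop that already runs.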

But I may be missing something. I've examined only one of the code 
paths/methods for faceting in source code, the one (if my reading was 
correct) that ends up used for high-cardinality multi-valued fields -- 
in that method, it looked like it should add no work at all to give you 
a facet unique value (result set value cardinality) count. (with 
facet.mincount of 1 anyway).  But I may have been mis-reading, or it may 
be that other methods are more troublesome.

At any rate, if I need it badly enough, I'll try to write my own facet 
component that does it (perhaps a subclass of the existing SimpleFacet), 
and see what happens.  It does seem to be something that would serve a 
variety of people's use cases; I see it mentioned periodically in the 
listserv archives.

Jonathan


