lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-2403) Problem with facet.sort=lex, shards, and facet.mincount
Date Wed, 16 Mar 2011 15:18:29 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007508#comment-13007508 ]

Yonik Seeley commented on SOLR-2403:
------------------------------------

bq. Dividing by shard count is fairly risky. 

Actually, it seems like it should help? (when mincount is relatively high at least).

Let's take your example of facet.mincount=10, facet.limit=2, facet.sort=index
{code}
Shard 1: A(1) B(1) C(1) D(1) E(1) F(9) G(1) H(1)
Shard 2: A(1) B(1) C(1) D(1) E(1) F(1) G(1) H(10)
{code}

mincount / nShards = 10 / 2 = 5, so the shard requests sent will be along the lines of
facet.mincount=5, facet.limit=5, facet.sort=index  (some over-requesting)
and we will get back F(9) from shard 1 and H(10) from shard 2, i.e. the candidate list
F(9), H(10)

The second phase (facet refinement... to true up counts) will retrieve counts from each shard
for constraints in the list that it didn't return the first time.
So shard1 will be asked about H, and shard2 will be asked about F.

The final response will be F(10),H(11)
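
To make the arithmetic above easy to check, here is a minimal Python sketch of the two-phase
flow as described in this comment. It is not Solr code: the function names, the overrequest
parameter, and rounding mincount / nShards up are all assumptions made for illustration.

```python
from collections import Counter
from math import ceil

def shard_facets(counts, mincount, limit):
    """Phase 1 on one shard: index-ordered terms with count >= mincount, cut to limit."""
    hits = [(t, c) for t, c in sorted(counts.items()) if c >= mincount]
    return dict(hits[:limit])

def distributed_facets(shards, mincount, limit, overrequest=3):
    # Assumption: divide mincount by the shard count, rounding up, never below 1.
    shard_mincount = max(1, ceil(mincount / len(shards)))
    # Assumption: over-request by a fixed amount (2 + 3 = 5, as in the example).
    shard_limit = limit + overrequest
    first = [shard_facets(s, shard_mincount, shard_limit) for s in shards]
    candidates = set().union(*first)
    totals = Counter()
    for shard, resp in zip(shards, first):
        for term in candidates:
            # Phase 2 (refinement): a shard that did not return the term in
            # phase 1 is asked for its exact count.
            totals[term] += resp.get(term, shard.get(term, 0))
    merged = [(t, totals[t]) for t in sorted(candidates) if totals[t] >= mincount]
    return merged[:limit]

shard1 = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1, "F": 9, "G": 1, "H": 1}
shard2 = {"A": 1, "B": 1, "C": 1, "D": 1, "E": 1, "F": 1, "G": 1, "H": 10}
result = distributed_facets([shard1, shard2], mincount=10, limit=2)
print(result)
```

With the two shards from the example, phase 1 yields F(9) from shard 1 and H(10) from shard 2,
and refinement adds the missing counts for F from shard 2 and H from shard 1.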

bq. Over-requesting helps, but only linear to the fraction of the full result-set from each
shard that is requested.

Yes, I think you're correct that over-requesting is less useful for sort=index than sort=count.
Luckily, we can fix the mincount=1 problem and get exact answers for that case, which is the
most important case.  I think mincount > 1 is relatively rare.




> Problem with facet.sort=lex, shards, and facet.mincount
> -------------------------------------------------------
>
>                 Key: SOLR-2403
>                 URL: https://issues.apache.org/jira/browse/SOLR-2403
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 4.0
>         Environment: RHEL5, Ubuntu 10.04
>            Reporter: Peter Cline
>
> I tested this on a recent trunk snapshot (2/25), haven't verified with 3.1 or 1.4.1. I can if necessary and update.
> Solr is not returning the proper number of facet values when sorting alphabetically, using distributed search, and using a facet.mincount that excludes some of the values in the first facet.limit values.
> Easiest explained by example.  Sorting alphabetically, the first 20 values for my "subject_facet" field have few documents.  19 facet values have only 1 document associated, and 1 has 2 documents. There are plenty after that have more than 2.
> {code}
> http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2
> {code}
> comes back with the expected 20 facet values with >= 2 documents associated.
> If I add a shards parameter that points back to itself, the result is different.
> {code}
> http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2&shards=localhost:8082/solr
> {code}
> comes back with only 1 facet value: the single value in the first 20 that had more than 1 document.
> It appears to me that mincount is ignored when doing the original query to the shards, then applied afterwards.
> Let me know if you need any more info.  
> Thanks,
> Peter
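
The reporter's hypothesis (mincount ignored on the shard request, then applied after merging)
can be illustrated with a small Python sketch. This only models the suspected behavior and is
not taken from the Solr source; the term names and the counts past the first 20 values are
made-up data shaped like the description above.

```python
def limit_then_mincount(counts, limit, mincount):
    """Suspected distributed behavior: cut to limit first, filter by mincount afterwards."""
    first = sorted(counts.items())[:limit]
    return [(t, c) for t, c in first if c >= mincount]

def mincount_then_limit(counts, limit, mincount):
    """Expected behavior: filter by mincount first, then cut to limit."""
    return [(t, c) for t, c in sorted(counts.items()) if c >= mincount][:limit]

# Hypothetical data: among the first 20 terms in index order, 19 have 1 document
# and one has 2; the 30 terms after them have 3 documents each.
counts = {f"t{i:02d}": 1 for i in range(20)}
counts["t07"] = 2
counts.update({f"t{i:02d}": 3 for i in range(20, 50)})

suspected = limit_then_mincount(counts, limit=20, mincount=2)
expected = mincount_then_limit(counts, limit=20, mincount=2)
print(len(suspected), len(expected))
```

Under these assumptions the first ordering returns a single value and the second returns the
full 20, which matches the difference between the two queries in the report.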
