lucene-dev mailing list archives

From "Gilad Barkai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5476) Facet sampling
Date Sun, 09 Mar 2014 09:31:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925158#comment-13925158 ]

Gilad Barkai commented on LUCENE-5476:
--------------------------------------

{quote}
The limit should also take into account the total number of hits for the query, otherwise
the estimate and the multiplication with the sampling factor may yield a larger number than
the actual results.
{quote}

I understand this statement is confusing, so I'll try to elaborate.
If the sample were *exactly* at the sampling ratio, this would not be a problem. But since
the sample, being random, may be slightly larger than expected, scaling the counts by the original
sampling ratio (rather than the actual one) may yield counts larger than the actual number of results.

This could be solved either by capping the scaled counts at the number of results, or by adjusting
the {{samplingRate}} to the exact, post-sampling ratio.
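
To make the two options concrete, here is a minimal sketch in plain Java. It is not the code from the attached patch and does not use any Lucene API; the class and method names ({{SampledCountScaling}}, {{scaleByRequestedRate}}, {{scaleAndCap}}, {{scaleByActualRate}}) are hypothetical and chosen only for illustration. It assumes a query with 1000 hits and a requested sampling rate of 0.01 whose random sample happened to contain 12 documents instead of the expected 10.

```java
// Hypothetical illustration only; names and signatures are assumptions,
// not part of Lucene or of the LUCENE-5476 patch.
final class SampledCountScaling {
    private SampledCountScaling() {}

    /** Naive scaling: divide the sampled count by the requested sampling rate. */
    public static long scaleByRequestedRate(long sampledCount, double requestedRate) {
        return Math.round(sampledCount / requestedRate);
    }

    /** Option 1: scale by the requested rate, then cap at the total number of hits. */
    public static long scaleAndCap(long sampledCount, double requestedRate, long totalHits) {
        return Math.min(totalHits, scaleByRequestedRate(sampledCount, requestedRate));
    }

    /** Option 2: scale by the actual post-sampling ratio (sampleSize / totalHits). */
    public static long scaleByActualRate(long sampledCount, long sampleSize, long totalHits) {
        if (sampleSize == 0) {
            return 0; // nothing was sampled, so there is nothing to scale
        }
        // sampledCount / (sampleSize / totalHits), rearranged to avoid a tiny divisor
        return Math.round((double) sampledCount * totalHits / sampleSize);
    }

    public static void main(String[] args) {
        long totalHits = 1000;       // hits matching the query
        double requestedRate = 0.01; // expected sample size: 10
        long sampleSize = 12;        // the random sample came out slightly larger
        long sampledCount = 12;      // every sampled hit carries the facet value

        System.out.println(scaleByRequestedRate(sampledCount, requestedRate));   // 1200, exceeds totalHits
        System.out.println(scaleAndCap(sampledCount, requestedRate, totalHits)); // 1000
        System.out.println(scaleByActualRate(sampledCount, sampleSize, totalHits)); // 1000
    }
}
```

Capping only hides the overshoot for values near the total, while scaling by the actual ratio also corrects smaller counts: with 6 of the 12 sampled hits carrying a value, the naive estimate is 600 and the actual-ratio estimate is 500.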

> Facet sampling
> --------------
>
>                 Key: LUCENE-5476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5476
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Rob Audenaerde
>         Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch,
> LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, SamplingComparison_SamplingFacetsCollector.java,
> SamplingFacetsCollector.java
>
>
> With LUCENE-5339 facet sampling disappeared. 
> When trying to display facet counts on large datasets (>10M documents) counting facets
> is rather expensive, as all the hits are collected and processed.
> Sampling greatly reduced this and thus provided a nice speedup. Could it be brought back?




