lucene-dev mailing list archives

From "Gilad Barkai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5476) Facet sampling
Date Fri, 07 Mar 2014 21:06:43 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924348#comment-13924348 ]

Gilad Barkai commented on LUCENE-5476:
--------------------------------------

{quote}
Btw, is there an easy way to retrieve the total facet counts for an ordinal? When correcting
facet counts, it would be a quick win to limit the number of estimated documents to the actual
number of documents in the index that match that facet. (And maybe use the distribution as
well, to make better estimates.)
{quote}

That's a great idea!

The {{docFreq}} of the category drill-down term is an upper bound and could be used as a
limit.
It's cheap, but it might not be the exact number, as it also takes deleted documents into account.

The limit should also take into account the total number of hits for the query; otherwise,
multiplying the sampled count by the sampling factor may yield an estimate larger than
the actual number of results.
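
To make the two limits concrete, here is a minimal sketch of the clamping. It is not taken from the attached patch: the class name {{FacetCountCorrection}}, the method {{correctedCount}} and all parameter names are hypothetical. It assumes the caller has already looked up the {{docFreq}} of the drill-down term (for example via {{IndexReader.docFreq(Term)}}) and knows the total hit count for the query.

```java
/**
 * Hypothetical helper, for illustration only; it is not part of Lucene
 * or of the LUCENE-5476 patch.
 */
public final class FacetCountCorrection {

    private FacetCountCorrection() {}

    /**
     * Scales a sampled facet count back up and clamps the estimate.
     *
     * @param sampledCount      count observed for the category in the sample
     * @param samplingRate      fraction of hits that were sampled, in (0, 1]
     * @param docFreqUpperBound docFreq of the category drill-down term; an
     *                          upper bound that may include deleted documents
     * @param totalHits         total number of hits for the query
     * @return the estimate, never larger than either limit
     */
    public static int correctedCount(int sampledCount,
                                     double samplingRate,
                                     int docFreqUpperBound,
                                     int totalHits) {
        if (samplingRate <= 0.0 || samplingRate > 1.0) {
            throw new IllegalArgumentException("samplingRate must be in (0, 1]");
        }
        // Extrapolate from the sample to the full result set.
        long estimate = Math.round(sampledCount / samplingRate);
        // A category cannot match more documents than carry its term,
        // nor more documents than the query matched.
        long limit = Math.min(docFreqUpperBound, totalHits);
        return (int) Math.min(estimate, limit);
    }
}
```

For example, with a 10% sample, a sampled count of 30 extrapolates to 300; if the term's {{docFreq}} is 250, the corrected count would be capped at 250.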

> Facet sampling
> --------------
>
>                 Key: LUCENE-5476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5476
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Rob Audenaerde
>         Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch,
>                      LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, SamplingComparison_SamplingFacetsCollector.java,
>                      SamplingFacetsCollector.java
>
>
> With LUCENE-5339 facet sampling disappeared. 
> When trying to display facet counts on large datasets (>10M documents) counting facets
> is rather expensive, as all the hits are collected and processed.
> Sampling greatly reduced this and thus provided a nice speedup. Could it be brought back?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
