lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7590) Add DocValues statistics helpers
Date Tue, 13 Dec 2016 18:30:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15745855#comment-15745855 ]

Shai Erera commented on LUCENE-7590:
------------------------------------

bq. Instead of using a NOOP_COLLECTOR, you could throw a CollectionTerminatedException

OK, good idea.
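
For readers following along, the control flow behind this suggestion can be sketched in plain Java, without Lucene on the classpath. Everything named below ({{SegmentDone}}, {{LeafCollector}}, {{Segment}}, {{CountingCollector}}, {{search}}) is a hypothetical stand-in rather than Lucene's API; {{SegmentDone}} plays the role of {{CollectionTerminatedException}}. The sketch assumes a search loop that treats the exception as "skip this segment and continue", and it is not the code from the patch.

```java
import java.util.Arrays;
import java.util.List;

public class EarlyTerminationSketch {
    /** Hypothetical stand-in for Lucene's CollectionTerminatedException. */
    static class SegmentDone extends RuntimeException {}

    /** Hypothetical per-segment collector, called once per matching doc. */
    interface LeafCollector {
        void collect(int doc);
    }

    /** Hypothetical segment: whether it has the DV field, plus the docs the query matched. */
    static class Segment {
        final boolean hasField;
        final int[] matchingDocs;

        Segment(boolean hasField, int... matchingDocs) {
            this.hasField = hasField;
            this.matchingDocs = matchingDocs;
        }
    }

    /** Collector that refuses segments without the field instead of returning a no-op. */
    static class CountingCollector {
        int count;

        LeafCollector getLeafCollector(Segment segment) {
            if (!segment.hasField) {
                throw new SegmentDone(); // nothing to compute for this segment
            }
            return doc -> count++;
        }
    }

    /** Simplified search loop; returns how many segments were actually iterated. */
    static int search(List<Segment> segments, CountingCollector collector) {
        int visited = 0;
        for (Segment segment : segments) {
            LeafCollector leaf;
            try {
                leaf = collector.getLeafCollector(segment);
            } catch (SegmentDone e) {
                continue; // skip the whole segment, no per-doc calls at all
            }
            visited++;
            for (int doc : segment.matchingDocs) {
                leaf.collect(doc);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        CountingCollector collector = new CountingCollector();
        List<Segment> segments =
            Arrays.asList(new Segment(true, 0, 1, 2), new Segment(false, 0, 1));
        int visited = search(segments, collector);
        System.out.println("segments visited=" + visited + ", docs counted=" + collector.count);
    }
}
```

With a no-op collector the loop would still call {{collect}} for every matching doc in the second segment; throwing lets the loop skip that work entirely.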

bq. By the way, in such cases I think we should still increase the missing count?

I am not sure. {{missing}} represents the documents that matched the query but do not have
a value for that DV field. When {{getLeafCollector}} is called, we don't yet know whether the
query will match any documents in that segment at all (I think?), so updating {{missing}} at
that point might be confusing. For example, I'd expect that if anyone chained {{TotalHitsCollector}}
with {{DocValuesStatsCollector}}, then {{totalHits = stats.count() + stats.missing()}} would hold.
I am open to discussing it; I am just not sure I always want to update {{missing}} with
{{context.reader().numDocs()}} ...
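
To make the concern concrete, here is a small plain-Java model of the bookkeeping. It does not use Lucene, and the names ({{Segment}}, {{collect}}, {{addNumDocsUpFront}}) are hypothetical; it only assumes that a segment can hold more documents than the query matches. It compares counting {{missing}} per collected document against adding the segment's whole {{numDocs}} up front when the field is absent.

```java
public class MissingCountSketch {
    /** Hypothetical segment: total docs, whether the DV field exists, values of matched docs. */
    static class Segment {
        final int numDocs;
        final boolean hasField;
        final Long[] matched; // one entry per matched doc; null means "no value"

        Segment(int numDocs, boolean hasField, Long... matched) {
            this.numDocs = numDocs;
            this.hasField = hasField;
            this.matched = matched;
        }
    }

    /** Returns {totalHits, count, missing} for the chosen way of updating missing. */
    static int[] collect(Segment[] segments, boolean addNumDocsUpFront) {
        int totalHits = 0;
        int count = 0;
        int missing = 0;
        for (Segment segment : segments) {
            totalHits += segment.matched.length; // what a hit counter would see
            if (!segment.hasField && addNumDocsUpFront) {
                missing += segment.numDocs; // counts docs the query never matched
                continue;
            }
            for (Long value : segment.matched) {
                if (value == null) {
                    missing++;
                } else {
                    count++;
                }
            }
        }
        return new int[] {totalHits, count, missing};
    }

    public static void main(String[] args) {
        Segment[] segments = {
            new Segment(10, true, 5L, null, 7L), // 3 hits out of 10 docs, one without a value
            new Segment(8, false, null, null)    // 2 hits out of 8 docs, field absent
        };
        int[] perDoc = collect(segments, false);
        int[] upFront = collect(segments, true);
        System.out.println("per-doc:  hits=" + perDoc[0] + " count=" + perDoc[1] + " missing=" + perDoc[2]);
        System.out.println("up-front: hits=" + upFront[0] + " count=" + upFront[1] + " missing=" + upFront[2]);
    }
}
```

In this model the per-document variant keeps {{totalHits == count + missing}}, while the up-front variant overshoots whenever the query matches only part of a segment that lacks the field.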

bq. Can we avoid making DocValuesIterator public?

I did not find a way, since it's part of the {{DocValuesStats.init()}} API, and I think users should
be able to provide their own {{Stats}} impl, e.g. if they want to compute something on a {{BinaryDocValues}}
field.

Here too, I'd love to get more ideas though. I tried to avoid implementing N collectors, one
for each DV type, where they share a large portion of the code. But if you have strong opinions
about making {{DVI}} public, maybe that's what we should do ...

> Add DocValues statistics helpers
> --------------------------------
>
>                 Key: LUCENE-7590
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7590
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/misc
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch
>
>
> I think it can be useful to have DocValues statistics helpers that allow users to
> query for the min/max/avg etc. stats of a DV field. In this issue I'd like to cover numeric
> DV, but there's no reason not to add it to other DV types too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

