lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] [Commented] (LUCENE-7590) Add DocValues statistics helpers
Date Tue, 13 Dec 2016 18:30:59 GMT


Shai Erera commented on LUCENE-7590:

bq. Instead of using a NOOP_COLLECTOR, you could throw a CollectionTerminatedException

OK, good idea.

bq. By the way, in such cases I think we should still increase the missing count?

I am not sure. {{missing}} represents all the documents that matched the query and
did not have a value for that DV field. But when {{getLeafCollector}} is called, we don't
know yet whether any documents will be matched by the query at all (I think?), so updating
{{missing}} at that point might be confusing. I.e., I'd expect that if anyone chained {{TotalHitCountCollector}}
with {{DocValuesStatsCollector}}, then {{totalHits = stats.count() + stats.missing()}}. I
am open to discussing it, I'm just not sure I always want to update {{missing}} with {{context.reader().numDocs()}}.
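
To make the {{totalHits = stats.count() + stats.missing()}} expectation concrete, here is a toy model in plain Java. It is not Lucene code and not the patch: {{Segment}}, {{Policy}} and {{run}} are hypothetical names invented for this sketch. It compares three ways of handling a segment where the field has no doc values at all: skipping the segment, adding {{numDocs}} to missing up front, and (as one further option, my assumption rather than something proposed above) counting a miss per matching document.

```java
import java.util.Arrays;
import java.util.List;

public class MissingCountSketch {

    /** How to treat a segment in which the field has no doc values at all. */
    enum Policy { SKIP_SEGMENT, ADD_NUM_DOCS, COUNT_PER_HIT }

    /** Toy segment: live doc count, doc ids matched by the query, per-doc values. */
    static final class Segment {
        final int numDocs;
        final int[] hits;     // doc ids matched by the query
        final Long[] values;  // values[doc] == null: no value; null array: field absent
        Segment(int numDocs, int[] hits, Long[] values) {
            this.numDocs = numDocs;
            this.hits = hits;
            this.values = values;
        }
    }

    /** Returns {totalHits, count, missing} for the given policy. */
    static int[] run(List<Segment> segments, Policy policy) {
        int totalHits = 0, count = 0, missing = 0;
        for (Segment seg : segments) {
            totalHits += seg.hits.length;  // what a chained hit counter would see
            if (seg.values == null) {      // known up front, before any hit is seen
                if (policy == Policy.ADD_NUM_DOCS) missing += seg.numDocs;
                if (policy == Policy.COUNT_PER_HIT) missing += seg.hits.length;
                continue;                  // SKIP_SEGMENT records nothing
            }
            for (int doc : seg.hits) {
                if (seg.values[doc] != null) count++; else missing++;
            }
        }
        return new int[] {totalHits, count, missing};
    }

    /** Two segments; the second one has no values for the field. */
    static List<Segment> sampleSegments() {
        return Arrays.asList(
            new Segment(3, new int[] {0, 2}, new Long[] {5L, null, null}),
            new Segment(4, new int[] {1}, null));
    }

    public static void main(String[] args) {
        for (Policy p : Policy.values()) {
            int[] r = run(sampleSegments(), p);
            System.out.println(p + ": totalHits=" + r[0]
                + " count=" + r[1] + " missing=" + r[2]);
        }
    }
}
```

In this toy data set there are 3 hits in total. Skipping the segment gives count + missing = 2, adding {{numDocs}} gives 6, and only the per-hit variant gives 3, which is the kind of mismatch I am worried about.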

bq. Can we avoid making DocValuesIterator public?

I did not find a way, since it's part of the {{DocValuesStats.init()}} API, and I think users should
be able to provide their own {{Stats}} impl, e.g. if they want to compute something on a {{BinaryDocValues}} field.
Here too, I'd love to get more ideas though. I tried to avoid implementing N collectors, one
for each DV type, that would share a large portion of their code. But if you feel strongly
against making {{DocValuesIterator}} public, maybe that's what we should do ...
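
To illustrate the trade-off, here is a rough, self-contained sketch of the "one shared collector, pluggable stats" shape in plain Java. It is not the patch and uses no Lucene classes: {{ValueIterator}}, {{LongValues}}, {{BytesValues}}, {{Stats}}, {{LongMinMax}}, {{MaxLength}} and {{collect}} are all hypothetical names. The point it tries to show is that the iterator type appears in the signature users extend, which is why it has to be visible to them.

```java
import java.util.Collections;
import java.util.List;

public class SharedStatsCollectorSketch {

    /** Hypothetical stand-in for the iterator type that user code must be able to see. */
    interface ValueIterator {
        boolean advanceExact(int doc); // true if the doc has a value
    }

    /** Numeric values of one segment; null entries mean "no value". */
    static final class LongValues implements ValueIterator {
        private final Long[] values;
        private int current = -1;
        LongValues(Long[] values) { this.values = values; }
        public boolean advanceExact(int doc) { current = doc; return values[doc] != null; }
        long longValue() { return values[current]; }
    }

    /** Binary values of one segment; null entries mean "no value". */
    static final class BytesValues implements ValueIterator {
        private final byte[][] values;
        private int current = -1;
        BytesValues(byte[][] values) { this.values = values; }
        public boolean advanceExact(int doc) { current = doc; return values[doc] != null; }
        byte[] bytesValue() { return values[current]; }
    }

    /** Per-type logic lives here; users can subclass it for their own statistics. */
    abstract static class Stats<I extends ValueIterator> {
        int count, missing;
        I it;
        /** Called once per segment with that segment's iterator. */
        void init(I segmentValues) { it = segmentValues; }
        /** Called for each matching doc that has a value. */
        abstract void accumulate();
    }

    static final class LongMinMax extends Stats<LongValues> {
        long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
        void accumulate() {
            long v = it.longValue();
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
    }

    /** A user-provided statistic over binary values. */
    static final class MaxLength extends Stats<BytesValues> {
        int maxLength;
        void accumulate() { maxLength = Math.max(maxLength, it.bytesValue().length); }
    }

    /** The single shared collection loop, identical for every value type. */
    static <I extends ValueIterator> void collect(
            Stats<I> stats, List<I> segments, List<int[]> hitsPerSegment) {
        for (int s = 0; s < segments.size(); s++) {
            stats.init(segments.get(s));
            for (int doc : hitsPerSegment.get(s)) {
                if (stats.it.advanceExact(doc)) {
                    stats.count++;
                    stats.accumulate();
                } else {
                    stats.missing++;
                }
            }
        }
    }

    static LongMinMax demoLongs() {
        LongMinMax stats = new LongMinMax();
        collect(stats,
            Collections.singletonList(new LongValues(new Long[] {7L, null, 3L})),
            Collections.singletonList(new int[] {0, 1, 2}));
        return stats;
    }

    static MaxLength demoBytes() {
        MaxLength stats = new MaxLength();
        collect(stats,
            Collections.singletonList(new BytesValues(new byte[][] {new byte[2], null, new byte[5]})),
            Collections.singletonList(new int[] {0, 1, 2}));
        return stats;
    }

    public static void main(String[] args) {
        LongMinMax l = demoLongs();
        System.out.println("min=" + l.min + " max=" + l.max + " missing=" + l.missing);
        System.out.println("maxLength=" + demoBytes().maxLength);
    }
}
```

The alternative would be one collector class per value type, each repeating the loop in {{collect}}; that keeps the iterator type private at the cost of the duplicated code.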

> Add DocValues statistics helpers
> --------------------------------
>                 Key: LUCENE-7590
>                 URL:
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/misc
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch, LUCENE-7590.patch,
> I think it can be useful to have DocValues statistics helpers that allow users to
> query for the min/max/avg etc. stats of a DV field. In this issue I'd like to cover numeric
> DV, but there's no reason not to add it to other DV types too.
