lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-7927) Add facets impl to count unique numeric values
Date Mon, 21 Aug 2017 15:22:00 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-7927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-7927:
---------------------------------------
    Attachment: LUCENE-7927.patch

Another iteration, also adding an option to count all facets from a {{LongValuesSource}}.

I made a simple artificial benchmark (https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/NumericValueFacetBenchmark.java),
indexing 50M docs with a numeric DV field with values 0 - 9, to test whether special casing
small values (0-1023) is worthwhile:

Counting long values for all docs takes 99.0 msec (best of 100 iters) with the opto, and
153.4 msec if I turn it off, so ~35% less time.

The overall gains are smaller if I run an {{IntPoint.newRangeQuery}} matching the first 50% of
the index and compute facets on that: 255.3 msec, versus 279.4 msec if I turn off the
optimization, so ~9% faster.  But net/net I think we should keep the opto... I think it's a
common use case to count smallish ordinals.
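
For readers following along, here is a rough sketch of what "special casing small values
(0-1023)" could look like. It is not the code from the attached patch: the class name
SmallValueCounter, the constant DENSE_LIMIT, and the method names are all hypothetical, and the
fallback here is a plain java.util.HashMap chosen only to keep the example self-contained.
The idea shown is that values in the small range are counted by a direct array increment,
while anything else goes through a slower hash lookup.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the implementation in LUCENE-7927.patch. */
final class SmallValueCounter {
  // Assumed cutoff, taken from the 0-1023 range mentioned in the message.
  static final int DENSE_LIMIT = 1024;

  // Fast path: one array slot per small value.
  private final int[] denseCounts = new int[DENSE_LIMIT];

  // Slow path: boxed hash map for negative or large values.
  private final Map<Long, Integer> sparseCounts = new HashMap<>();

  /** Records one occurrence of the given value. */
  void increment(long value) {
    if (value >= 0 && value < DENSE_LIMIT) {
      denseCounts[(int) value]++;
    } else {
      sparseCounts.merge(value, 1, Integer::sum);
    }
  }

  /** Returns how many times the given value was recorded (0 if never seen). */
  int count(long value) {
    if (value >= 0 && value < DENSE_LIMIT) {
      return denseCounts[(int) value];
    }
    return sparseCounts.getOrDefault(value, 0);
  }
}
```

With a field whose values are all 0 - 9, as in the benchmark above, every call would take
the array branch, which is presumably where the measured difference comes from.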



> Add facets impl to count unique numeric values
> ----------------------------------------------
>
>                 Key: LUCENE-7927
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7927
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 7.1
>
>         Attachments: LUCENE-7927.patch, LUCENE-7927.patch, LUCENE-7927.patch
>
>
> The facets module has multiple facet methods for counting flat and hierarchical fields,
> and also a method for counting numeric ranges.  I'd like to also add a method that counts
> unique numeric (long) values, designed to be used for fields that have only a few, typically
> low-valued, numbers across the index, e.g. a "review" rating from 1 to 5.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

