lucene-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] [Assigned] (SOLR-11023) Need SortedNumerics/Points version of EnumField
Date Thu, 20 Jul 2017 18:47:00 GMT


Hoss Man reassigned SOLR-11023:

    Assignee: Hoss Man

I'm going to start working on this, but I'm still unsure whether "points" is the best way to go
for the "very low cardinality + all values are small positive ints" situation.

[~mikemccand] & [~jpountz]: In terms of disk usage/search performance, do you have any
sense of which makes more sense for enum-type use cases: (int) dimensional Points, or
just simple indexed Terms? (I frankly don't understand the points "encoding" and
segment merging costs well enough to make any educated assumptions.)
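For readers who haven't looked at the points "encoding" mentioned above, here is a minimal sketch, assuming the sortable-bytes scheme Lucene uses for 1D int points (the approach of `NumericUtils.intToSortableBytes`: flip the sign bit, then write the int big-endian so unsigned byte order matches signed int order). The class `SortableIntBytes` is hypothetical and is not Lucene's code.

```java
import java.util.Arrays;

// Hypothetical illustration of the sortable-bytes idea behind 1D int points.
// Modeled on the approach of Lucene's NumericUtils.intToSortableBytes; not Lucene code.
final class SortableIntBytes {
    private SortableIntBytes() {}

    // Encodes an int as 4 big-endian bytes whose unsigned order matches signed int order.
    static byte[] encode(int value) {
        int flipped = value ^ 0x80000000; // flip the sign bit so negatives sort first
        return new byte[] {
            (byte) (flipped >>> 24),
            (byte) (flipped >>> 16),
            (byte) (flipped >>> 8),
            (byte) flipped
        };
    }

    // Inverse of encode: reassemble the big-endian bytes and flip the sign bit back.
    static int decode(byte[] b) {
        int flipped = ((b[0] & 0xFF) << 24)
                    | ((b[1] & 0xFF) << 16)
                    | ((b[2] & 0xFF) << 8)
                    | (b[3] & 0xFF);
        return flipped ^ 0x80000000;
    }

    public static void main(String[] args) {
        // Small positive ints (typical enum values) differ only in the last byte.
        for (int v : new int[] {0, 1, 2, 3}) {
            System.out.println(v + " -> " + Arrays.toString(encode(v))); // first line: 0 -> [-128, 0, 0, 0]
        }
    }
}
```

This shows only the per-value encoding; it says nothing about how the points index lays these bytes out on disk or what merging costs, which is exactly the open question in the message.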

> Need SortedNumerics/Points version of EnumField
> -----------------------------------------------
>                 Key: SOLR-11023
>                 URL:
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>            Reporter: Hoss Man
>            Assignee: Hoss Man
>              Labels: numeric-tries-to-points
> Although it's not a subclass of TrieField, EnumField does use "LegacyIntField" to index
> the int value associated with each of the enum values, in addition to using SortedSetDocValuesField
> when {{docValues="true" multiValued="true"}}.
> I have no idea whether Points would be better or worse than Terms for low-cardinality use cases
> like EnumField, but either way we should think about a new variant of EnumField that doesn't
> depend on LegacyIntField/LegacyNumericUtils.intToPrefixCoded and uses SortedNumericDocValues.
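One behavioral difference worth keeping in mind when swapping SortedSetDocValuesField for SortedNumericDocValues: sorted-set doc values keep each distinct value once per document, while sorted-numeric doc values keep every value, duplicates included, sorted per document. The sketch below models only that difference with plain Java collections, assuming a simple name-to-int mapping by declared position. The class `EnumDocValuesSketch` and the severity values are hypothetical, and no Solr or Lucene API is used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of per-document doc values for a multi-valued enum field.
// Not Solr/Lucene code: it only contrasts set-style vs numeric-style semantics.
final class EnumDocValuesSketch {
    private EnumDocValuesSketch() {}

    // Hypothetical enum configuration: each name maps to its position in declared order.
    static final List<String> SEVERITY =
        List.of("Not Available", "Low", "Medium", "High", "Critical");

    static int toInt(String name) {
        int i = SEVERITY.indexOf(name);
        if (i < 0) {
            throw new IllegalArgumentException("unknown enum value: " + name);
        }
        return i;
    }

    // Models SortedSet-style values for one document: duplicates collapse, ascending order.
    static List<Integer> sortedSetStyle(String... names) {
        TreeSet<Integer> distinct = new TreeSet<>();
        for (String n : names) {
            distinct.add(toInt(n));
        }
        return new ArrayList<>(distinct);
    }

    // Models SortedNumeric-style values for one document: every value kept, ascending order.
    static long[] sortedNumericStyle(String... names) {
        long[] values = new long[names.length];
        for (int i = 0; i < names.length; i++) {
            values[i] = toInt(names[i]);
        }
        Arrays.sort(values);
        return values;
    }

    public static void main(String[] args) {
        String[] doc = {"High", "Low", "High"};
        System.out.println(sortedSetStyle(doc));                      // [1, 3]
        System.out.println(Arrays.toString(sortedNumericStyle(doc))); // [1, 3, 3]
    }
}
```

Whether retaining duplicates matters depends on how the field is sorted, faceted, or used in functions; the sketch makes no claim about which behavior the new variant should have.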

This message was sent by Atlassian JIRA
