lucene-dev mailing list archives

From "Jack Krupansky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-5936) Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0
Date Sat, 29 Mar 2014 20:50:15 GMT

    [ https://issues.apache.org/jira/browse/SOLR-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13954438#comment-13954438 ]

Jack Krupansky commented on SOLR-5936:
--------------------------------------

As part of this cleanup, could somebody volunteer to create a plain-English summary of exactly
what a trie field really is, what good it is, and why we can't live without it? I've read
the code and, okay, there is a sequence of bit shifts and generation of extra terms, but in
plain English, what's the point?
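
For readers following along, the "bit shifts and extra terms" mentioned above can be pictured with a minimal sketch. This is not Lucene's actual encoding: the function name `trie_terms`, the restriction to non-negative integers, and the (shift, prefix) tuple representation are all illustrative assumptions.

```python
def trie_terms(value, precision_step=4, bits=32):
    """Return (shift, prefix) pairs for a non-negative integer.

    Hypothetical simplification: each extra term is the value with its
    low `shift` bits dropped, so nearby values share the coarser terms.
    """
    terms = []
    shift = 0
    while shift < bits:
        # Dropping the low bits leaves a prefix shared by a block of
        # 2**shift consecutive values.
        terms.append((shift, value >> shift))
        shift += precision_step
    return terms


if __name__ == "__main__":
    print(trie_terms(423, precision_step=4, bits=16))
    print(trie_terms(420, precision_step=4, bits=16))
```

In this sketch, with a precision step of 4 on a 16-bit value, 423 and 420 both produce the coarser term (4, 26), so a range covering 416 to 431 could be matched by that single term instead of sixteen exact ones.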

I'm not asking for a recitation of the actual algorithm(s), but some intuitively accessible
summary. I would note that the typical examples of tries are for strings with shared prefixes
rather than binary numbers.

See:
http://en.wikipedia.org/wiki/Trie

And, is trie really the best solution for number types? Does it actually have real value for
float and double values?
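
As background for the float and double question, one common way to make the same prefix scheme applicable to real numbers is to remap IEEE 754 bit patterns to integers whose order matches numeric order. The following is a minimal sketch of that idea, assumed to be close in spirit to what Lucene does internally; the function name is hypothetical and NaN handling is ignored.

```python
import struct


def double_to_sortable_long(x):
    """Map a double to a signed 64-bit integer that sorts the same way.

    Hypothetical helper: reinterpret the IEEE 754 bits as a signed long,
    then flip the lower 63 bits of negative values so that more negative
    doubles map to smaller integers.
    """
    (bits,) = struct.unpack(">q", struct.pack(">d", x))
    if bits < 0:
        bits ^= 0x7FFFFFFFFFFFFFFF
    return bits


if __name__ == "__main__":
    values = [2.0, -1.0, 0.5, -2.0, 0.0, 1.0, -0.5]
    # Sorting by the mapped integer gives the same order as sorting the doubles.
    print(sorted(values, key=double_to_sortable_long))
```

Once values are mapped this way, the integer prefixes group numerically adjacent doubles together, which is what a range query needs.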

And I would really like to see some plain, easily readable explanation of precision step.
Again, especially for real numbers.
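
To make the precision step trade-off concrete, here is a rough sketch of the arithmetic involved. The function names are hypothetical, and the worst-case bound is the formula as I understand it from the NumericRangeQuery Javadoc, so treat it as an assumption rather than a guarantee.

```python
import math


def terms_per_value(bits, precision_step):
    """Number of terms indexed for each value (one per precision level)."""
    return math.ceil(bits / precision_step)


def max_range_query_terms(bits, precision_step):
    """Assumed worst-case number of terms a range query has to visit."""
    levels = terms_per_value(bits, precision_step)
    per_level = 2 ** precision_step - 1
    # Every level except the coarsest can contribute terms at both ends
    # of the range; the coarsest level contributes one run in the middle.
    return (levels - 1) * per_level * 2 + per_level


if __name__ == "__main__":
    for step in (2, 4, 8):
        print(step, terms_per_value(64, step), max_range_query_terms(64, step))
```

Under these assumptions, a smaller precision step indexes more terms per value (a larger index) but lets a range query touch fewer terms, and a larger step does the opposite.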

And how should precision step be used for dates?

I mean, other than assuring sort order, why bother with trie? Or more specifically, why does
a Solr (or Lucene) user need to know that trie is used for the implementation?

Specifically, for example, does it matter if a field has an evenly distributed range of numeric
values with little repetition vs. numeric codes where there is a relatively small number of
distinct values (e.g., 1-10, or scores of 0-100 or dates in years between 1970 and 2014) and
each value is shared by many documents (low cardinality)? I mean, does trie do a uniformly
great job for both of these extreme use cases, including for faceting?

And if trie really is the best approach for numeric fields, why not just do all of this under
the hood instead of polluting the field type names with "trie"? IOW, rename TrieIntField to
IntField, etc.

To me, trie just seems like unnecessary noise to average users.


> Deprecate non-Trie-based numeric (and date) field types in 4.x and remove them from 5.0
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-5936
>                 URL: https://issues.apache.org/jira/browse/SOLR-5936
>             Project: Solr
>          Issue Type: Task
>          Components: Schema and Analysis
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>            Priority: Minor
>             Fix For: 4.8, 5.0
>
>         Attachments: SOLR-5936.branch_4x.patch, SOLR-5936.branch_4x.patch
>
>
> We've been discouraging people from using non-Trie numeric and date field types for years;
> it's time we made it official.



