lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7589) Prevent outliers from raising the number of bits of everyone with numeric doc values
Date Mon, 12 Dec 2016 10:28:58 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15741566#comment-15741566 ]

Michael McCandless commented on LUCENE-7589:
--------------------------------------------

bq. However if you add a new field that stores the average number of miles per hour as a long
doc values field, then it highlights the quality issues of this dataset and disk usage for
this field goes from 40 to 15.7 bits per value (-60%) with the patch.

Ahhh, I see!  The taxis that go faster than the speed of light are not apparent now since
we don't store that field directly... makes sense.

> Prevent outliers from raising the number of bits of everyone with numeric doc values
> ------------------------------------------------------------------------------------
>
>                 Key: LUCENE-7589
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7589
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>         Attachments: LUCENE-7589.patch
>
>
> Today we encode entire segments with a single number of bits per value. It was done this
> way because it was faster, but it also means a single outlier can significantly increase the
> space requirements. I think we should have protection against that.
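
To make the effect described above concrete, here is a minimal sketch in Python. It is not the code from LUCENE-7589.patch: the function names, the sample values, the block size of 4, and the simple "bits for (max - min)" width rule are all assumptions chosen for illustration, and per-block metadata overhead is ignored. It compares one bit width for the whole segment against a separate bit width per block of values.

```python
def bits_per_value_single(values):
    """One bit width for the whole segment: enough bits to hold (max - min)."""
    return (max(values) - min(values)).bit_length()


def bits_per_value_blocked(values, block_size):
    """Average bits per value when each block picks its own bit width.

    Hypothetical scheme for illustration only; the per-block header
    (minimum value, bit width) is not counted.
    """
    total_bits = 0
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        width = (max(block) - min(block)).bit_length()
        total_bits += width * len(block)
    return total_bits / len(values)


# Made-up "miles per hour" values with a single absurd outlier.
speeds = [12, 30, 25, 18, 22, 27, 15, 2_000_000_000, 20, 19, 24, 16]

single = bits_per_value_single(speeds)
blocked = bits_per_value_blocked(speeds, block_size=4)
print(single, blocked)
```

With a single width, the outlier sets the cost for every value in the segment. With per-block widths, only the block containing the outlier pays for it, so the average drops.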



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


