hadoop-common-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15204) Add Configuration API for parsing storage sizes
Date Fri, 02 Feb 2018 20:07:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16350889#comment-16350889 ]

Chris Douglas commented on HADOOP-15204:

bq. I find getTimeDuration API extremely intuitive, hence imitating that for this API
Right; I only mention it as relevant context.

bq. Rounding causes a significant loss when we convert from x bytes to y exabytes. Hence I voted for the least element of surprise and decided to return double
Is it sufficient for the caller to provide the precision? If the caller wants petabytes with
some decimal places, then they can request terabytes. If they want to ensure the conversion
is within some epsilon, then they can request the value with high precision and measure the
loss. Even rounding decisions can be left to the caller. Instead of passing this context into
{{Configuration}} (or setting defaults), its role can be limited to converting and scaling
the stringly-typed value.
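
A caller-side sketch of that idea (the class and enum names here are illustrative, not the patch's API): request a fine-grained unit from the conversion, then scale or round locally.

```java
// Sketch, not the patch's API: the caller picks the precision by picking
// the unit, then derives coarser units (with decimals) itself.
public class StorageScale {
    enum Unit {
        B(1L), KB(1L << 10), MB(1L << 20), GB(1L << 30), TB(1L << 40), PB(1L << 50);
        final long bytes;
        Unit(long bytes) { this.bytes = bytes; }
    }

    // Convert a raw byte count into the requested unit.
    static double convert(long rawBytes, Unit target) {
        return (double) rawBytes / target.bytes;
    }

    public static void main(String[] args) {
        long raw = 1_649_267_441_664L;      // 1.5 TB expressed in bytes
        double tb = convert(raw, Unit.TB);  // caller asked for terabytes: 1.5
        double pb = tb / 1024.0;            // caller derives petabytes, keeping decimals
        System.out.println(tb + " TB = " + pb + " PB");
    }
}
```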

bq. That would let me return something like -1 to mean "no value set", which I can't do with the current API.
{{Configuration}} supports that with a raw {{get(key)}}. It's only where we have the default
in hand that it provides typed getters.
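
As a minimal illustration of that point, {{java.util.Properties}} stands in for {{Configuration}} here (an assumption for the sketch); both return null from a raw get for an absent key, so no numeric sentinel like -1 is needed.

```java
import java.util.Properties;

public class UnsetKey {
    // Properties is a stand-in for Configuration in this sketch; a raw
    // lookup yields null when the key was never set.
    static String lookup(Properties conf, String key) {
        return conf.getProperty(key); // null when the key is absent
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("dfs.blocksize", "128MB");
        System.out.println(lookup(conf, "dfs.blocksize")); // 128MB
        System.out.println(lookup(conf, "missing.key"));   // null
    }
}
```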

bq. This is the curse of writing a unit as a library; we need to be cognizant of that single use case which will break us. Hence I have used BigDecimal to be safe and correct and return doubles. It yields values that people expect.
Sure, I only mention it because it differs from {{getTimeDuration}}. With {{TimeUnit}}, a
caller could, with low false positives, check if the result was equal to max to detect overflow.
Doing that here would have a higher false positive rate, so the {{BigDecimal}} approach with
explicit precision is superior.
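
The {{TimeUnit}} overflow check mentioned above looks like this: {{TimeUnit}} conversions saturate at {{Long.MAX_VALUE}}, so equality with the max is a low-false-positive overflow signal.

```java
import java.util.concurrent.TimeUnit;

public class SaturationCheck {
    // TimeUnit conversions saturate at Long.MAX_VALUE on overflow, so
    // comparing the result to MAX_VALUE detects overflow with few false
    // positives (only an input that legitimately converts to MAX_VALUE).
    static boolean overflowed(long days) {
        return TimeUnit.DAYS.toNanos(days) == Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(overflowed(Long.MAX_VALUE / 2)); // true: saturated
        System.out.println(overflowed(1));                  // false: fits in a long
    }
}
```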

* In this overload:
+  public double getStorageSize(String name, String defaultValue,
+      StorageUnit targetUnit) {
Does this come up often? The {{getTimeDuration}} API assumes that the default will be in the same
unit as the conversion. So in your example, one would write {{getStorageSize("key", 5000,
MEGABYTES)}}. It's less elegant, but it type checks.
* I'd lean toward {{MB}} instead of {{MEGABYTES}}, and similar. Even as a static import, those
are unlikely to collide and they're equally readable.
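
A self-contained sketch of that alternative overload (all names here, including {{MiniConf}} and {{setBytes}}, are hypothetical, not the patch's API): the numeric default is interpreted in the target unit, mirroring the {{getTimeDuration}} convention, so no string default needs parsing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the overload discussed above: the numeric
// default is taken to already be in targetUnit, as getTimeDuration does.
public class MiniConf {
    enum StorageUnit {
        KB(1L << 10), MB(1L << 20), GB(1L << 30);
        final long bytes;
        StorageUnit(long bytes) { this.bytes = bytes; }
    }

    private final Map<String, Long> rawBytes = new HashMap<>();

    void setBytes(String key, long bytes) { rawBytes.put(key, bytes); }

    // Default is interpreted in targetUnit; no string default to parse.
    double getStorageSize(String key, double defaultInTarget, StorageUnit target) {
        Long raw = rawBytes.get(key);
        if (raw == null) {
            return defaultInTarget;
        }
        return (double) raw / target.bytes;
    }

    public static void main(String[] args) {
        MiniConf conf = new MiniConf();
        conf.setBytes("io.buffer", 256L << 20); // 256 MB stored as bytes
        System.out.println(conf.getStorageSize("io.buffer", 5000, StorageUnit.MB)); // 256.0
        System.out.println(conf.getStorageSize("missing", 5000, StorageUnit.MB));   // 5000.0
    }
}
```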

> Add Configuration API for parsing storage sizes
> -----------------------------------------------
>                 Key: HADOOP-15204
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15204
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 3.1.0
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>            Priority: Minor
>             Fix For: 3.1.0
>         Attachments: HADOOP-15204.001.patch, HADOOP-15204.002.patch
> Hadoop has a lot of configurations that specify memory and disk size. This JIRA proposes
to add an API like {{Configuration.getStorageSize}} which will allow users
>  to specify units like KB, MB, GB etc. This JIRA is inspired by HADOOP-8608 and Ozone.
Adding {{getTimeDuration}} support was a great improvement for the Ozone code base; this JIRA
hopes to do the same for configs that deal with disk and memory usage.

This message was sent by Atlassian JIRA

