hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anu Engineer (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-15204) Add Configuration API for parsing storage sizes
Date Thu, 01 Feb 2018 19:17:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16349121#comment-16349121
] 

Anu Engineer edited comment on HADOOP-15204 at 2/1/18 7:16 PM:
---------------------------------------------------------------

[~stevel@apache.org] I did look at getLongBytes. The issue is that it does not provide the
same code improvement as {{getTimeDuration}} and {{setTimeDuration}}.

Let me support my assertion with some examples that you can look at.

Let us suppose that I have a case to read the standard configured size of containers. This
is a configurable parameter in Ozone.

Here is how the code would look like with get/setStorageSize.
{code:java}
setStorageSize(OZONE_CONTAINER_SIZE, 1, GIGABYTES);
// let us suppose we want to read back this config value as MBs.
long valueInMB = getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);{code}
now let us see how we write this code using getLongBytes..
{code:java}
// there is no symetric function called setLongBytes.. So I have to fall back to setLong
// 
// Here is my first attempt.
setLong(OZONE_CONTAINER_SIZE, 1048576); 
// This looks bad, since I am not sure if the OZONE_CONTAINER_SIZE is in MB/GB or what, 
// So the many keys get tagged as OZONE_CONTAINER_SIZE_GB

// Now second attempt.
setLong(OZONE_CONTAINER_SIZE_GB, 1048576); 
// But this is bad too, now I can not set container size in MB, since setLong takes a
// whole number and not a fraction.
// So now third attempt -- convert all fields to bytes
setLong(OZONE_CONTAINER_SIZE_BYTES, 5368709120); // The default is 5 GG.
{code}
Before you think this is a made up example, this is part of changes that we tried which triggered
this change.

Now let us go to back get examples –
{code:java}
// getLongBytes forces us to write code in a certain way which does not match with
// rest of code like getTimeDuration. For example, if I have a case where I want to read //
the read the value in MB, and the ozone-default.xml is configured to GB.
// the case that this line solves 
// getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);

long defaultValueInBytes = getDefaultValue(OZONE_CONTAINER_SIZE_DEFAULT); // in bytes.
long valueInMB = BYTES.fromBytes(getLongBytes(OZONE_CONTAINER_SIZE, defaultValueInBytes)).toMBs();{code}
 

Now imagine repeating this code many times all over the code base, plus, the BYTES.fromBytes(xxx).ToMBs()
is a fuction from this patch. We need some equivalent code.

 

In other words, I submit that these following factors make getLongBytes a less desirable function
compared to getTimeDuration/getStorageSize.
 * lack of symetric function in getLongBytes - Without a set function it degenerates to a
messy set of multiplication and division each time we have to use a storage Unit. With this
those issue are cleanly isolated to a single place.
 * Lack of a formal storage unit - lack of a formal value like to TimeUnit/StorageUnit makes
the code less readable (see the example setStorageSize) where the context also tells you the
unit that we are operating with.
 *  Does not suit our usage pattern - Ozone code follows the patterns in HDFS. Hence the
default value mapping is not handled well in getLongBytes.

 * Units and conversion is needed - In ozone, there are several places where we convert these
numbers. Users can specify Quota as a easy to read Storage Units, like 5 GB or 10 TB. We have
a dedicated code for handling that, we use the storage numbers to specify how large the off-heap
size should be, or as I am showing in this example, container sizes etc.
 * The getLongBytes by itself does not address the lack of StorageUnits, what this patch does
is that it introduces a class that is very similar to {{TimeUnit}}. This makes the code more
readable and easy to maintain.

 

 


was (Author: anu):
[~stevel@apache.org] I did look at getLongBytes. The issue is that it does not provide the
same code improvement as {{getTimeDuration}} and {{setTimeDuration}}. 

Let me support my assertion with some examples that you can look at.

Let us suppose that I have a case to read the standard configured size of containers. This
is a configurable parameter in Ozone.

Here is how the code would look like with get/setStorageSize.
{code:java}
setStorageSize(OZONE_CONTAINER_SIZE, 1, GIGABYTES);
// let us suppose we want to read back this config value as MBs.
long valueInMB = getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);{code}
now let us see how we write this code using getLongBytes..
{code:java}
// there is no symetric function called setLongBytes.. So I have to fall back to setLong
// 
// Here is my first attempt.
setLong(OZONE_CONTAINER_SIZE, 1048576); 
// This looks bad, since I am not sure if the OZONE_CONTAINER_SIZE is in MB/GB or what, 
// So the many keys get tagged as OZONE_CONTAINER_SIZE_GB

// Now second attempt.
setLong(OZONE_CONTAINER_SIZE_GB, 1048576); 
// But this is bad too, now I can not set container size in MB, since setLong takes a
// whole number and not a fraction.
// So now third attempt -- convert all fields to bytes
setLong(OZONE_CONTAINER_SIZE_BYTES, 5368709120); // The default is 5 GG.
{code}
Before you think this is a made up example, this is part of changes that we tried which triggered
this change.

Now let us go to back get examples --
{code:java}

// getLongBytes forces us to write code in a certain way which does not match with
// rest of code like getTimeDuration. For example, if I have a case where I want to read //
the read the value in MB, and the ozone-default.xml is configured to GB.
// the case that this line solves 
// getStorageSize(OZONE_CONTAINER_SIZE, "5 GB", MEGABYTES);

long defaultValueInBytes = getDefaultValue(OZONE_CONTAINER_SIZE_DEFAULT); // in bytes.
long valueInMB = BYTES.fromBytes(getLongBytes(OZONE_CONTAINER_SIZE, defaultValueInBytes)).toMBs();{code}
 

Now imagine repeating this code many times all over the code base, plus, the BYTES.fromBytes(xxx).ToMBs()
is a fuction from this patch. We need some equivalent code.

 

In other words, I submit that these following factors make getLongBytes a less desirable function
compared to getTimeDuration/getStorageSize.
 * lack of symetric function in getLongBytes - Without a set function it degenerates to a
messy set of multiplication and division each time we have to use a storage Unit. With this
those issue are cleanly isolated to a single place.
 * Lack of a format storage unit - lack of a formal value like to TimeUnit/StorageUnit makes
the code less readable (see the example setStorageSize) where the context also tells you the
unit that we are operating with. 
 *  Does not suit our usage pattern - Ozone code follows the patterns in HDFS. Hence the
default value mapping is not handled well in getLongBytes. 

 * Units and Conversation is needed - In ozone, there are several places where we convert
these numbers. Users can specify Quota as a easy to read Storage Units, like 5 GB or 10 TB.
We have a dedicated code for handling that, we use the storage numbers to specify how large
the off-heap size should be, or as I am showing in this example, container sizes etc.
 * The getLongBytes by itself does not address the lack of StorageUnits, what this patch does
is that it introduces a class that is very similar to {{TimeUnit}}. This makes the code more
readable and easy to maintain.

 

 

> Add Configuration API for parsing storage sizes
> -----------------------------------------------
>
>                 Key: HADOOP-15204
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15204
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: conf
>    Affects Versions: 3.1.0
>            Reporter: Anu Engineer
>            Assignee: Anu Engineer
>            Priority: Minor
>             Fix For: 3.1.0
>
>         Attachments: HADOOP-15204.001.patch
>
>
> Hadoop has a lot of configurations that specify memory and disk size. This JIRA proposes
to add an API like {{Configuration.getStorageSize}} which will allow users
> to specify units like KB, MB, GB etc. This is JIRA is inspired by HDFS-8608 and Ozone.
Adding {{getTimeDuration}} support was a great improvement for ozone code base, this JIRA
hopes to do the same thing for configs that deal with disk and memory usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message