hadoop-common-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-5958) Use JDK 1.6 File APIs in DF.java wherever possible
Date Sat, 21 Nov 2009 03:49:40 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Douglas updated HADOOP-5958:

    Fix Version/s:     (was: 0.21.0)
           Status: Open  (was: Patch Available)

bq. what happens when you run that same benchmark against an NFS drive?

Every class using DF assumes, reasonably, that the resource behaves like a local drive. Configuring,
e.g., LocalDirAllocator or FSDataset to use a remote FS and then worrying that DF might take
milliseconds instead of microseconds is focusing on the noise.

I don't understand the virtue of the current approach, as DF seems like a fixed set of functionality
not meriting an abstract class, factory, etc. Is PosixDF needed for anything but getMount
and getFilesystem? The latter has no callers and the former has one that creates an instance
and discards all but the mount. This suggests two approaches:
# If nearly all uses of DF involve delegating to a java.io.File, is there any reason not to
simply replace uses of DF with File? Everywhere DF is used, a local FS is assumed. As Steve
points out, if this were otherwise, other designs would be preferred.
# If DF retained Shell as a subtype and used the java.io.File methods where appropriate, is
the cost really prohibitive? Few of these are created and other than a faster implementation
of most calls, nothing else changes. The costs incurred for keeping everything intact appear

Neither of these is an incompatible change, assuming java.io.File is correctly implemented.
They're not even mutually exclusive.
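
A minimal sketch of the first approach, assuming callers only need capacity, used, and available space. {{FileBackedDF}} and its method names are hypothetical and are not taken from the patch or from Hadoop; only the java.io.File calls (available since JDK 1.6) are real APIs.

```java
import java.io.File;

/** Hypothetical stand-in for DF that answers space queries through java.io.File. */
class FileBackedDF {
  private final File dir;

  FileBackedDF(File path) {
    // Resolve relative paths once so later queries do not depend on the working directory.
    this.dir = path.getAbsoluteFile();
  }

  /** Total size in bytes of the partition holding dir; 0 if dir does not name a partition. */
  long getCapacity() {
    return dir.getTotalSpace();
  }

  /** Bytes in use on the partition, derived from total minus unallocated space. */
  long getUsed() {
    long cap = dir.getTotalSpace();
    return cap - dir.getFreeSpace();
  }

  /** Bytes available to this JVM; the JDK documents this as a hint, not a guarantee. */
  long getAvailable() {
    return dir.getUsableSpace();
  }

  public static void main(String[] args) {
    FileBackedDF df = new FileBackedDF(new File(args.length > 0 ? args[0] : "."));
    System.out.println("capacity=" + df.getCapacity()
        + " used=" + df.getUsed()
        + " available=" + df.getAvailable());
  }
}
```

No process is spawned for any of these calls. The mount point and filesystem name have no java.io.File equivalent, which is why the sketch omits them.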

A few nits:
* Incompatibly moving {{DF::main}} seems unnecessary
* The comment on {{DF_INTERVAL_DEFAULT}} should be javadoc
* While the original didn't have it either, DF methods should have javadoc
* In {{getPercentUsed}}, using {{cap}} in the calculation of {{used}} avoids the second call
to {{getCapacity}}
* If the current design is retained (because some architecture has a faulty java.io.File impl?),
it should be possible to select PosixDF exclusively via the config passed to {{DF::getDF}}
(which could be named {{DF::get}}).
* The current patch also requires changes to HDFS that must be committed with these. If retained,
please open an HDFS issue and link it to this one
* {{getFilesystem}} and {{getMounts}} should probably be deprecated, even removed from DF
since one needs to explicitly instantiate PosixDF to make the call. The only reason {{getMounts}}
is there is because that's the command it's scraped from, anyway.
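
The {{getPercentUsed}} nit could look roughly like the sketch below. This is an illustration only: the class name, the static helpers, and the zero-capacity guard are assumptions, not the patch's code.

```java
import java.io.File;

/** Hypothetical illustration of reading the capacity once in getPercentUsed. */
final class PercentUsedSketch {
  private PercentUsedSketch() {}

  /** Pure arithmetic: percent of capacity in use, or 0 when capacity is unknown (0). */
  static int percentUsed(long cap, long free) {
    long used = cap - free;                 // reuse cap rather than querying it again
    return cap == 0 ? 0 : (int) (used * 100.0 / cap);
  }

  /** Percent used for the partition holding dir, with a single capacity lookup. */
  static int getPercentUsed(File dir) {
    long cap = dir.getTotalSpace();         // the only capacity call
    return percentUsed(cap, dir.getFreeSpace());
  }
}
```

Keeping the arithmetic separate from the File calls also makes the calculation testable without touching a disk.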

> Use JDK 1.6 File APIs in DF.java wherever possible
> --------------------------------------------------
>                 Key: HADOOP-5958
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5958
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Devaraj Das
>            Assignee: Aaron Kimball
>             Fix For: 0.22.0
>         Attachments: HADOOP-5958-hdfs.patch, HADOOP-5958-mapred.patch, HADOOP-5958.2.patch,
> HADOOP-5958.3.patch, HADOOP-5958.4.patch, HADOOP-5958.patch
> JDK 1.6 has File APIs like File.getFreeSpace() which should be used instead of spawning
> a command process for getting the various disk/partition related attributes. This would avoid
> spikes in memory consumption by tasks when things like LocalDirAllocator are used for creating
> paths on the filesystem.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
