hadoop-common-issues mailing list archives

From "Aaron Kimball (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5958) Use JDK 1.6 File APIs in DF.java wherever possible
Date Mon, 09 Nov 2009 22:51:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775180#action_12775180 ]

Aaron Kimball commented on HADOOP-5958:

|  DF.java:37 - "has most of the functionality, and has better performance" - makes sense
as a jira comment, but when that's the only implementation, people may be left wondering "Better
than what?" Best to compare specifically with PosixDF.


| Now that there are two getDFs, one with a conf and one without, shouldn't one either be
marked deprecated or private? I'd say we should leave the one that takes a Configuration and
just ignore the configuration variable, unless we're certain we'll never want Configuration
here again.

{{getDF()}} never existed before; I created these methods as a replacement for the {{DF()}} constructors,
now that {{DF}} itself is abstract. I'm OK with providing the Configuration-handling version.

| We used to have a limit on how often df would be called. That's gone with the new implementation
- I dunno if the interval was due to the fork overhead or to some overhead in the calls
themselves. Are the java.io.File implementations fast enough that we don't have to worry about
it, or should JavaDF do some caching?

I just ran a quick benchmark of calling {{File.getFreeSpace()}} a million times; an individual
call to {{f = new File(path); f.getFreeSpace()}} takes on average 45 microseconds. By comparison,
forking the {{df}} executable takes 2.83 milliseconds, roughly 60 times longer. I don't think we
need to worry about caching in JavaDF.
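
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of such a micro-benchmark. It is not the code that produced the figures above; the class name {{FreeSpaceBenchmark}}, the method names, the iteration counts, and the {{df -k}} invocation are all assumptions, and the numbers will vary with the machine, JVM, and filesystem.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical benchmark class; not part of the HADOOP-5958 patch.
final class FreeSpaceBenchmark {

    /** Average nanoseconds per call of new File(path).getFreeSpace(). */
    static double averageNanosPerCall(String path, int iterations) {
        long sink = 0; // accumulate results so the JIT cannot discard the calls
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            File f = new File(path);
            sink += f.getFreeSpace();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // practically never taken; keeps sink live
        }
        return (double) elapsed / iterations;
    }

    /** Average nanoseconds per fork of "df -k path" (assumes df is on the PATH). */
    static double averageNanosPerFork(String path, int iterations)
            throws IOException, InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Process p = new ProcessBuilder("df", "-k", path)
                    .redirectErrorStream(true)
                    .start();
            try (InputStream in = p.getInputStream()) {
                while (in.read() != -1) {
                    // drain the output so the child process can exit
                }
            }
            p.waitFor();
        }
        return (double) (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir");
        double fileNanos = averageNanosPerCall(path, 1_000_000);
        double forkNanos = averageNanosPerFork(path, 200);
        System.out.printf("File.getFreeSpace(): %.1f us/call%n", fileNanos / 1_000.0);
        System.out.printf("fork df:             %.2f ms/call%n", forkNanos / 1_000_000.0);
    }
}
```

The fork loop uses far fewer iterations than the {{File}} loop because each fork is expected to be orders of magnitude slower; a careful measurement would also add a warm-up phase before timing.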

> Use JDK 1.6 File APIs in DF.java wherever possible
> --------------------------------------------------
>                 Key: HADOOP-5958
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5958
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>            Reporter: Devaraj Das
>            Assignee: Aaron Kimball
>             Fix For: 0.21.0
>         Attachments: HADOOP-5958-hdfs.patch, HADOOP-5958-mapred.patch, HADOOP-5958.2.patch,
>                      HADOOP-5958.3.patch, HADOOP-5958.patch
> JDK 1.6 has File APIs like File.getFreeSpace() which should be used instead of spawning
> a command process for getting the various disk/partition related attributes. This would avoid
> spikes in memory consumption by tasks when things like LocalDirAllocator are used for creating
> paths on the filesystem.
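
As a rough illustration of the JDK 1.6 File APIs mentioned in the description, the sketch below reads disk attributes in-process rather than spawning {{df}}. The class {{DiskSpace}} and its members are hypothetical names used only for this example; they are not the {{DF}}/{{JavaDF}} code from the attached patches.

```java
import java.io.File;

// Hypothetical helper; names are illustrative, not taken from the patch.
final class DiskSpace {
    final long capacity; // total size of the partition, in bytes
    final long free;     // unallocated bytes on the partition
    final long usable;   // bytes available to this JVM (may be less than free)

    private DiskSpace(long capacity, long free, long usable) {
        this.capacity = capacity;
        this.free = free;
        this.usable = usable;
    }

    /** Reads the attributes of the partition holding dir without forking a process. */
    static DiskSpace of(File dir) {
        return new DiskSpace(dir.getTotalSpace(), dir.getFreeSpace(), dir.getUsableSpace());
    }

    /** Bytes in use, clamped at zero in case a filesystem reports odd values. */
    long used() {
        return Math.max(0L, capacity - free);
    }

    /** Percentage of capacity in use; 0 when the path does not name a partition. */
    int percentUsed() {
        if (capacity <= 0) {
            return 0; // java.io.File returns 0 for a path that does not name a partition
        }
        return (int) Math.min(100L, Math.round(used() * 100.0 / capacity));
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        DiskSpace d = DiskSpace.of(dir);
        System.out.println("capacity=" + d.capacity + " used=" + d.used()
                + " usable=" + d.usable + " percentUsed=" + d.percentUsed() + "%");
    }
}
```

Note that these methods return 0 instead of throwing when the path does not name a partition, so a caller that previously relied on {{df}} failing would need its own existence check.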

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
