hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics
Date Wed, 04 May 2016 11:29:12 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270492#comment-15270492 ]

Steve Loughran commented on HADOOP-13065:

I like this patch, especially the {{isTracked()}} probe.

h3. In {{FileSystem.getStatistics()}}

# For performance, you could try using a {{ConcurrentMap}} for the map, and only if the entry is not
present create the objects and call {{putIfAbsent()}} (or create and update the maps in a synchronized
block, with a second lookup there to eliminate the small race condition). This will eliminate
the sync point on a simple lookup when the entry exists.
# For testing, a way to reset/remove an entry could be handy.
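
A minimal sketch of the lookup pattern suggested above, not the patch's actual code. {{StatisticsRegistry}}, {{Stats}}, {{get()}} and {{remove()}} are hypothetical names chosen for illustration; only {{ConcurrentMap.putIfAbsent()}} and {{remove()}} are real JDK calls.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical registry for illustration; not the real FileSystem code. */
final class StatisticsRegistry {

  /** Stand-in for a per-scheme statistics object. */
  static final class Stats {
    final String scheme;
    final AtomicLong readOps = new AtomicLong();

    Stats(String scheme) {
      this.scheme = scheme;
    }
  }

  private final ConcurrentMap<String, Stats> table = new ConcurrentHashMap<>();

  /** No lock is taken on the common path where the entry already exists. */
  Stats get(String scheme) {
    Stats existing = table.get(scheme);
    if (existing != null) {
      return existing;
    }
    Stats created = new Stats(scheme);
    // putIfAbsent returns the earlier value if another thread won the race,
    // so every caller ends up sharing the same instance.
    Stats raced = table.putIfAbsent(scheme, created);
    return raced != null ? raced : created;
  }

  /** Test hook: drop an entry so a test can start from a clean slate. */
  Stats remove(String scheme) {
    return table.remove(scheme);
  }
}
```

The cost of this approach is that a losing thread may build a {{Stats}} object that is immediately discarded, which is only acceptable if construction is cheap and free of side effects.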

h3. In {{testConcurrentStatistics()}}

In the runnables, at line 737, there's a {{fail("Child failed with exception: " + t)}}. Two problems with it:

# Tests shouldn't lose the inner stack trace; just let the exception pass through.
# As the {{fail()}} is raised in a separate thread, it isn't going to fail the test anyway, as far as
I can tell.

Better to catch the exception, store it in a list of caught exceptions, and, once the {{allDone.await()}}
checkpoint is reached, look at that list: if it is non-empty, log all the exceptions and then throw the
first one. That will promote it to a failure on the test thread.
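
Roughly what I have in mind, as a sketch rather than a drop-in replacement for the test. {{ChildFailureCollector}} and {{runAll()}} are hypothetical names; the {{allDone}} latch mirrors the one in the test, and logging is reduced to {{System.err}} to keep the example self-contained.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

/** Hypothetical helper for illustration; not part of the patch. */
final class ChildFailureCollector {

  /**
   * Runs each task on its own thread, waits for all of them, and rethrows
   * the first recorded failure on the calling (test) thread.
   */
  static void runAll(List<Runnable> tasks) throws Throwable {
    // Thread-safe list, since children add to it concurrently.
    final List<Throwable> failures = new CopyOnWriteArrayList<>();
    final CountDownLatch allDone = new CountDownLatch(tasks.size());

    for (final Runnable task : tasks) {
      new Thread(() -> {
        try {
          task.run();
        } catch (Throwable t) {
          // Keep the original throwable so its stack trace survives.
          failures.add(t);
        } finally {
          allDone.countDown();
        }
      }).start();
    }

    allDone.await();

    // Log every failure, then surface the first one to the caller.
    for (Throwable t : failures) {
      System.err.println("Child failed with exception: " + t);
      t.printStackTrace();
    }
    if (!failures.isEmpty()) {
      throw failures.get(0);
    }
  }
}
```

Because the throwable is rethrown unchanged, the test report shows the child's own stack trace instead of a message string built from {{t.toString()}}.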

> Add a new interface for retrieving FS and FC Statistics
> -------------------------------------------------------
>                 Key: HADOOP-13065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13065
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Ram Venkatesh
>            Assignee: Mingliang Liu
>         Attachments: HADOOP-13065-007.patch, HADOOP-13065.008.patch, HDFS-10175.000.patch,
HDFS-10175.001.patch, HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch, HDFS-10175.005.patch,
HDFS-10175.006.patch, TestStatisticsOverhead.java
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. There is
logic within DfsClient to map operations to these counters that can be confusing, for instance,
mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, createSymlink,
delete, exists, mkdirs, rename and expose them as new properties on the Statistics object.
The operation-specific counters can be used for analyzing the load imposed by a particular
job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large number of
> Once this information is available in the Statistics object, the app frameworks like
MapReduce can expose them as additional counters to be aggregated and recorded as part of
job summary.
