hadoop-common-issues mailing list archives

From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics
Date Thu, 05 May 2016 04:13:12 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mingliang Liu updated HADOOP-13065:
    Attachment: HADOOP-13065.009.patch

Thanks very much, [~stevel@apache.org], for your comments. The v9 patch addresses your 2nd comment
on the test (a very nice catch) and the checkstyle and findbugs warnings, and it includes simple
fixes for the related test failures. Again, thanks [~cmccabe] for the new design.

As for your comment about the concurrent map, I will update the patch later if I have understood
your point correctly. By "the map", I assume we're talking about {{GlobalStorageStatistics#map}}.
My concern is that {{FileSystem.getStatistics()}} itself is {{static}} and {{synchronized}}, and
{{GlobalStorageStatistics#map}} is looked up and updated only when there is no entry in
{{FileSystem#statisticsTable}}. So if an entry already exists in the global map, there should be a
corresponding entry in {{FileSystem#statisticsTable}}, and the global map is not looked up at all.
Ideally, as [~cmccabe] suggested, we should remove {{FileSystem#Statistics}} from the public
interface, but I think that would require heavily refactoring this part of the code.
That said, could we first deprecate {{FileSystem#getStatistics()}}?
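To make the locking argument above concrete, here is a minimal Java sketch. It assumes, as the comment describes, that a per-class table is checked first and a name-keyed global map is consulted only on a miss, with both accesses inside one static synchronized method. All names (StatsRegistrySketch, Stats, statisticsTable, globalMap, globalLookups) are hypothetical stand-ins, not the code in the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: names are illustrative, not the actual Hadoop classes.
final class StatsRegistrySketch {
  /** Hypothetical per-scheme statistics holder. */
  static final class Stats {
    final String scheme;
    final AtomicLong bytesRead = new AtomicLong();
    Stats(String scheme) { this.scheme = scheme; }
  }

  // Stands in for FileSystem#statisticsTable: keyed by the file system class.
  private static final Map<Class<?>, Stats> statisticsTable = new HashMap<>();
  // Stands in for GlobalStorageStatistics#map: keyed by name.
  private static final Map<String, Stats> globalMap = new HashMap<>();
  // Counts how often the global map is touched, to make the argument observable.
  static int globalLookups = 0;

  // static synchronized: every caller holds the StatsRegistrySketch.class monitor,
  // so both plain HashMaps are only accessed by one thread at a time.
  static synchronized Stats getStatistics(String scheme, Class<?> cls) {
    Stats s = statisticsTable.get(cls);
    if (s != null) {
      return s; // fast path: the global map is not consulted
    }
    globalLookups++;
    s = globalMap.computeIfAbsent(scheme, Stats::new);
    statisticsTable.put(cls, s);
    return s;
  }
}
```

In this sketch the HashMaps need not be concurrent maps because they are only touched while holding the class monitor. The reasoning breaks if any other code path reads or writes the global map without taking the same lock; such a path would need a ConcurrentHashMap or the same synchronization. Counter updates on the returned object go through AtomicLong and need no lock.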

> Add a new interface for retrieving FS and FC Statistics
> -------------------------------------------------------
>                 Key: HADOOP-13065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13065
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Ram Venkatesh
>            Assignee: Mingliang Liu
>         Attachments: HADOOP-13065-007.patch, HADOOP-13065.008.patch, HADOOP-13065.009.patch,
>                      HDFS-10175.000.patch, HDFS-10175.001.patch, HDFS-10175.002.patch, HDFS-10175.003.patch,
>                      HDFS-10175.004.patch, HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. There is
> logic within DfsClient to map operations to these counters that can be confusing; for instance,
> mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation, including create, append, createSymlink,
> delete, exists, mkdirs, and rename, and expose them as new properties on the Statistics object.
> The operation-specific counters can be used to analyze the load imposed by a particular
> job on HDFS.
> For example, we can use them to identify jobs that end up creating a large number of files.
> Once this information is available in the Statistics object, app frameworks such as
> MapReduce can expose them as additional counters to be aggregated and recorded as part of
> the job summary.
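The quoted description contrasts the current aggregate counters with per-operation counters. The following minimal Java sketch puts both side by side. All names (OpStatsSketch, OpType, onCreate, onMkdirs, createsManyFiles) are hypothetical; it assumes that create, like mkdirs, is counted as a write operation (the issue itself names only mkdirs), and it is not the actual FileSystem.Statistics or DfsClient code.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch contrasting aggregate and per-operation counters.
final class OpStatsSketch {
  // Operation types named in the proposal.
  enum OpType { CREATE, APPEND, CREATE_SYMLINK, DELETE, EXISTS, MKDIRS, RENAME }

  // Current style: one generic counter shared by namespace-changing calls.
  final AtomicLong writeOps = new AtomicLong();

  // Proposed style: one counter per operation. The map is fully populated in the
  // constructor and never structurally modified afterwards, so it can be read concurrently.
  private final Map<OpType, AtomicLong> opCounts = new EnumMap<>(OpType.class);

  OpStatsSketch() {
    for (OpType op : OpType.values()) {
      opCounts.put(op, new AtomicLong());
    }
  }

  // Hypothetical client hooks: each bumps the aggregate counter (today's behavior)
  // and its own per-operation counter (the proposal).
  void onMkdirs() {
    writeOps.incrementAndGet();
    opCounts.get(OpType.MKDIRS).incrementAndGet();
  }

  void onCreate() {
    writeOps.incrementAndGet();
    opCounts.get(OpType.CREATE).incrementAndGet();
  }

  long get(OpType op) {
    return opCounts.get(op).get();
  }

  // Example use from the issue: flag jobs that create a large number of files.
  // The threshold is a hypothetical, caller-chosen parameter.
  boolean createsManyFiles(long threshold) {
    return get(OpType.CREATE) > threshold;
  }
}
```

With five creates and one mkdirs, writeOps reads 6 and cannot tell the two apart, while the per-operation counters report 5 and 1 separately. A framework could copy each get(op) value into its own job counter.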

This message was sent by Atlassian JIRA

