hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics
Date Fri, 06 May 2016 19:37:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274617#comment-15274617
] 

Colin Patrick McCabe commented on HADOOP-13065:
-----------------------------------------------

bq. One quick question: some of the storage statistics classes (e.g. GlobalStorageStatistics)
are annotated as Stable. Do we have to be a bit more conservative by making them Unstable
before ultimately removing the Statistics?

Good question.  I think that what would happen is that the old API would become deprecated
in branch-2, and removed in branch-3.  There isn't any need to change the annotation since
we don't plan to modify the interface, just remove it.

bq. As follow-on work, 1. We can move the rack-awareness read bytes to a separate storage
statistics as it's only used by HDFS, and 2. We can remove Statistics API, but keep the thread
local implementation in FileSystemStorageStatistics class.

That makes sense.  One thing that we've talked about doing in the past is moving these statistics
to a separate Java file, so that they could be used in both FileContext and FileSystem.  Maybe
we could call them something like ThreadLocalFsStatistics?
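
To make the idea concrete, here is a rough sketch of what such a standalone class might
look like. It is not the existing Hadoop code: the class name ThreadLocalFsStatisticsSketch,
the nested Data holder, and the demo helper are all hypothetical, and the sketch assumes a
simple design where each thread writes only to its own counters and readers sum over all of them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of per-thread I/O counters; all names are illustrative only. */
final class ThreadLocalFsStatisticsSketch {

    /** Counters owned by one thread. Only that thread writes; any thread may read. */
    private static final class Data {
        volatile long bytesRead;
        volatile long bytesWritten;
    }

    /** Every Data instance handed out so far, so readers can sum across threads. */
    private final List<Data> allData = new ArrayList<>();

    /** Lazily creates and registers the calling thread's counters on first use. */
    private final ThreadLocal<Data> threadData = ThreadLocal.withInitial(() -> {
        Data d = new Data();
        synchronized (allData) {
            allData.add(d);
        }
        return d;
    });

    /** Uncontended update: a single writer per Data, so no lock or CAS is needed. */
    void incrementBytesRead(long n) {
        threadData.get().bytesRead += n;
    }

    void incrementBytesWritten(long n) {
        threadData.get().bytesWritten += n;
    }

    /** Sums the per-thread values; the lock only guards the list, not the counters. */
    long getBytesRead() {
        long total = 0;
        synchronized (allData) {
            for (Data d : allData) {
                total += d.bytesRead;
            }
        }
        return total;
    }

    long getBytesWritten() {
        long total = 0;
        synchronized (allData) {
            for (Data d : allData) {
                total += d.bytesWritten;
            }
        }
        return total;
    }

    /** Demo helper: several threads record reads, then the aggregate is returned. */
    static long demo(int threads, int opsPerThread, long bytesPerOp) {
        ThreadLocalFsStatisticsSketch stats = new ThreadLocalFsStatisticsSketch();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < opsPerThread; j++) {
                    stats.incrementBytesRead(bytesPerOp);
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return stats.getBytesRead();
    }
}
```

Because nothing here depends on FileSystem or FileContext, both could share one instance.
Note that this sketch never reclaims the counters of threads that have exited, which a
real implementation would need to handle.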

> Add a new interface for retrieving FS and FC Statistics
> -------------------------------------------------------
>
>                 Key: HADOOP-13065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13065
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Ram Venkatesh
>            Assignee: Mingliang Liu
>         Attachments: HADOOP-13065-007.patch, HADOOP-13065.008.patch, HADOOP-13065.009.patch,
HADOOP-13065.010.patch, HDFS-10175.000.patch, HDFS-10175.001.patch, HDFS-10175.002.patch,
HDFS-10175.003.patch, HDFS-10175.004.patch, HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in turn exposed as job counters by MapReduce and other frameworks. There is
logic within DfsClient to map operations to these counters that can be confusing; for instance,
mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, createSymlink,
delete, exists, mkdirs, rename and expose them as new properties on the Statistics object.
The operation-specific counters can be used for analyzing the load imposed by a particular
job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large number of
files.
> Once this information is available in the Statistics object, the app frameworks like
MapReduce can expose them as additional counters to be aggregated and recorded as part of
job summary.
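
As an illustration of the proposed enhancement, per-operation counters could be sketched
roughly as follows. This is only an assumption about one possible shape, not the patch
attached to this issue: the class OpCountersSketch, the Op enum, and the method names are
hypothetical, and only the operation names come from the description above.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch of one counter per client operation; names are illustrative. */
final class OpCountersSketch {

    /** Operations named in the issue description. */
    enum Op { CREATE, APPEND, CREATE_SYMLINK, DELETE, EXISTS, MKDIRS, RENAME }

    /** Populated once in the constructor and never structurally modified afterwards. */
    private final Map<Op, LongAdder> counters = new EnumMap<>(Op.class);

    OpCountersSketch() {
        for (Op op : Op.values()) {
            counters.put(op, new LongAdder());
        }
    }

    /** Records one invocation of the given operation; safe to call from many threads. */
    void increment(Op op) {
        counters.get(op).increment();
    }

    /** Returns the current count for a single operation. */
    long get(Op op) {
        return counters.get(op).sum();
    }

    /** Copies the current values, e.g. for a framework to publish as job counters. */
    Map<Op, Long> snapshot() {
        Map<Op, Long> copy = new EnumMap<>(Op.class);
        for (Map.Entry<Op, LongAdder> e : counters.entrySet()) {
            copy.put(e.getKey(), e.getValue().sum());
        }
        return copy;
    }
}
```

With counters like these, mkdirs would be reported under its own name rather than folded
into a generic write-op count, and a job that creates many files would show up as a large
CREATE value in the snapshot.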



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


