hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13065) Add a new interface for retrieving FS and FC Statistics
Date Fri, 06 May 2016 03:48:12 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15273605#comment-15273605
] 

Colin Patrick McCabe commented on HADOOP-13065:
-----------------------------------------------

Thanks for the reviews.

bq. in FileSystem.getStatistics(), For performance, you could try using ConcurrentMap for
the map, and only if it is not present create the objects and call putIfAbsent() (or a synchronized
block create and update the maps (with a second lookup there to eliminate the small race condition).
This will eliminate the sync point on a simple lookup when the entry exists.

Hmm.  I don't think that we really need to optimize this function.  When using the new API,
the only time this function gets called is when a new FileSystem object is created, which
should be very rare.

bq. For testing a may to reset/remove an entry could be handy.

We do have some tests that zero out the existing statistics objects.  I'm not sure if removing
the entry really gets us more coverage than we have now, since we know that it was created
by this code path (therefore the code path was tested).

bq. That's said, we can firstly deprecate the FileSystem#getStatistics()?

Agree.

> Add a new interface for retrieving FS and FC Statistics
> -------------------------------------------------------
>
>                 Key: HADOOP-13065
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13065
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Ram Venkatesh
>            Assignee: Mingliang Liu
>         Attachments: HADOOP-13065-007.patch, HADOOP-13065.008.patch, HADOOP-13065.009.patch,
HDFS-10175.000.patch, HDFS-10175.001.patch, HDFS-10175.002.patch, HDFS-10175.003.patch, HDFS-10175.004.patch,
HDFS-10175.005.patch, HDFS-10175.006.patch, TestStatisticsOverhead.java
>
>
> Currently FileSystem.Statistics exposes the following statistics:
> BytesRead
> BytesWritten
> ReadOps
> LargeReadOps
> WriteOps
> These are in-turn exposed as job counters by MapReduce and other frameworks. There is
logic within DfsClient to map operations to these counters that can be confusing, for instance,
mkdirs counts as a writeOp.
> Proposed enhancement:
> Add a statistic for each DfsClient operation including create, append, createSymlink,
delete, exists, mkdirs, rename and expose them as new properties on the Statistics object.
The operation-specific counters can be used for analyzing the load imposed by a particular
job on HDFS. 
> For example, we can use them to identify jobs that end up creating a large number of
files.
> Once this information is available in the Statistics object, the app frameworks like
MapReduce can expose them as additional counters to be aggregated and recorded as part of
job summary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message