hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests
Date Wed, 11 May 2016 20:43:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15280782#comment-15280782 ]

Colin Patrick McCabe edited comment on HADOOP-13028 at 5/11/16 8:43 PM:
------------------------------------------------------------------------

In the past I've written code for Spark that used reflection to make use of APIs that may
or may not be present in Hadoop.  HBase often does this as well, so that it can use multiple
versions of Hadoop.  It seems like this wouldn't be a lot of code.  Is that feasible in this
case?
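
To be concrete, the reflection pattern I have in mind is only a handful of lines.  A rough sketch
(the {{getS3AStreamStatistics}} accessor name here is purely hypothetical, for illustration; it is
not an API this patch defines):

{code:java}
// Sketch only: call an optional statistics accessor via reflection, and fall
// back gracefully when the running Hadoop version doesn't have it.
// "getS3AStreamStatistics" is a placeholder name, not an API this patch defines.
import java.lang.reflect.Method;
import org.apache.hadoop.fs.FSDataInputStream;

public final class OptionalStreamStats {
  public static Object getStreamStatistics(FSDataInputStream in) {
    try {
      Object wrapped = in.getWrappedStream();
      Method m = wrapped.getClass().getMethod("getS3AStreamStatistics");
      return m.invoke(wrapped);
    } catch (ReflectiveOperationException e) {
      return null;  // older Hadoop: the accessor simply isn't there
    }
  }
}
{code}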

I just find the argument that we should overload an existing unrelated API to output statistics
very off-putting.  I guess you could argue that the statistics are part of the stream state,
and {{toString}} is intended to reflect stream state.  But it will result in very long output
from {{toString}}, which probably isn't what most existing callers want.  And it's not consistent
with the way any other Hadoop streams work, including other S3 ones like S3N.

[~andrew.wang], [~cnauroth], [~liuml07], what do you think about this?  Is it acceptable to
overload {{toString}} in this way, to output statistics?  The argument seems to be that this
is easier than using reflection to get the actual stream statistics object.
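
To make the trade-off concrete for reviewers, this is roughly what the two styles look like from
a downstream caller (the helper class is just the reflection sketch above; nothing here is taken
from the actual patch):

{code:java}
import org.apache.hadoop.fs.FSDataInputStream;

public final class CallerView {
  static void reportStats(FSDataInputStream in) {
    // Option A: overloaded toString().  The caller (and every existing
    // toString() caller, e.g. debug logging) gets one long unstructured
    // string and has to parse it for individual counters.
    String blob = in.toString();

    // Option B: a dedicated statistics accessor, reached via the reflection
    // helper sketched earlier when the API may not be present.
    Object stats = OptionalStreamStats.getStreamStatistics(in);

    System.out.println("toString dump: " + blob);
    System.out.println("typed stats:   " + stats);
  }
}
{code}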


was (Author: cmccabe):
In the past I've written code for Spark that used reflection to make use of APIs that may
or may not be present in Hadoop.  HBase often does this as well, so that it can use multiple
versions of Hadoop.  It seems like this wouldn't be a lot of code.  Is that feasible in this
case?

I just find the argument that we should overload an existing unrelated API to output statistics
very off-putting.  It's like saying we should override hashCode to output the number of times
the user called {{seek()}} on the stream.

I guess you could argue that the statistics are part of the stream state, and {{toString}} is intended
to reflect stream state.  But it will result in very long output from {{toString}}, which probably
isn't what most existing callers want.  And it's not consistent with the way any other Hadoop
streams work, including other S3 ones like S3N.

> add low level counter metrics for S3A; use in read performance tests
> --------------------------------------------------------------------
>
>                 Key: HADOOP-13028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13028
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, metrics
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, HADOOP-13028-004.patch,
> HADOOP-13028-005.patch, HADOOP-13028-006.patch, HADOOP-13028-007.patch, HADOOP-13028-008.patch,
> HADOOP-13028-009.patch, HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch,
> HADOOP-13028-branch-2-010.patch, HADOOP-13028-branch-2-011.patch, org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt,
> org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive, and closing
> connections may be expensive (a sign of a regression).
> The S3A filesystem and individual input streams should have counters of the number of
> open/close/failure+reconnect operations, and timers of how long things take. These can be used
> downstream to measure the efficiency of the code (how often connections are being made),
> connection reliability, etc.
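
A minimal sketch of the kind of per-stream counters and timers the description above calls for
(class and field names are invented for illustration and are not taken from the attached patches):

{code:java}
// Illustrative only: per-stream counters/timers for open/close/reconnect.
import java.util.concurrent.atomic.AtomicLong;

public class StreamOpStatistics {
  private final AtomicLong openOps = new AtomicLong();
  private final AtomicLong closeOps = new AtomicLong();
  private final AtomicLong reconnectOps = new AtomicLong();
  private final AtomicLong openTimeNanos = new AtomicLong();

  /** Record one open call and how long it took. */
  public void recordOpen(long durationNanos) {
    openOps.incrementAndGet();
    openTimeNanos.addAndGet(durationNanos);
  }

  public void recordClose()     { closeOps.incrementAndGet(); }
  public void recordReconnect() { reconnectOps.incrementAndGet(); }

  @Override
  public String toString() {
    return "openOps=" + openOps + " closeOps=" + closeOps
        + " reconnectOps=" + reconnectOps + " openTimeNanos=" + openTimeNanos;
  }
}
{code}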




