hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests
Date Wed, 11 May 2016 10:27:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15279905#comment-15279905 ]

Steve Loughran commented on HADOOP-13028:
-----------------------------------------

Because one place I'm using this, to look at the logs and see how to tune performance, is
in Spark code which doesn't have access to those internals and is built against Hadoop 2.6.x
anyway. It lets me have code which can be run with -Dhadoop.version=2.7.1 and -Dhadoop.version=2.8.0-SNAPSHOT:
I can not only measure the duration in the Spark code itself, I can also see the logged info and
see what's been happening, and where things can be improved further.

We cannot do this if the only way to log this data is via a class which is package-private and
in Hadoop 2.8+ only. As requested, I've scoped that statistics class so that the only way
to get at it is to inject code into the org.apache.hadoop.fs.s3a package. Do you really, really
want me to do that in Spark code, and use introspection to get at a class it can't compile
against?

Please, give me the string: it'll be better for all of us. As and when your colleagues sit
down to look at Parquet performance on S3, they'll appreciate it.
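
As an illustration of the string-based approach argued for above, here is a minimal sketch. It
does not use the real S3A classes: InstrumentedStream and its counter names are hypothetical
stand-ins for a stream that keeps private statistics, and readAndReport plays the part of
downstream code that is compiled only against java.io.InputStream. The downstream code times
the read itself and picks up the stream's internal counters purely through toString().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamStatsDemo {

    /** Hypothetical stand-in for a filesystem stream with private counters. */
    static final class InstrumentedStream extends InputStream {
        private final InputStream inner;
        private long readOperations;
        private long bytesRead;

        InstrumentedStream(byte[] data) {
            this.inner = new ByteArrayInputStream(data);
        }

        @Override
        public int read() throws IOException {
            int b = inner.read();
            readOperations++;          // counts every call, including the one that hits EOF
            if (b >= 0) {
                bytesRead++;           // counts only bytes actually returned
            }
            return b;
        }

        /** The statistics are exposed only as a human-readable string. */
        @Override
        public String toString() {
            return "InstrumentedStream{readOperations=" + readOperations
                + ", bytesRead=" + bytesRead + "}";
        }
    }

    /** Downstream code: knows nothing about the stream beyond InputStream. */
    static String readAndReport(InputStream in) throws IOException {
        long start = System.nanoTime();
        long total = 0;
        while (in.read() >= 0) {
            total++;
        }
        long micros = (System.nanoTime() - start) / 1_000;
        // String concatenation calls in.toString(), so any statistics the
        // implementation chooses to print end up in the log line.
        return "read " + total + " bytes in " + micros + " us; stream state: " + in;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new InstrumentedStream(new byte[16])) {
            System.out.println(readAndReport(in));
        }
    }
}
```

Because the caller depends only on InputStream and toString(), the same compiled code keeps
working against a stream implementation that prints nothing useful; it simply logs less detail.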

> add low level counter metrics for S3A; use in read performance tests
> --------------------------------------------------------------------
>
>                 Key: HADOOP-13028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13028
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, metrics
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, HADOOP-13028-004.patch,
>                      HADOOP-13028-005.patch, HADOOP-13028-006.patch, HADOOP-13028-007.patch, HADOOP-13028-008.patch,
>                      HADOOP-13028-009.patch, HADOOP-13028-branch-2-008.patch, HADOOP-13028-branch-2-009.patch,
>                      HADOOP-13028-branch-2-010.patch, HADOOP-13028-branch-2-011.patch, org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt,
>                      org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt
>
>
> Against S3 (and other object stores), opening connections can be expensive, and closing connections
> may also be expensive (a sign of a regression).
> S3A FS and individual input streams should have counters of the number of open/close/failure+reconnect
> operations, and timers of how long things take. These can be used downstream to measure the efficiency
> of the code (how often connections are being made), connection reliability, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

