hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13028) add low level counter metrics for S3A; use in read performance tests
Date Mon, 02 May 2016 23:31:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15267729#comment-15267729 ]

Chris Nauroth commented on HADOOP-13028:

[~stevel@apache.org], I've spent more time reading the seek code changes, and I'm pretty confident
that they're correct overall, but I have a few more comments.

# {{S3AInputStream#closeStream}} has the following log message.  The text of the message says
it is logging {{contentLength}}, but it actually logs {{length}}.  Since {{length}} is probably
the more interesting value here, should the message text be changed to match?
      LOG.debug("Stream {} {}: {}; streamPos={}, nextReadPos={}," +
          " contentLength={}",
          uri, (shouldAbort ? "aborted" : "closed"), reason, pos, nextReadPos,
          length);
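For what it's worth, a small standalone sketch of the rename being suggested; the tiny {{fmt}} helper below merely mimics SLF4J's {{{}}} placeholder substitution to show the resulting line, and is not the real logging code:

```java
// Toy demo of the suggested message-text fix: label the last placeholder
// "length=" since the value actually logged is length, not contentLength.
public class CloseStreamLogDemo {
    // crude stand-in for SLF4J's "{}" placeholder substitution,
    // for illustration only
    static String fmt(String msg, Object... args) {
        for (Object a : args) {
            msg = msg.replaceFirst("\\{\\}", String.valueOf(a));
        }
        return msg;
    }

    public static void main(String[] args) {
        String line = fmt(
            "Stream {} {}: {}; streamPos={}, nextReadPos={}, length={}",
            "s3a://bucket/key", "closed", "seek", 1024L, 2048L, 512L);
        System.out.println(line);
        // prints: Stream s3a://bucket/key closed: seek; streamPos=1024, nextReadPos=2048, length=512
    }
}
```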
# Actually, that makes me realize I am unclear about a change made in HADOOP-12444.  {{S3AInputStream#reopen}}
has a stream length calculation that gets passed into the range request.
    requestedStreamLen = (length < 0) ? this.contentLength :
        Math.max(this.contentLength, (CLOSE_THRESHOLD + (targetPos + length)));
    GetObjectRequest request = new GetObjectRequest(bucket, key)
        .withRange(targetPos, requestedStreamLen);
Please tell me if I'm misunderstanding something, but I believe this calculation always results
in an upper bound on the range that effectively means "get the whole thing."  That {{Math.max}}
call guarantees that the value is always at least {{contentLength}}, which is the whole file
length.  Is this a bug in the HADOOP-12444 patch?
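To make the concern concrete, here is a standalone sketch of the quoted calculation (the {{CLOSE_THRESHOLD}} value here is made up for illustration), together with a {{Math.min}} variant that may have been the original intent — shown only as one possibility, not as the confirmed fix:

```java
// Hypothetical demo of the requestedStreamLen calculation quoted above;
// CLOSE_THRESHOLD's value is invented for illustration.
public class RangeCalcDemo {
    static final long CLOSE_THRESHOLD = 4096;

    // the calculation as it appears in the HADOOP-12444 code quoted above:
    // Math.max floors the result at contentLength, i.e. the whole file
    static long current(long contentLength, long targetPos, long length) {
        return (length < 0) ? contentLength
            : Math.max(contentLength, CLOSE_THRESHOLD + (targetPos + length));
    }

    // one possible intent: cap the requested range rather than floor it
    static long capped(long contentLength, long targetPos, long length) {
        return (length < 0) ? contentLength
            : Math.min(contentLength, CLOSE_THRESHOLD + (targetPos + length));
    }

    public static void main(String[] args) {
        long contentLength = 1_000_000L;
        // a small read near the start of a large object
        System.out.println(current(contentLength, 0, 100)); // prints 1000000
        System.out.println(capped(contentLength, 0, 100));  // prints 4196
    }
}
```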
# {{InputStreamStatistics#seekBackwards}} accepts {{offset}} as an argument but doesn't use
it.  Is there supposed to be another counter for back-skipped bytes?  At the call site within
{{S3AInputStream#seekInStream}}, the value it passes would be negative, so we'd need to be
careful of that.
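A minimal sketch of one way the unused {{offset}} could feed a back-skipped-bytes counter, negating the negative value at the call site; the field and method names below are illustrative, not the actual S3A code:

```java
// Hypothetical sketch of InputStreamStatistics#seekBackwards actually
// using its argument.
public class SeekStats {
    long backwardSeekOperations;
    long bytesBackwardsOnSeek;

    // negativeOffset is expected to be negative at the call site in
    // seekInStream (targetPos - pos), so negate it before accumulating
    public void seekBackwards(long negativeOffset) {
        backwardSeekOperations++;
        bytesBackwardsOnSeek += -negativeOffset;
    }
}
```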

> add low level counter metrics for S3A; use in read performance tests
> --------------------------------------------------------------------
>                 Key: HADOOP-13028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13028
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, metrics
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13028-001.patch, HADOOP-13028-002.patch, HADOOP-13028-004.patch,
> HADOOP-13028-005.patch, HADOOP-13028-006.patch, HADOOP-13028-007.patch, HADOOP-13028-008.patch,
> HADOOP-13028-branch-2-008.patch, org.apache.hadoop.fs.s3a.scale.TestS3AInputStreamPerformance-output.txt,
> Against S3 (and other object stores), opening connections can be expensive, and closing connections
> may be expensive (a sign of a regression).
> S3A FS and individual input streams should have counters of the # of open/close/failure+reconnect
> operations, and timers of how long things take. This can be used downstream to measure the
> efficiency of the code (how often connections are being made), connection reliability, etc.
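The kind of per-stream statistics the description asks for can be sketched roughly as below; the class and method names are assumptions for illustration, not the actual S3A instrumentation API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of stream-level open/close/reconnect counters plus
// a cumulative open-duration timer, as described in the issue text.
public class StreamOpStats {
    private final AtomicLong opened = new AtomicLong();
    private final AtomicLong closed = new AtomicLong();
    private final AtomicLong reconnects = new AtomicLong();
    private final AtomicLong openNanos = new AtomicLong();

    public void recordOpen(long durationNanos) {
        opened.incrementAndGet();
        openNanos.addAndGet(durationNanos);
    }

    public void recordClose() { closed.incrementAndGet(); }
    public void recordReconnect() { reconnects.incrementAndGet(); }

    public long opened() { return opened.get(); }
    public long closed() { return closed.get(); }
    public long reconnects() { return reconnects.get(); }
    public long openNanos() { return openNanos.get(); }
}
```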

This message was sent by Atlassian JIRA
