hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-16188) s3a rename failed during copy, "Unable to copy part" + 200 error code
Date Wed, 13 Mar 2019 20:35:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-16188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16792090#comment-16792090 ]

Steve Loughran commented on HADOOP-16188:
-----------------------------------------

https://github.com/aws/aws-sdk-java/issues/1141: the SDK doesn't retry on a 200, so the
caller has to do that.

_S3 does this for potentially-long-running operations, to avoid timeouts. They send a 200
immediately after receiving the request, then send an occasional space character (at the start
of the response body) to keep the connection alive while they do the actual work, then send
the actual XML payload when the operation completes/fails._

What does that mean? A 200 doesn't always mean success, so in translateException we need to
look for the internal error and retry.
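
To illustrate why the status code alone is not enough, here is a minimal sketch of looking for an error document inside a 200 response body. The class and method names are hypothetical (they are not part of the AWS SDK or S3A), and the XML shape is assumed to be the usual S3 `<Error><Code>…</Code></Error>` document preceded by keep-alive whitespace.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not an AWS SDK or S3A class. */
final class S3ResponseBodyCheck {

    // Assumed error shape: <Error> ... <Code>SomeCode</Code> ... </Error>
    private static final Pattern ERROR_CODE =
        Pattern.compile("<Error>.*?<Code>([^<]+)</Code>", Pattern.DOTALL);

    private S3ResponseBodyCheck() {
    }

    /**
     * Returns the S3 error code embedded in a response body, if any.
     * Leading keep-alive whitespace is skipped because the pattern is
     * searched for anywhere in the body rather than matched from the start.
     */
    static Optional<String> errorCodeIn(String body) {
        Matcher m = ERROR_CODE.matcher(body);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

A caller would treat a non-empty result as a failure even when the HTTP status was 200.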

Currently copyFile is tagged retry/mixed, but I don't see that as true: there is only a
once() around the call.

Proposed

* new AwsInternalError exception
* exception translation maps 200 + 500 + internal error into this.
* which we treat as retryable, always.
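
A rough sketch of what the proposal could look like, under stated assumptions. Only the name AwsInternalError comes from the list above; the helper class, its method names and signatures, and the reading of "200 + 500 + internal error" as "status 200 or 500 with error code InternalError" are assumptions for illustration, not the S3A implementation. The retry loop has no backoff, which a real retry policy would add.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Proposed exception type; constructor shape is an assumption. */
class AwsInternalError extends IOException {
    AwsInternalError(String message, Throwable cause) {
        super(message, cause);
    }
}

/** Hypothetical helper; not the actual S3A translateException code. */
final class InternalErrorTranslation {

    private InternalErrorTranslation() {
    }

    /** Assumed rule: status 200 or 500 plus error code "InternalError". */
    static boolean isInternalError(int statusCode, String errorCode) {
        return (statusCode == 200 || statusCode == 500)
            && "InternalError".equals(errorCode);
    }

    /** Maps a failure to AwsInternalError when the rule matches. */
    static IOException translate(String operation, int statusCode,
                                 String errorCode, Exception cause) {
        String text = operation + ": status " + statusCode
            + ", error code " + errorCode;
        if (isInternalError(statusCode, errorCode)) {
            return new AwsInternalError(text, cause);
        }
        return new IOException(text, cause);
    }

    /** Retries only on AwsInternalError, up to maxAttempts calls in total. */
    static <T> T retryOnInternalError(int maxAttempts, Callable<T> call)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        AwsInternalError last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (AwsInternalError e) {
                last = e; // always retryable; any other exception propagates
            }
        }
        throw last;
    }
}
```

With this shape, the copy call would be wrapped in the retry helper rather than a single once() invocation.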


> s3a rename failed during copy, "Unable to copy part" + 200 error code
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-16188
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16188
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> Error during a rename where AWS S3 seems to have some internal error *which is not retried
> and returns status code 200*
> {code}
> com.amazonaws.SdkClientException: Unable to copy part: We encountered an internal error. Please try again. (Service: Amazon S3; Status Code: 200; Error Code: InternalError;
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

