hadoop-common-issues mailing list archives

From "Andrew Olson (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-16900) Very large files can be truncated when written through S3AFileSystem
Date Wed, 04 Mar 2020 15:36:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-16900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051330#comment-17051330 ]

Andrew Olson commented on HADOOP-16900:
---------------------------------------

[~stevel@apache.org] An adaptive solution like that sounds good to me. Yes, depending on fs.s3a.fast.upload.buffer
and related configurations, a moderate amount of additional resources could be required,
but it should succeed more often than not.

Since DistCp does have that source vs target length check, I'm not sure how our DistCp job
still managed to succeed when this happened. When I have some time I'll investigate that further
to try to clear up the mystery. For what it's worth we were using the -strategy dynamic option.

> Very large files can be truncated when written through S3AFileSystem
> --------------------------------------------------------------------
>
>                 Key: HADOOP-16900
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16900
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 3.2.1
>            Reporter: Andrew Olson
>            Assignee: Steve Loughran
>            Priority: Major
>              Labels: s3
>
> If the size of a written file exceeds 10,000 * {{fs.s3a.multipart.size}}, the S3 object
is truncated and therefore corrupt. The maximum number of parts in a multipart upload is 10,000,
as specified by the S3 API, and there is an apparent bug where exceeding this limit is not fatal,
so the multipart upload is still allowed to be marked as completed.
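
To make the arithmetic in the description concrete, here is a minimal sketch in Python. It is not taken from the S3A code: the function names are hypothetical, and the 100 MiB part size is only an assumed example value for fs.s3a.multipart.size. The only fact taken from the issue is the 10,000-part limit.

```python
# Illustrative only: helper names are hypothetical, not S3A APIs.
MAX_PARTS = 10_000  # part-count limit for one multipart upload, per the issue


def max_object_bytes(part_size_bytes: int) -> int:
    """Largest object that fits in MAX_PARTS parts of the given size."""
    return MAX_PARTS * part_size_bytes


def parts_needed(file_size_bytes: int, part_size_bytes: int) -> int:
    """Number of parts required for a file (integer ceiling division)."""
    return (file_size_bytes + part_size_bytes - 1) // part_size_bytes


def min_part_size_bytes(file_size_bytes: int) -> int:
    """Smallest part size that keeps the file within MAX_PARTS parts."""
    return (file_size_bytes + MAX_PARTS - 1) // MAX_PARTS


def fits(file_size_bytes: int, part_size_bytes: int) -> bool:
    """True if the file can be uploaded without exceeding the part limit."""
    return parts_needed(file_size_bytes, part_size_bytes) <= MAX_PARTS


# Assumed example: a 100 MiB part size.
part_size = 100 * 1024 * 1024
limit = max_object_bytes(part_size)        # 1,048,576,000,000 bytes
one_byte_over = limit + 1
print(fits(limit, part_size))              # True
print(fits(one_byte_over, part_size))      # False: would need 10,001 parts
print(min_part_size_bytes(one_byte_over))  # 104857601
```

A file of exactly the limit still fits; one byte more needs an extra part, which is the situation in which the truncation described above would occur. An adaptive approach would presumably need a part size of at least `min_part_size_bytes(...)` for the expected file length, although how S3A would choose it is not specified here.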




