flink-issues mailing list archives

From "Pawel Bartoszek (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-10664) Flink: Checkpointing fails with S3 exception - Please reduce your request rate
Date Tue, 30 Oct 2018 11:54:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-10664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16668610#comment-16668610 ]

Pawel Bartoszek commented on FLINK-10664:
-----------------------------------------

[~StephanEwen] I looked into the flink-s3-fs-presto source code and see that org.apache.flink.fs.s3presto.S3FileSystemFactory
extends AbstractS3FileSystemFactory ([https://github.com/apache/flink/blob/master/flink-filesystems/flink-s3-fs-base/src/main/java/org/apache/flink/fs/s3/common/AbstractS3FileSystemFactory.java]),
which internally uses Hadoop classes. How is that different from using the Hadoop FS?

 

> Flink: Checkpointing fails with S3 exception - Please reduce your request rate
> ------------------------------------------------------------------------------
>
>                 Key: FLINK-10664
>                 URL: https://issues.apache.org/jira/browse/FLINK-10664
>             Project: Flink
>          Issue Type: Improvement
>          Components: JobManager, TaskManager
>    Affects Versions: 1.5.4, 1.6.1
>            Reporter: Pawel Bartoszek
>            Priority: Major
>
> When a checkpoint is created for a job that has many operators, Flink can upload too many
checkpoint files to S3 at the same time, which results in throttling by S3.
>  
> {code:java}
> Caused by: org.apache.hadoop.fs.s3a.AWSS3IOException: saving output on flink/state-checkpoints/7bbd6495f90257e4bc037ecc08ba21a5/chk-19/4422b088-0836-4f12-bbbe-7e731da11231:
com.amazonaws.services.s3.model.AmazonS3Exception: Please reduce your request rate. (Service:
Amazon S3; Status Code: 503; Error Code: SlowDown; Request ID: XXXX; S3 Extended Request ID:
XXX), S3 Extended Request ID: XXX: Please reduce your request rate. (Service: Amazon S3; Status
Code: 503; Error Code: SlowDown; Request ID: 5310EA750DF8B949; S3 Extended Request ID: XXX)
> at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:178)
> at org.apache.hadoop.fs.s3a.S3AOutputStream.close(S3AOutputStream.java:121)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:74)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:108)
> at org.apache.flink.runtime.fs.hdfs.HadoopDataOutputStream.close(HadoopDataOutputStream.java:52)
> at org.apache.flink.core.fs.ClosingFSDataOutputStream.close(ClosingFSDataOutputStream.java:64)
> at org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:311){code}
>  
> Can the upload be retried with some kind of backoff?
>  
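
To illustrate what "retried with some kind of backoff" could look like, here is a minimal
sketch of exponential backoff with full jitter. BackoffRetry, delayForAttempt and
callWithBackoff are hypothetical names, not part of Flink or Hadoop. The sketch assumes the
whole upload can be re-run as one Callable and treats every IOException as retryable; in the
stack trace above the failure surfaces in close(), so a real fix would probably have to redo
the complete write and not only the close() call.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for illustration only; not a Flink or Hadoop class. */
final class BackoffRetry {

    private BackoffRetry() {}

    /** Upper bound for the wait after the given failed attempt (1-based): base * 2^(attempt-1), capped. */
    static long delayForAttempt(int attempt, long baseDelayMs, long maxDelayMs) {
        long delay = baseDelayMs;
        for (int i = 1; i < attempt && delay < maxDelayMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxDelayMs);
    }

    /**
     * Runs the action up to maxAttempts times. After an IOException it sleeps for a
     * random time between 0 and the exponential bound (full jitter) and tries again.
     * Assumption: every IOException is worth retrying; real code would likely retry
     * only throttling errors such as the 503 SlowDown above.
     */
    static <T> T callWithBackoff(Callable<T> action, int maxAttempts,
                                 long baseDelayMs, long maxDelayMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts, rethrow below
                }
                long bound = delayForAttempt(attempt, baseDelayMs, maxDelayMs);
                // Jitter spreads out retries from many operators so they do not hit S3 in lockstep.
                Thread.sleep(ThreadLocalRandom.current().nextLong(bound + 1));
            }
        }
        throw last;
    }
}
```

A caller would wrap the upload, for example
BackoffRetry.callWithBackoff(() -> upload(), 5, 200, 5000), where upload() stands in for
whatever writes the checkpoint file. The random wait matters here because the throttling
comes from many uploads starting at the same moment; a fixed delay would only shift the burst.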



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
