spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Lian Jiang <jiangok2...@gmail.com>
Subject Spark interrupts S3 request backoff
Date Mon, 13 Apr 2020 02:43:07 GMT
Hi,

My Spark job failed when reading parquet files from S3 due to 503 slow
down. According to
https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html,
I can use backoff to mitigate this issue. However, spark seems to interrupt
the backoff sleeping (see "sleep interrupted"). Is there a way (e.g. some
settings) to make spark not interrupt the backoff? Appreciate any hints.


20/04/12 20:15:37 WARN TaskSetManager: Lost task 3347.0 in stage 155.0
(TID 128138, ip-100-101-44-35.us-west-2.compute.internal, executor
34): org.apache.spark.sql.execution.datasources.FileDownloadException:
Failed to download file path:
s3://mybucket/myprefix/part-00178-d0a0d51f-f98e-4b9d-8d00-bb3b9acd9a47-c000.snappy.parquet,
range: 0-19231, partition values: [empty row], isDataPresent: false
	at org.apache.spark.sql.execution.datasources.AsyncFileDownloader.next(AsyncFileDownloader.scala:142)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.getNextFile(FileScanRDD.scala:248)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:172)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:130)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown
Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown
Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow
Down; Request ID: CECE220993AE7F89; S3 Extended Request ID:
UlQe4dEuBR1YWJUthSlrbV9phyqxUNHQEw7tsJ5zu+oNIH+nGlGHfAv7EKkQRUVP8tw8x918A4Y=),
S3 Extended Request ID:
UlQe4dEuBR1YWJUthSlrbV9phyqxUNHQEw7tsJ5zu+oNIH+nGlGHfAv7EKkQRUVP8tw8x918A4Y=
		at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
		at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
		at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
		at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
		at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
		at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
		at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
		at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
		at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
		at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
		at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4926)
		at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4872)
		at com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1321)
		at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:22)
		at com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:8)
		at com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:110)
		at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:189)
		at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:184)
		at com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.getObjectMetadata(AmazonS3LiteClient.java:96)
		at com.amazon.ws.emr.hadoop.fs.s3.lite.AbstractAmazonS3Lite.getObjectMetadata(AbstractAmazonS3Lite.java:43)
		at com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:220)
		at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:860)
		at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
		at org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:39)
		at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:448)
		at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildPrefetcherWithPartitionValues$1.apply(ParquetFileFormat.scala:595)
		at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildPrefetcherWithPartitionValues$1.apply(ParquetFileFormat.scala:578)
		at org.apache.spark.sql.execution.datasources.AsyncFileDownloader.org$apache$spark$sql$execution$datasources$AsyncFileDownloader$$downloadFile(AsyncFileDownloader.scala:93)
		at org.apache.spark.sql.execution.datasources.AsyncFileDownloader$$anonfun$initiateFilesDownload$2$$anon$1.call(AsyncFileDownloader.scala:73)
		at org.apache.spark.sql.execution.datasources.AsyncFileDownloader$$anonfun$initiateFilesDownload$2$$anon$1.call(AsyncFileDownloader.scala:72)
		at java.util.concurrent.FutureTask.run(FutureTask.java:266)
		... 3 more
Caused by: java.lang.RuntimeException: Retry's backoff was interrupted
by other process
	at com.amazon.ws.emr.hadoop.fs.util.EmrFsUtils.sleep(EmrFsUtils.java:376)
	at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem$NativeS3FsInputStream.read(S3NativeFileSystem.java:287)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
	at java.io.FilterInputStream.read(FilterInputStream.java:83)
	at org.apache.parquet.io.DelegatingSeekableInputStream.read(DelegatingSeekableInputStream.java:61)
	at org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:80)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:520)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:448)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildPrefetcherWithPartitionValues$1.apply(ParquetFileFormat.scala:595)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildPrefetcherWithPartitionValues$1.apply(ParquetFileFormat.scala:578)
	at org.apache.spark.sql.execution.datasources.AsyncFileDownloader.org$apache$spark$sql$execution$datasources$AsyncFileDownloader$$downloadFile(AsyncFileDownloader.scala:93)
	at org.apache.spark.sql.execution.datasources.AsyncFileDownloader$$anonfun$initiateFilesDownload$2$$anon$1.call(AsyncFileDownloader.scala:73)
	at org.apache.spark.sql.execution.datasources.AsyncFileDownloader$$anonfun$initiateFilesDownload$2$$anon$1.call(AsyncFileDownloader.scala:72)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	... 3 more
Caused by: java.lang.InterruptedException: sleep interrupted
	at java.lang.Thread.sleep(Native Method)
	at com.amazon.ws.emr.hadoop.fs.util.EmrFsUtils.sleep(EmrFsUtils.java:374)
	... 19 more

Mime
View raw message