spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gabor Somogyi <gabor.g.somo...@gmail.com>
Subject Re: Spark interrupts S3 request backoff
Date Tue, 14 Apr 2020 13:15:12 GMT
+1 on the previous guess and additionally I suggest to reproduce it with
vanilla Spark.
Amazon Spark contains modifications which not available in vanilla Spark
which makes problem hunting hard or impossible.
Such case amazon can help...

On Tue, Apr 14, 2020 at 11:20 AM ZHANG Wei <wezhang@outlook.com> wrote:

> I will make a guess, it's not interruptted, it's killed by the driver or
> the resource manager since the executor fallen into sleep for a long time.
>
> You may have to find the root cause in the driver and failed executor log
> contexts.
>
> --
> Cheers,
> -z
>
> ________________________________________
> From: Lian Jiang <jiangok2006@gmail.com>
> Sent: Monday, April 13, 2020 10:43
> To: user
> Subject: Spark interrupts S3 request backoff
>
> Hi,
>
> My Spark job failed when reading parquet files from S3 due to 503 slow
> down. According to
> https://docs.aws.amazon.com/AmazonS3/latest/dev/optimizing-performance.html,
> I can use backoff to mitigate this issue. However, spark seems to interrupt
> the backoff sleeping (see "sleep interrupted"). Is there a way (e.g. some
> settings) to make spark not interrupt the backoff? Appreciate any hints.
>
>
>
> 20/04/12 20:15:37 WARN TaskSetManager: Lost task 3347.0 in stage 155.0
> (TID 128138, ip-100-101-44-35.us-west-2.compute.internal, executor 34):
> org.apache.spark.sql.execution.datasources.FileDownloadException: Failed to
> download file path:
> s3://mybucket/myprefix/part-00178-d0a0d51f-f98e-4b9d-8d00-bb3b9acd9a47-c000.snappy.parquet,
> range: 0-19231, partition values: [empty row], isDataPresent: false
>         at
> org.apache.spark.sql.execution.datasources.AsyncFileDownloader.next(AsyncFileDownloader.scala:142)
>         at
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.getNextFile(FileScanRDD.scala:248)
>         at
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:172)
>         at
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:130)
>         at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown
> Source)
>         at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown
> Source)
>         at
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
> Source)
>         at
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>         at
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>         at
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>         at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>         at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
>         at org.apache.spark.scheduler.Task.run(Task.scala:123)
>         at
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>         at
> org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>         at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
>         Suppressed:
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.model.AmazonS3Exception:
> Slow Down (Service: Amazon S3; Status Code: 503; Error Code: 503 Slow Down;
> Request ID: CECE220993AE7F89; S3 Extended Request ID:
> UlQe4dEuBR1YWJUthSlrbV9phyqxUNHQEw7tsJ5zu+oNIH+nGlGHfAv7EKkQRUVP8tw8x918A4Y=),
> S3 Extended Request ID:
> UlQe4dEuBR1YWJUthSlrbV9phyqxUNHQEw7tsJ5zu+oNIH+nGlGHfAv7EKkQRUVP8tw8x918A4Y=
>                 at
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
>                 at
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
>                 at
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
>                 at
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
>                 at
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
>                 at
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
>                 at
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
>                 at
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
>                 at
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
>                 at
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
>                 at
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4926)
>                 at
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4872)
>                 at
> com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1321)
>                 at
> com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:22)
>                 at
> com.amazon.ws.emr.hadoop.fs.s3.lite.call.GetObjectMetadataCall.perform(GetObjectMetadataCall.java:8)
>                 at
> com.amazon.ws.emr.hadoop.fs.s3.lite.executor.GlobalS3Executor.execute(GlobalS3Executor.java:110)
>                 at
> com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:189)
>                 at
> com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.invoke(AmazonS3LiteClient.java:184)
>                 at
> com.amazon.ws.emr.hadoop.fs.s3.lite.AmazonS3LiteClient.getObjectMetadata(AmazonS3LiteClient.java:96)
>                 at
> com.amazon.ws.emr.hadoop.fs.s3.lite.AbstractAmazonS3Lite.getObjectMetadata(AbstractAmazonS3Lite.java:43)
>                 at
> com.amazon.ws.emr.hadoop.fs.s3n.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:220)
>                 at
> com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:860)
>                 at
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:553)
>                 at
> org.apache.parquet.hadoop.util.HadoopInputFile.fromPath(HadoopInputFile.java:39)
>                 at
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:448)
>                 at
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildPrefetcherWithPartitionValues$1.apply(ParquetFileFormat.scala:595)
>                 at
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildPrefetcherWithPartitionValues$1.apply(ParquetFileFormat.scala:578)
>                 at
> org.apache.spark.sql.execution.datasources.AsyncFileDownloader.org<
> http://org.apache.spark.sql.execution.datasources.AsyncFileDownloader.org
> >$apache$spark$sql$execution$datasources$AsyncFileDownloader$$downloadFile(AsyncFileDownloader.scala:93)
>                 at
> org.apache.spark.sql.execution.datasources.AsyncFileDownloader$$anonfun$initiateFilesDownload$2$$anon$1.call(AsyncFileDownloader.scala:73)
>                 at
> org.apache.spark.sql.execution.datasources.AsyncFileDownloader$$anonfun$initiateFilesDownload$2$$anon$1.call(AsyncFileDownloader.scala:72)
>                 at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>                 ... 3 more
> Caused by: java.lang.RuntimeException: Retry's backoff was interrupted by
> other process
>         at
> com.amazon.ws.emr.hadoop.fs.util.EmrFsUtils.sleep(EmrFsUtils.java:376)
>         at
> com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem$NativeS3FsInputStream.read(S3NativeFileSystem.java:287)
>         at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
>         at java.io.FilterInputStream.read(FilterInputStream.java:83)
>         at org.apache.parquet.io
> .DelegatingSeekableInputStream.read(DelegatingSeekableInputStream.java:61)
>         at
> org.apache.parquet.bytes.BytesUtils.readIntLittleEndian(BytesUtils.java:80)
>         at
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:520)
>         at
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505)
>         at
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499)
>         at
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:448)
>         at
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildPrefetcherWithPartitionValues$1.apply(ParquetFileFormat.scala:595)
>         at
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildPrefetcherWithPartitionValues$1.apply(ParquetFileFormat.scala:578)
>         at
> org.apache.spark.sql.execution.datasources.AsyncFileDownloader.org<
> http://org.apache.spark.sql.execution.datasources.AsyncFileDownloader.org
> >$apache$spark$sql$execution$datasources$AsyncFileDownloader$$downloadFile(AsyncFileDownloader.scala:93)
>         at
> org.apache.spark.sql.execution.datasources.AsyncFileDownloader$$anonfun$initiateFilesDownload$2$$anon$1.call(AsyncFileDownloader.scala:73)
>         at
> org.apache.spark.sql.execution.datasources.AsyncFileDownloader$$anonfun$initiateFilesDownload$2$$anon$1.call(AsyncFileDownloader.scala:72)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         ... 3 more
> Caused by: java.lang.InterruptedException: sleep interrupted
>         at java.lang.Thread.sleep(Native Method)
>         at
> com.amazon.ws.emr.hadoop.fs.util.EmrFsUtils.sleep(EmrFsUtils.java:374)
>         ... 19 more
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
>
>

Mime
View raw message