spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Everett Anderson <ever...@nuna.com.INVALID>
Subject S3A + EMR failure when writing Parquet?
Date Sun, 28 Aug 2016 19:51:50 GMT
Hi,

I'm having some trouble figuring out a failure when using S3A when writing
a DataFrame as Parquet on EMR 4.7.2 (which is Hadoop 2.7.2 and Spark
1.6.2). It works when using EMRFS (s3://), though.

I'm using these extra conf params, though I've also tried without
everything but the encryption one with the same result:

--conf spark.hadooop.mapreduce.fileoutputcommitter.algorithm.version=2
--conf spark.hadoop.mapreduce.fileoutputcommitter.cleanup.skipped=true
--conf spark.hadoop.fs.s3a.server-side-encryption-algorithm=AES256
--conf
spark.sql.parquet.output.committer.class=org.apache.spark.sql.parquet.DirectParquetOutputCommitter

It looks like it does actually write the parquet shards under

<output root S3>/_temporary/0/_temporary/<attempt>/

but then must hit that S3 exception when trying to copy/rename. I think the
NullPointerException deep down in Parquet is due to it causing close() more
than once so isn't the root cause, but I'm not sure.

Anyone seen something like this?

16/08/28 19:46:28 ERROR InsertIntoHadoopFsRelation: Aborting job.
org.apache.spark.SparkException: Job aborted due to stage failure:
Task 9 in stage 1.0 failed 4 times, most recent failure: Lost task 9.3
in stage 1.0 (TID 54, ip-10-8-38-103.us-west-2.compute.internal):
org.apache.spark.SparkException: Task failed while writing rows
	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:269)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148)
	at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:148)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to commit task
	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$commitTask$1(WriterContainer.scala:283)
	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply$mcV$sp(WriterContainer.scala:265)
	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:260)
	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$1.apply(WriterContainer.scala:260)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1277)
	at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:266)
	... 8 more
	Suppressed: java.lang.NullPointerException
		at org.apache.parquet.hadoop.InternalParquetRecordWriter.flushRowGroupToStore(InternalParquetRecordWriter.java:147)
		at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:113)
		at org.apache.parquet.hadoop.ParquetRecordWriter.close(ParquetRecordWriter.java:112)
		at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.close(ParquetRelation.scala:101)
		at org.apache.spark.sql.execution.datasources.DefaultWriterContainer.org$apache$spark$sql$execution$datasources$DefaultWriterContainer$$abortTask$1(WriterContainer.scala:290)
		at org.apache.spark.sql.execution.datasources.DefaultWriterContainer$$anonfun$writeRows$2.apply$mcV$sp(WriterContainer.scala:266)
		at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1286)
		... 9 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Access
Denied (Service: Amazon S3; Status Code: 403; Error Code:
AccessDenied; Request ID: EA0E434768316935), S3 Extended Request ID:
fHtu7Q9VSi/8h0RAyfRiyK6uAJnajZBrwqZH3eBfF5kM13H6dDl006031NTwU/whyGu1uNqW1mI=
	at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1389)
	at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:902)
	at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:607)
	at com.amazonaws.http.AmazonHttpClient.doExecute(AmazonHttpClient.java:376)
	at com.amazonaws.http.AmazonHttpClient.executeWithTimer(AmazonHttpClient.java:338)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:287)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3826)
	at com.amazonaws.services.s3.AmazonS3Client.putObject(AmazonS3Client.java:1405)
	at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInOneChunk(UploadCallable.java:131)
	at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:123)
	at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:139)
	at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:47)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	... 3 more

Mime
View raw message