spark-user mailing list archives

From Aniket Bhatnagar <aniket.bhatna...@gmail.com>
Subject Programmatic Spark 1.2.0 on EMR | S3 filesystem is not working when using
Date Fri, 30 Jan 2015 14:44:38 GMT
I am programmatically submitting Spark jobs in yarn-client mode on EMR. Whenever a
job tries to save a file to S3, it throws the exception mentioned below. I suspect
the issue is that EMR is not set up properly, since I have to set all Hadoop
configurations manually in the SparkContext. However, I am not sure which
configuration I am missing (if any).
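
For context, this is roughly how the job is created and how it writes to S3 (a
minimal sketch in Scala; the app name, bucket and output prefix are placeholders
rather than my real values):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: SparkContext created in code (yarn-client) instead of via spark-submit.
val sc = new SparkContext(
  new SparkConf().setMaster("yarn-client").setAppName("emr-s3-write-test"))

// The exception below is thrown when a job saves its output to S3, e.g.:
sc.parallelize(Seq("a", "b", "c"))
  .saveAsTextFile("s3://my-bucket/output/run1")   // placeholder bucket/prefix

sc.stop()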

Configurations that I am using in SparkContext to setup EMRFS:
"spark.hadoop.fs.s3n.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
"spark.hadoop.fs.s3.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
"spark.hadoop.fs.emr.configuration.version": "1.0",
"spark.hadoop.fs.s3n.multipart.uploads.enabled": "true",
"spark.hadoop.fs.s3.enableServerSideEncryption": "false",
"spark.hadoop.fs.s3.serverSideEncryptionAlgorithm": "AES256",
"spark.hadoop.fs.s3.consistent": "true",
"spark.hadoop.fs.s3.consistent.retryPolicyType": "exponential",
"spark.hadoop.fs.s3.consistent.retryPeriodSeconds": "10",
"spark.hadoop.fs.s3.consistent.retryCount": "5",
"spark.hadoop.fs.s3.maxRetries": "4",
"spark.hadoop.fs.s3.sleepTimeSeconds": "10",
"spark.hadoop.fs.s3.consistent.throwExceptionOnInconsistency": "true",
"spark.hadoop.fs.s3.consistent.metadata.autoCreate": "true",
"spark.hadoop.fs.s3.consistent.metadata.tableName": "EmrFSMetadata",
"spark.hadoop.fs.s3.consistent.metadata.read.capacity": "500",
"spark.hadoop.fs.s3.consistent.metadata.write.capacity": "100",
"spark.hadoop.fs.s3.consistent.fastList": "true",
"spark.hadoop.fs.s3.consistent.fastList.prefetchMetadata": "false",
"spark.hadoop.fs.s3.consistent.notification.CloudWatch": "false",
"spark.hadoop.fs.s3.consistent.notification.SQS": "false",

Exception:
java.io.IOException: No such file or directory
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:1006)
at java.io.File.createTempFile(File.java:1989)
at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.startNewTempFile(S3FSOutputStream.java:269)
at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.writeInternal(S3FSOutputStream.java:205)
at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.flush(S3FSOutputStream.java:136)
at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.close(S3FSOutputStream.java:156)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:109)
at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.close(MultipleOutputFormat.java:116)
at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Hints? Suggestions?
