spark-user mailing list archives

From Aniket Bhatnagar <aniket.bhatna...@gmail.com>
Subject Re: Programmatic Spark 1.2.0 on EMR | S3 filesystem is not working when using
Date Fri, 30 Jan 2015 17:59:27 GMT
Right. Which makes me believe that the directory is perhaps configured
somewhere and I have missed configuring it. The process that submits the
jobs (and effectively becomes the driver) is running under sudo, and the
executors are launched by YARN. The Hadoop username is configured as
'hadoop' (the default user on EMR).
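
If the local temp/buffer directory is the culprit (as suggested in the reply below), a quick stdlib check run on the executor hosts can confirm it. This is a minimal sketch, not EMRFS code: the path below is a placeholder, and `fs.s3.buffer.dir` is the Hadoop property that S3 filesystems conventionally use for the local staging directory (worth verifying against your EMR version). It mirrors the `File.createTempFile` call that fails in the stack trace:

```scala
import java.io.{File, IOException}

// Minimal check of a candidate local buffer directory, mirroring the
// File.createTempFile call in S3FSOutputStream.startNewTempFile from
// the stack trace below.
object BufferDirCheck {
  def check(dir: File): String = {
    if (!dir.exists()) s"$dir does not exist"
    else if (!dir.canWrite) s"$dir is not writable by this user"
    else {
      // Same call as in the failing frame; if the directory vanished or
      // is unwritable, this throws "java.io.IOException: No such file
      // or directory".
      val tmp = File.createTempFile("emrfs-", ".tmp", dir)
      tmp.delete()
      s"$dir is usable"
    }
  }

  def main(args: Array[String]): Unit = {
    // Placeholder path: substitute whatever fs.s3.buffer.dir (or
    // hadoop.tmp.dir) resolves to on your cluster nodes.
    println(check(new File("/mnt/var/lib/hadoop/s3")))
  }
}
```

If the directory turns out to be missing or unwritable, one option (an assumption, not verified against EMRFS internals) is to point the buffer directory at a path that exists on every node, e.g. via `"spark.hadoop.fs.s3.buffer.dir": "/mnt/tmp"`, the same way the other `spark.hadoop.*` properties in the quoted message are set. Note the failure happens inside a task, so the directory must be valid on the executor hosts, not just the driver.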

On Fri, Jan 30, 2015, 11:25 PM Sven Krasser <krasser@gmail.com> wrote:

> From your stacktrace it appears that the S3 writer tries to write the data
> to a temp file on the local file system first. Taking a guess, that local
> directory doesn't exist or you don't have permissions for it.
> -Sven
>
> On Fri, Jan 30, 2015 at 6:44 AM, Aniket Bhatnagar <
> aniket.bhatnagar@gmail.com> wrote:
>
>> I am programmatically submitting Spark jobs in yarn-client mode on EMR.
>> Whenever a job tries to save a file to S3, it throws the exception below.
>> I think the issue might be that EMR is not set up properly, as I have to
>> set all Hadoop configurations manually in SparkContext. However, I am
>> not sure which configuration I am missing (if any).
>>
>> Configurations that I am using in SparkContext to setup EMRFS:
>> "spark.hadoop.fs.s3n.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
>> "spark.hadoop.fs.s3.impl": "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
>> "spark.hadoop.fs.emr.configuration.version": "1.0",
>> "spark.hadoop.fs.s3n.multipart.uploads.enabled": "true",
>> "spark.hadoop.fs.s3.enableServerSideEncryption": "false",
>> "spark.hadoop.fs.s3.serverSideEncryptionAlgorithm": "AES256",
>> "spark.hadoop.fs.s3.consistent": "true",
>> "spark.hadoop.fs.s3.consistent.retryPolicyType": "exponential",
>> "spark.hadoop.fs.s3.consistent.retryPeriodSeconds": "10",
>> "spark.hadoop.fs.s3.consistent.retryCount": "5",
>> "spark.hadoop.fs.s3.maxRetries": "4",
>> "spark.hadoop.fs.s3.sleepTimeSeconds": "10",
>> "spark.hadoop.fs.s3.consistent.throwExceptionOnInconsistency": "true",
>> "spark.hadoop.fs.s3.consistent.metadata.autoCreate": "true",
>> "spark.hadoop.fs.s3.consistent.metadata.tableName": "EmrFSMetadata",
>> "spark.hadoop.fs.s3.consistent.metadata.read.capacity": "500",
>> "spark.hadoop.fs.s3.consistent.metadata.write.capacity": "100",
>> "spark.hadoop.fs.s3.consistent.fastList": "true",
>> "spark.hadoop.fs.s3.consistent.fastList.prefetchMetadata": "false",
>> "spark.hadoop.fs.s3.consistent.notification.CloudWatch": "false",
>> "spark.hadoop.fs.s3.consistent.notification.SQS": "false",
>>
>> Exception:
>> java.io.IOException: No such file or directory
>> at java.io.UnixFileSystem.createFileExclusively(Native Method)
>> at java.io.File.createNewFile(File.java:1006)
>> at java.io.File.createTempFile(File.java:1989)
>> at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.startNewTempFile(S3FSOutputStream.java:269)
>> at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.writeInternal(S3FSOutputStream.java:205)
>> at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.flush(S3FSOutputStream.java:136)
>> at com.amazon.ws.emr.hadoop.fs.s3.S3FSOutputStream.close(S3FSOutputStream.java:156)
>> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
>> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:105)
>> at org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:109)
>> at org.apache.hadoop.mapred.lib.MultipleOutputFormat$1.close(MultipleOutputFormat.java:116)
>> at org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
>> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1068)
>> at org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:1047)
>> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>> at org.apache.spark.scheduler.Task.run(Task.scala:56)
>> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:745)
>>
>> Hints? Suggestions?
>>
>
>
>
> --
> http://sites.google.com/site/krasser/?utm_source=sig
>
