spark-user mailing list archives

From Divya Gehlot <divya.htco...@gmail.com>
Subject Re: [Error:]while read s3 buckets in Spark 1.6 in spark -submit
Date Fri, 02 Sep 2016 07:36:53 GMT
Hi Steve,
I am trying to read from s3n://<bucket> and have already included aws-java-sdk
1.7.4 in my classpath.
My machine is on AWS EMR with Hadoop 2.7.2 and Spark 1.6.1 installed.
The post below suggests that this is an issue with Hadoop 2.7.2 on EMR:
http://stackoverflow.com/questions/30385981/how-to-access-s3a-files-from-apache-spark
Is that really the issue?
Could somebody help me validate this?


Thanks,
Divya
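One way to validate this is to ask the JVM where each relevant class was actually loaded from, from inside the job launched with spark-submit (the shell may be running with a different classpath). The helper below is a hypothetical sketch: `ClasspathCheck` and its methods are not part of Spark or Hadoop, and it uses only standard JVM reflection. If `AmazonS3Client` turns out to come from an `aws-java-sdk-s3-1.11.x` JAR rather than `aws-java-sdk-1.7.4`, that points at the version mismatch described in the reply quoted below.

```scala
// Hypothetical diagnostic helper (not a Spark/Hadoop API): reports which JAR
// or directory a class was loaded from, using plain JVM reflection.
object ClasspathCheck {
  // Returns the code-source location of a class, or None if the class cannot
  // be loaded or has no code source (e.g. JDK classes from the boot loader).
  def locationOf(className: String): Option[String] = {
    val loader = Option(Thread.currentThread.getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    try {
      // initialize = false: load the class without running static initializers
      val cls = Class.forName(className, false, loader)
      Option(cls.getProtectionDomain.getCodeSource)
        .flatMap(cs => Option(cs.getLocation))
        .map(_.toString)
    } catch {
      // Missing class, or a class whose dependencies cannot be linked
      case _: ClassNotFoundException | _: LinkageError => None
    }
  }

  // Classes relevant to the S3A error in this thread.
  val classesToCheck: Seq[String] = Seq(
    "org.apache.hadoop.fs.s3a.S3AFileSystem",      // from hadoop-aws
    "com.amazonaws.services.s3.AmazonS3Client",     // from aws-java-sdk(-s3)
    "com.fasterxml.jackson.databind.ObjectMapper"   // from jackson-databind
  )

  // One human-readable line per class: "<class> -> <location or NOT FOUND>"
  def report(): Seq[String] =
    classesToCheck.map { name =>
      s"$name -> ${locationOf(name).getOrElse("NOT FOUND")}"
    }
}
```

Calling `ClasspathCheck.report().foreach(println)` in the driver prints one line per class; running the same call inside a task would show what the executors see.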



On 1 September 2016 at 16:59, Steve Loughran <stevel@hortonworks.com> wrote:

>
> On 1 Sep 2016, at 03:45, Divya Gehlot <divya.htconex@gmail.com> wrote:
>
> Hi,
> I am using Spark 1.6.1 on an EMR machine.
> I am trying to read S3 buckets in my Spark job.
> When I read them through the Spark shell it works, but when I package
> the job and run it with spark-submit I get the error below:
>
> 16/08/31 07:36:38 INFO ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]
> 16/08/31 07:36:39 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1468570153734_2851_000001
> Exception in thread "main" java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem: Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
>     at java.util.ServiceLoader.fail(ServiceLoader.java:224)
>     at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
>     at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)
>
> I have already included
>
>  "com.amazonaws" % "aws-java-sdk-s3" % "1.11.15",
>
> in my build.sbt
>
>
> Assuming you are using a released version of Hadoop 2.6 or 2.7 underneath
> Spark, you will need to make sure aws-java-sdk 1.7.4 is on your classpath.
> You can't just drop in a newer JAR, as it is incompatible at the API
> level ( https://issues.apache.org/jira/browse/HADOOP-12269 ).
>
>
>     <dependency>
>       <groupId>com.amazonaws</groupId>
>       <artifactId>aws-java-sdk</artifactId>
>       <version>1.7.4</version>
>       <scope>compile</scope>
>     </dependency>
>
>
> along with the Jackson artifacts (databind and annotations), kept in sync
> with the rest of your app:
>
>
>     <dependency>
>       <groupId>com.fasterxml.jackson.core</groupId>
>       <artifactId>jackson-databind</artifactId>
>     </dependency>
>     <dependency>
>       <groupId>com.fasterxml.jackson.core</groupId>
>       <artifactId>jackson-annotations</artifactId>
>     </dependency>
>
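Because the job in this thread is built with sbt rather than Maven, the same advice might translate into `build.sbt` roughly as follows. This is a sketch, not a tested build: it replaces the `aws-java-sdk-s3` 1.11.15 line quoted above, and `jacksonVersion` is a hypothetical placeholder that must be set to whatever Jackson version the rest of the application (including Spark itself) already uses.

```scala
// build.sbt sketch: pin the AWS SDK to the version Hadoop 2.6/2.7 was built
// against, instead of the newer, API-incompatible aws-java-sdk-s3 1.11.x.

// Hypothetical placeholder: replace with the Jackson version the rest of the
// app uses, so that databind and annotations stay in sync.
val jacksonVersion = "2.x.y"

libraryDependencies ++= Seq(
  "com.amazonaws"              % "aws-java-sdk"        % "1.7.4",
  "com.fasterxml.jackson.core" % "jackson-databind"    % jacksonVersion,
  "com.fasterxml.jackson.core" % "jackson-annotations" % jacksonVersion
)
```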
>
> I also tried providing the access key in my job, but the same error
> persists.
>
> When I googled it, I found that if you have an IAM role there is no need
> to provide an access key.
>
>
>
> You don't get IAM support until Hadoop 2.8 ships, sorry. It needed a fair
> amount of reworking of how S3A does authentication.
>
> Note that if you launch Spark jobs with the AWS environment variables
> (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) set, these will be picked up
> automatically and used to set the relevant properties in the configuration.
>
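The environment-variable behaviour described in the last paragraph can be modelled as a pure function from environment variables to configuration properties. The sketch below is an approximation, not Spark's own code: `AwsEnvToHadoopConf` is a hypothetical name, it assumes the two standard variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`, and it writes the standard Hadoop s3n and s3a credential keys. Exactly which keys a given Spark build fills in should be checked against its source. Note that credentials alone cannot fix the error above: the `ServiceConfigurationError` is raised while instantiating `S3AFileSystem`, before any credentials are used, which matches the observation that supplying an access key changed nothing.

```scala
// Sketch (not Spark's actual implementation): if BOTH AWS variables are set,
// copy them into the s3n and s3a credential properties of a Hadoop-style config.
object AwsEnvToHadoopConf {
  def fromEnv(env: Map[String, String]): Map[String, String] =
    (env.get("AWS_ACCESS_KEY_ID"), env.get("AWS_SECRET_ACCESS_KEY")) match {
      case (Some(keyId), Some(secret)) =>
        Map(
          "fs.s3n.awsAccessKeyId"     -> keyId,
          "fs.s3n.awsSecretAccessKey" -> secret,
          "fs.s3a.access.key"         -> keyId,
          "fs.s3a.secret.key"         -> secret
        )
      // Only one (or neither) variable set: nothing is copied.
      case _ => Map.empty
    }
}
```

In a real job one might call `AwsEnvToHadoopConf.fromEnv(sys.env)` and copy each entry into `sc.hadoopConfiguration` with `set(key, value)`.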
