spark-user mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: [Error] while reading S3 buckets in Spark 1.6 with spark-submit
Date Thu, 01 Sep 2016 08:59:00 GMT

On 1 Sep 2016, at 03:45, Divya Gehlot <divya.htconex@gmail.com> wrote:

Hi,
I am using Spark 1.6.1 on an EMR machine and am trying to read S3 buckets in my Spark job.
When I read them through the Spark shell it works, but when I package the job and run it
with spark-submit I get the error below:

16/08/31 07:36:38 INFO ApplicationMaster: Registered signal handlers for [TERM, HUP, INT]

16/08/31 07:36:39 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1468570153734_2851_000001
Exception in thread "main" java.util.ServiceConfigurationError: org.apache.hadoop.fs.FileSystem:
Provider org.apache.hadoop.fs.s3a.S3AFileSystem could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:224)
at java.util.ServiceLoader.access$100(ServiceLoader.java:181)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:377)

I have already included

 "com.amazonaws" % "aws-java-sdk-s3" % "1.11.15",

in my build.sbt


Assuming you are using a released version of Hadoop 2.6 or 2.7 underneath Spark, you will
need to make sure you have aws-java-sdk 1.7.4 on your classpath. You can't just drop in
a newer JAR, as it is incompatible at the API level
(https://issues.apache.org/jira/browse/HADOOP-12269):


    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk</artifactId>
      <version>1.7.4</version>
      <scope>compile</scope>
    </dependency>
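
Since your project uses build.sbt, the sbt equivalent would be roughly the following (a
sketch, not tested against your build):

    // build.sbt: pin the AWS SDK to the version Hadoop 2.6/2.7's s3a code was built against
    libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.7.4"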


and keep the Jackson artifacts jackson-databind and jackson-annotations in sync with the rest of your app:


    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-annotations</artifactId>
    </dependency>
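
In sbt, one way to keep those in sync is dependencyOverrides; a minimal sketch, assuming
your Spark 1.6 distribution ships Jackson 2.4.4 (check what yours actually bundles):

    // build.sbt: force every transitive Jackson copy onto one version
    dependencyOverrides ++= Set(
      "com.fasterxml.jackson.core" % "jackson-databind" % "2.4.4",
      "com.fasterxml.jackson.core" % "jackson-annotations" % "2.4.4"
    )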



I tried providing the access key in my job as well, but the same error persists.

When I googled it, I read that if you have an IAM role created there is no need to provide an access key.



You don't get IAM support until Hadoop 2.8 ships, sorry. It needed a fair amount of reworking
of how S3A does authentication.

Note that if you launch Spark jobs with the AWS environment variables set, these will be picked
up automatically and used to set the relevant properties in the configuration.
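
For reference, a minimal Scala sketch of the equivalent explicit wiring, setting the standard
s3a credential properties on the Hadoop configuration yourself (the bucket path and the
env-var plumbing here are illustrative assumptions, not part of the original thread):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("s3a-read"))
    // fs.s3a.access.key / fs.s3a.secret.key are the properties s3a reads credentials from
    sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
    sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
    // hypothetical bucket/path, just to show an s3a:// read
    println(sc.textFile("s3a://my-bucket/some/path/").count())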
