spark-user mailing list archives

From Steven Stetzler <>
Subject Re: Can't get Spark to interface with S3A Filesystem with correct credentials
Date Wed, 04 Mar 2020 21:19:12 GMT
To successfully read from S3 using s3a, I've had to set a few additional
configuration properties in addition to `spark.hadoop.fs.s3a.access.key` and
`spark.hadoop.fs.s3a.secret.key`. I've also needed to ensure Spark has
access to the AWS SDK jar: I downloaded `aws-java-sdk-1.7.4.jar` from Maven
and paired it with `hadoop-aws-2.7.3.jar` in `$SPARK_HOME/jars`.
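
For reference, a minimal sketch of that setup from inside spark-shell (the
key values and test path below are placeholders, not my real configuration):

    // spark-shell with hadoop-aws-2.7.3.jar and aws-java-sdk-1.7.4.jar on the classpath.
    // Setting the keys on the Hadoop configuration is equivalent to passing
    // --conf spark.hadoop.fs.s3a.access.key=... on the command line.
    val hadoopConf = sc.hadoopConfiguration
    hadoopConf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY") // placeholder
    hadoopConf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY") // placeholder

    // Quick smoke test: read a few lines from a bucket these credentials can access.
    sc.textFile("s3a://your-bucket/some-object.txt").take(5) // placeholder path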

These additional configurations don't seem related to credentials and
security (and may not even be needed in my case), but perhaps it will help.


On Wed, Mar 4, 2020 at 1:11 PM Devin Boyer <> wrote:

> Hello,
> I'm attempting to run Spark within a Docker container with the hope of
> eventually running Spark on Kubernetes. Nearly all the data we currently
> process with Spark is stored in S3, so I need to be able to interface with
> it using the S3A filesystem.
> I feel like I've gotten close to getting this working but for some reason
> cannot get my local Spark installations to correctly interface with S3 yet.
> A basic example of what I've tried:
>    - Build Kubernetes docker images by downloading the
>    spark-2.4.5-bin-hadoop2.7.tgz archive and building the
>    kubernetes/dockerfiles/spark/Dockerfile image.
>    - Run an interactive docker container using the above built image.
>    - Within that container, run spark-shell. This command passes valid
>    AWS credentials by setting spark.hadoop.fs.s3a.access.key and
>    spark.hadoop.fs.s3a.secret.key using --conf flags, and downloads the
>    hadoop-aws package by specifying the --packages
>    org.apache.hadoop:hadoop-aws:2.7.3 flag (see the sketch after this
>    list).
>    - Try to access the simple public file as outlined in the "Integration
>    with Cloud Infrastructures <>" documentation by running:
>    sc.textFile("s3a://landsat-pds/scene_list.gz").take(5)
>    - Observe this to fail with a 403 Forbidden exception thrown by S3
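> For reference, the in-shell check looks roughly like this (a sketch, not
> the exact session; the conf.get line is only there to confirm that the
> --conf flags actually reached the Hadoop configuration):
>
>     // Inside spark-shell, started with the --conf and --packages flags above.
>     // If this prints null, the credentials never made it into the Hadoop
>     // configuration, which on its own could explain a mis-signed request.
>     println(sc.hadoopConfiguration.get("fs.s3a.access.key"))
>
>     // The public-object read from the cloud integration docs:
>     sc.textFile("s3a://landsat-pds/scene_list.gz").take(5)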
> I've tried a variety of other means of setting credentials (like exporting
> the standard AWS_ACCESS_KEY_ID environment variable before launching
> spark-shell), and other means of building a Spark image and including the
>    appropriate libraries (see this GitHub repo: <>), all with the same
>    results.
> I've tried also accessing objects within our AWS account, rather than the
> object from the public landsat-pds bucket, with the same 403 error being
> thrown.
> Can anyone help explain why I can't seem to connect to S3 successfully
> using Spark, or even explain where I could look for additional clues as to
> what's misconfigured? I've tried turning up the logging verbosity and
> didn't see much that was particularly useful, but happy to share additional
> log output too.
> Thanks for any help you can provide!
> Best,
> Devin Boyer
