spark-dev mailing list archives

From Aniruddha P Tekade <>
Subject [DISCUSS] writing structured streaming dataframe to custom S3 buckets?
Date Tue, 29 Oct 2019 22:43:24 GMT

I have a local S3 service that is writable and readable using the AWS SDK APIs.
I created a Spark session and then set the Hadoop configuration as
follows -

// Create Spark session (builder details assumed here; the original snippet was truncated)
val spark = SparkSession
  .builder()
  .appName("s3-stream-test")
  .master("local[*]")
  .getOrCreate()

// Take the Spark context from the Spark session
val sc = spark.sparkContext

// Configure spark context with S3 values
val accessKey = "00cce9eb2c589b1b1b5b"
val secretKey = "flmheKX9Gb1tTlImO6xR++9kvnUByfRKZfI7LJT8"
val endpoint = ""

spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", endpoint)
//    spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", accessKey)
//    spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", secretKey)

sc.hadoopConfiguration.set("fs.s3a.awsAccessKeyId", accessKey)
sc.hadoopConfiguration.set("fs.s3a.awsSecretAccessKey", secretKey)
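For reference, the Hadoop S3A connector documents `fs.s3a.access.key` / `fs.s3a.secret.key` (the keys commented out above) rather than the `awsAccessKeyId` / `awsSecretAccessKey` names, and custom (non-AWS) endpoints typically also need path-style addressing. A minimal sketch of that configuration, with a placeholder endpoint:

```scala
// Sketch only: endpoint value and SSL setting are placeholders for a local S3 service.
val hc = spark.sparkContext.hadoopConfiguration
hc.set("fs.s3a.endpoint", "http://local-s3:9000")      // hypothetical endpoint
hc.set("fs.s3a.access.key", accessKey)                 // documented S3A credential keys
hc.set("fs.s3a.secret.key", secretKey)
hc.set("fs.s3a.path.style.access", "true")             // custom S3 services usually need path-style URLs
hc.set("fs.s3a.connection.ssl.enabled", "false")       // if the local service is plain HTTP
```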

Then I try to write to S3 as follows -

val query = rawData
  .writeStream
  .option("format", "append")
  .option("path", "s3a://bucket0/")
  .start()

But nothing actually gets written. Since I am running this from my
local machine, I have an entry for the IP address and S3 endpoint in the
/etc/hosts file. As you can see, this is a streaming DataFrame, so it
cannot be written without the writeStream API. Can someone help with what
I am missing here? Is there a better way to do this?
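For completeness, this is the shape I understand a working file-sink streaming query needs: a sink format, an output mode (passed via `outputMode`, not as an option), and a mandatory checkpoint location. A sketch, with placeholder paths and an assumed Parquet format:

```scala
// Sketch of a complete streaming write to S3A; paths and format are assumptions.
val query = rawData
  .writeStream
  .format("parquet")                      // file-sink format (assumed)
  .outputMode("append")                   // file sinks support append mode
  .option("path", "s3a://bucket0/data/")
  .option("checkpointLocation", "s3a://bucket0/checkpoints/")  // required for file sinks
  .start()

query.awaitTermination()                  // block until the streaming query stops
```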
