spark-user mailing list archives

From Aniruddha P Tekade <>
Subject Spark job stuck at s3a-file-system metrics system started
Date Wed, 29 Apr 2020 23:53:49 GMT

I am trying to run a Spark job that writes data to a bucket on a custom
S3 endpoint, but the job is stuck at this line of output and does not
move forward at all -

20/04/29 16:03:59 INFO SharedState: Setting
hive.metastore.warehouse.dir ('null') to the value of
20/04/29 16:03:59 INFO SharedState: Warehouse path is
20/04/29 16:04:01 WARN MetricsConfig: Cannot locate configuration:
20/04/29 16:04:02 INFO MetricsSystemImpl: Scheduled Metric snapshot
period at 10 second(s).
20/04/29 16:04:02 INFO MetricsSystemImpl: s3a-file-system metrics system started

After a long wait it shows this -

org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on
test-bucket: com.amazonaws.SdkClientException: Unable to execute HTTP
request: Connect to
[] failed: Connection refused
(Connection refused): Unable to execute HTTP request: Connect to [] failed:
Connection refused (Connection refused)
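"Connection refused" means nothing accepted the TCP connection on the endpoint's host/port from the JVM's point of view, before any S3 authentication even happens. A minimal sketch for checking reachability from the same process (the host and port below are placeholders, not the actual endpoint):

```scala
import java.net.{InetSocketAddress, Socket}

// Returns true if a TCP connection to host:port succeeds within the timeout.
def canConnect(host: String, port: Int, timeoutMs: Int = 3000): Boolean = {
  val sock = new Socket()
  try {
    sock.connect(new InetSocketAddress(host, port), timeoutMs)
    true
  } catch {
    case _: java.io.IOException => false
  } finally {
    sock.close()
  }
}

// Placeholder endpoint; substitute the actual custom S3 host and port.
println(canConnect("s3.example.internal", 9000))
```

If this returns false from the machine running the driver while the AWS CLI works, the CLI is likely resolving or reaching the endpoint differently (e.g. via a proxy or a different endpoint URL in its profile).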

However, I am able to access this bucket with the AWS CLI from the same
machine, so I don't understand why Spark says it is unable to execute the
HTTP request.
I am using -

spark               3.0.0-preview2
hadoop-aws          3.2.0
aws-java-sdk-bundle 1.11.375

My spark code has following properties set for hadoop configuration -

spark.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", ENDPOINT);
spark.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", ACCESS_KEY);
spark.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", SECRET_KEY);
spark.sparkContext.hadoopConfiguration.set("", "true")
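For what it's worth, custom (non-AWS) S3 endpoints usually need path-style access, and plain-HTTP endpoints need SSL disabled explicitly, since s3a defaults to virtual-host-style addressing over HTTPS. A minimal sketch of the usual settings, using standard hadoop-aws property names (the endpoint value is a placeholder, not my actual endpoint):

```scala
// Sketch of typical s3a settings for a custom endpoint; property names are
// standard hadoop-aws keys, the endpoint value is a placeholder.
val hc = spark.sparkContext.hadoopConfiguration
hc.set("fs.s3a.endpoint", "http://s3.example.internal:9000") // placeholder
hc.set("fs.s3a.access.key", ACCESS_KEY)
hc.set("fs.s3a.secret.key", SECRET_KEY)
// Custom endpoints typically serve buckets at host/bucket rather than
// bucket.host, so path-style access is usually required:
hc.set("fs.s3a.path.style.access", "true")
// If the endpoint speaks plain HTTP, disable TLS explicitly:
hc.set("fs.s3a.connection.ssl.enabled", "false")
```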

Can someone help me understand what is wrong here? Is there anything
else I need to configure? The custom S3 endpoint and its keys are valid and
work from the AWS CLI profile. What is wrong with the Scala code here?

val dataStreamWriter: DataStreamWriter[Row] = as "day",
month(current_date()) as "month", year(current_date()) as "year")
      .option("checkpointLocation", "/Users/abc/Desktop/qct-checkpoint/")
      .trigger(Trigger.ProcessingTime("15 seconds"))
      .partitionBy("year", "month", "day")
      .option("path", "s3a://test-bucket")

val streamingQuery: StreamingQuery = dataStreamWriter.start()
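The snippet above lost its beginning in the archive; a self-contained sketch of what the intended pipeline presumably looks like is below. `inputDf` and the `parquet` sink format are assumptions, since neither appears in the surviving snippet:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{current_date, dayofmonth, month, year}
import org.apache.spark.sql.streaming.{StreamingQuery, Trigger}

// `inputDf` stands in for the streaming source DataFrame, which is not
// shown in the original snippet.
val inputDf: DataFrame = ???

val streamingQuery: StreamingQuery = inputDf
  // Derive the partition columns the truncated select was building:
  .withColumn("day", dayofmonth(current_date()))
  .withColumn("month", month(current_date()))
  .withColumn("year", year(current_date()))
  .writeStream
  .format("parquet") // assumed sink format
  .option("checkpointLocation", "/Users/abc/Desktop/qct-checkpoint/")
  .trigger(Trigger.ProcessingTime("15 seconds"))
  .partitionBy("year", "month", "day")
  .option("path", "s3a://test-bucket")
  .start()
```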

