spark-user mailing list archives

From sherif98 <>
Subject Spark Structured Streaming checkpointing with S3 data source
Date Thu, 30 Aug 2018 11:32:24 GMT
I have data that is continuously pushed to multiple S3 buckets. I want to set
up a structured streaming application that uses the S3 buckets as the data
source and do stream-stream joins.

My question is: if the application goes down for some reason, will restarting
the application continue processing data from S3 where it left off?

So for example, say I have 5 JSON files with 100 records in each file, and
Spark failed while processing the tenth record in the 3rd file. When the
query runs again, will it begin processing from the tenth record in the 3rd
file?

