spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Sandish Kumar HN <sanysand...@gmail.com>
Subject Using StreamingQueryListener.OnTerminate for Kafka Offset restore
Date Fri, 09 Aug 2019 20:13:25 GMT
Hey Everyone,

I'm using Spark StreamingQueryListener in Structured Streaming App

Whenever I see an OffsetOutOfRangeException's in Spark Job Inside
StreamingQueryListener.onTerminated method I'm updating the Spark
checkpoint directory offsets.

I was able to parse all OffsetOutOfRangeException's occurred on Job and
catch them and parse the partition, offset and connect to Kafka using Kafka
API and get right offsets and update the spark checkpointlocation/offsets
folder.

Everything works fine even if there are multiple partitions with
OffsetOutOfRangeException,

I was able to recover from OffsetOutOfRangeException's from the next run.

Does that make sense?

My question Is:
is there any locking for checkpointlocation/offsets folder? what if
multiple executors try to update checkpointlocation/offsets folder?

I also see StreamingQueryListener asynchronous API? it is across Executors
or Just with in the executor?

-- 

Thanks,
Regards,
SandishKumar HN

Mime
View raw message