spark-user mailing list archives

From "Bowden, Chris" <chris.bow...@microfocus.com>
Subject Re: Structured Streaming Spark 2.3 Query
Date Fri, 23 Mar 2018 06:10:06 GMT
Use a streaming query listener that tracks repeated progress events for the same batch id.
If a chosen amount of time elapses while progress events keep reporting the same batch id, the
source is not providing new offsets and stream execution is not scheduling new micro-batches.
See also: spark.sql.streaming.pollingDelay. Other approaches may give poor results because of
the specific characteristics of a source / sink / workflow. It may be better to express the
threshold as a number of repeated progress events rather than as elapsed time, since a count is
more forgiving of implementation details (e.g., the Kafka source retries internally to determine
the latest offsets, and sleeps between attempts if it finds no new data when asked).
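
The counting logic described above could be sketched roughly as follows. This is only an
illustration in plain Python with no Spark dependency: the class name IdleBatchTracker and its
parameters max_repeats and max_idle_seconds are hypothetical, and it assumes the batch id from
each progress event (progress.batchId in the listener's onQueryProgress callback) is passed in
by the caller.

```python
import time


class IdleBatchTracker:
    """Hypothetical helper: detects repeated progress events for one batch id."""

    def __init__(self, max_repeats=None, max_idle_seconds=None, clock=time.monotonic):
        # Either threshold may be None to disable it.
        self.max_repeats = max_repeats
        self.max_idle_seconds = max_idle_seconds
        self._clock = clock              # injectable so the logic can be tested
        self._last_batch_id = None
        self._repeats = 0                # repeats seen after the first event
        self._first_seen = None          # when the current batch id first appeared

    def observe(self, batch_id):
        """Record one progress event; return True once the stream looks idle."""
        now = self._clock()
        if batch_id != self._last_batch_id:
            # A new micro-batch was scheduled, so reset both measurements.
            self._last_batch_id = batch_id
            self._repeats = 0
            self._first_seen = now
            return False
        self._repeats += 1
        too_many = self.max_repeats is not None and self._repeats >= self.max_repeats
        too_long = (self.max_idle_seconds is not None
                    and now - self._first_seen >= self.max_idle_seconds)
        return too_many or too_long


# Count-based threshold: idle after 3 repeats of the same batch id.
tracker = IdleBatchTracker(max_repeats=3)
results = [tracker.observe(b) for b in (7, 7, 7, 7, 8)]
```

In a real job, the listener would call observe() from onQueryProgress and, once it returns True,
signal the driver to stop the query. Setting a flag that the driver thread polls, rather than
stopping inside the callback itself, is one option; that wiring is an assumption and is not
shown here.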


-Chris

________________________________
From: Aakash Basu <aakash.spark.raj@gmail.com>
Sent: Thursday, March 22, 2018 10:45:38 PM
To: user
Subject: Structured Streaming Spark 2.3 Query

Hi,

What is the way to stop a Spark Streaming job if there is no data inflow for an arbitrary
amount of time (e.g., 2 minutes)?

Thanks,
Aakash.
