I see a lot of messages like the following in the driver log, even though this is not the first batch. The job has been running for more than 3 days.


Jan 21, 2021 @ 17:09:42.484    21/01/21 11:39:34 WARN state.HDFSBackedStateStoreProvider: The state for version 43405 doesn't exist in loadedMaps. Reading snapshot file and delta files if needed...Note that this is normal for the first batch of starting query.
Jan 21, 2021 @ 17:09:16.688 21/01/21 11:39:07 WARN state.HDFSBackedStateStoreProvider: The state for version 43405 doesn't exist in loadedMaps. Reading snapshot file and delta files if needed...Note that this is normal for the first batch of starting query.
Jan 21, 2021 @ 16:09:43.831 21/01/21 10:39:39 WARN state.HDFSBackedStateStoreProvider: The state for version 43404 doesn't exist in loadedMaps. Reading snapshot file and delta files if needed...Note that this is normal for the first batch of starting query.
Jan 21, 2021 @ 16:09:41.493 21/01/21 10:39:32 WARN state.HDFSBackedStateStoreProvider: The state for version 43404 doesn't exist in loadedMaps. Reading snapshot file and delta files if needed...Note that this is normal for the first batch of starting query.
Jan 21, 2021 @ 16:09:41.160 21/01/21 10:39:39 WARN state.HDFSBackedStateStoreProvider: The state for version 43404 doesn't exist in loadedMaps. Reading snapshot file and delta files if needed...Note that this is normal for the first batch of starting query.
Jan 21, 2021 @ 16:09:20.265 21/01/21 10:39:19 WARN state.HDFSBackedStateStoreProvider: The state for version 43404 doesn't exist in loadedMaps. Reading snapshot file and delta files if needed...Note that this is normal for the first batch of starting query.
Jan 21, 2021 @ 16:09:18.896 21/01/21 10:39:11 WARN state.HDFSBackedStateStoreProvider: The state for version 43404 doesn't exist in loadedMaps. Reading snapshot file and delta files if needed...Note that this is normal for the first batch of starting query.
Jan 21, 2021 @ 15:48:01.850 21/01/21 10:17:53 WARN common.QueryListener: InputRows: 40543212
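
In case it matters, this is roughly what I'm considering to keep more recent state versions cached in memory so they are found in loadedMaps instead of being re-read from snapshot/delta files. Just a sketch; the app name is a placeholder and the value is an arbitrary guess, I'm not sure this is the right knob:

import org.apache.spark.sql.SparkSession

// Sketch only: raise the number of state store versions retained in memory.
// The value 10 is illustrative, not a recommendation.
val spark = SparkSession.builder()
  .appName("my-streaming-job") // placeholder name
  .config("spark.sql.streaming.maxBatchesToRetainInMemory", "10")
  .getOrCreate()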

I also see this warning:

21/01/21 12:10:57 WARN internals.AbstractCoordinator: [Consumer clientId=consumer-1, groupId=spark-kafka-source-75862e5f-2261-4216-b856-462d24dc6e47-558756072-driver-0] This member will leave the group because consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
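
If it's relevant, this is roughly how I could pass the consumer settings the warning mentions through the Kafka source. A sketch only: broker/topic names and all values are placeholders, and I haven't verified how the driver-side consumer picks these up:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("my-streaming-job").getOrCreate() // placeholder

// Sketch: pass max.poll.interval.ms through with the "kafka." prefix and cap
// the per-trigger input so each micro-batch stays smaller. Values are illustrative.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")   // placeholder
  .option("subscribe", "my-topic")                    // placeholder
  .option("kafka.max.poll.interval.ms", "600000")     // 10 minutes instead of the 5-minute default
  .option("maxOffsetsPerTrigger", "1000000")          // illustrative cap on rows per batch
  .load()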




On Thu, Jan 21, 2021 at 5:18 PM Jungtaek Lim <kabhwan.opensource@gmail.com> wrote:
I'm not sure how many people could even guess possible reasons - I'd say there's not enough information. No driver/executor logs, no job/stage/executor information, no code.

On Thu, Jan 21, 2021 at 8:25 PM Jacek Laskowski <jacek@japila.pl> wrote:
Hi,

I'd look at the stages and jobs, as it's possible that the one active task is the last remaining task of a stage in some job. Just guessing...

On Thu, Jan 21, 2021 at 12:19 PM Eric Beabes <mailinglists19@gmail.com> wrote:
Hello,

My Spark Structured Streaming application had been performing well for quite some time, but as of today it has suddenly slowed down. I noticed in the Spark UI that the 'No. of Active Tasks' is 1 even though 64 cores are available. (Please see the attached image.)

I don't believe there's a data skew issue related to how the data is partitioned. What could be the reason for this? Please advise. Thanks.
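
For what it's worth, this is roughly how I'd double-check that, by reading the same topic as a one-off batch query and counting rows per Kafka partition. A sketch only; the broker and topic names are placeholders:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("skew-check").getOrCreate() // placeholder

// Read the topic once as a batch query; a single partition holding most of the
// rows would explain one long-running task.
val sample = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder
  .option("subscribe", "my-topic")                  // placeholder
  .load()

sample.groupBy(col("partition")).count().orderBy(desc("count")).show()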


