spark-user mailing list archives

From Eric Beabes <mailinglist...@gmail.com>
Subject Re: Only one Active task in Spark Structured Streaming application
Date Thu, 21 Jan 2021 12:22:06 GMT
I see a lot of messages like these in the driver log, even though this is
not the first batch. The job has been running for more than 3 days.


Jan 21, 2021 @ 17:09:42.484    21/01/21 11:39:34 WARN
state.HDFSBackedStateStoreProvider: The state for version 43405
doesn't exist in loadedMaps. Reading snapshot file and delta files if
needed...Note that this is normal for the first batch of starting
query.
Jan 21, 2021 @ 17:09:16.688    21/01/21 11:39:07 WARN
state.HDFSBackedStateStoreProvider: The state for version 43405
doesn't exist in loadedMaps. Reading snapshot file and delta files if
needed...Note that this is normal for the first batch of starting
query.

Jan 21, 2021 @ 16:09:43.831    21/01/21 10:39:39 WARN
state.HDFSBackedStateStoreProvider: The state for version 43404
doesn't exist in loadedMaps. Reading snapshot file and delta files if
needed...Note that this is normal for the first batch of starting
query.
Jan 21, 2021 @ 16:09:41.493    21/01/21 10:39:32 WARN
state.HDFSBackedStateStoreProvider: The state for version 43404
doesn't exist in loadedMaps. Reading snapshot file and delta files if
needed...Note that this is normal for the first batch of starting
query.
Jan 21, 2021 @ 16:09:41.160    21/01/21 10:39:39 WARN
state.HDFSBackedStateStoreProvider: The state for version 43404
doesn't exist in loadedMaps. Reading snapshot file and delta files if
needed...Note that this is normal for the first batch of starting
query.
Jan 21, 2021 @ 16:09:20.265    21/01/21 10:39:19 WARN
state.HDFSBackedStateStoreProvider: The state for version 43404
doesn't exist in loadedMaps. Reading snapshot file and delta files if
needed...Note that this is normal for the first batch of starting
query.
Jan 21, 2021 @ 16:09:18.896    21/01/21 10:39:11 WARN
state.HDFSBackedStateStoreProvider: The state for version 43404
doesn't exist in loadedMaps. Reading snapshot file and delta files if
needed...Note that this is normal for the first batch of starting
query.
Jan 21, 2021 @ 15:48:01.850    21/01/21 10:17:53 WARN
common.QueryListener: InputRows: 40543212
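The warnings above repeat for state versions 43404 and 43405 at several timestamps. A minimal sketch for quantifying that from a saved driver log, assuming Python 3 and that the message text matches the lines shown here (the function name `count_state_cache_misses` is hypothetical). It joins wrapped lines before matching, then counts the "doesn't exist in loadedMaps" warnings per state version.

```python
import re
import sys
from collections import Counter

# Matches the HDFSBackedStateStoreProvider warning text as it appears in the log above.
STATE_MISS = re.compile(
    r"HDFSBackedStateStoreProvider: The state for version (\d+)\s+doesn't exist in loadedMaps"
)


def count_state_cache_misses(log_text: str) -> Counter:
    """Count 'doesn't exist in loadedMaps' warnings per state version (hypothetical helper)."""
    # Collapse line wrapping so a message split across lines still matches the pattern.
    flat = " ".join(log_text.split())
    return Counter(int(version) for version in STATE_MISS.findall(flat))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_misses.py driver.log  (file names are placeholders)
    with open(sys.argv[1]) as f:
        for version, n in sorted(count_state_cache_misses(f.read()).items()):
            print(f"version {version}: {n} warning(s)")
```

Several warnings for the same version at different times would be consistent with the state being re-read from the checkpoint on more than one occasion rather than served from memory, but the count alone does not explain why.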


I also see this warning:


21/01/21 12:10:57 WARN internals.AbstractCoordinator: [Consumer
clientId=consumer-1,
groupId=spark-kafka-source-75862e5f-2261-4216-b856-462d24dc6e47-558756072-driver-0]
This member will leave the group because consumer poll timeout has
expired. This means the time between subsequent calls to poll() was
longer than the configured max.poll.interval.ms, which typically
implies that the poll loop is spending too much time processing
messages. You can address this either by increasing
max.poll.interval.ms or by reducing the maximum size of batches
returned in poll() with max.poll.records.
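The warning itself suggests raising max.poll.interval.ms or lowering max.poll.records. In Spark's Kafka source, Kafka consumer settings are passed as options with a `kafka.` prefix, and the Spark-level `maxOffsetsPerTrigger` option caps how many offsets one micro-batch reads. A hedged sketch of assembling those options; the helper name, broker address and topic are hypothetical placeholders, and the numbers are not tuned recommendations (Kafka's own defaults are 300000 ms and 500 records).

```python
from typing import Dict, Optional


def kafka_source_options(
    bootstrap_servers: str,
    topic: str,
    max_poll_interval_ms: int,
    max_poll_records: int,
    max_offsets_per_trigger: Optional[int] = None,
) -> Dict[str, str]:
    """Build option strings for Spark's Kafka source (hypothetical helper)."""
    opts = {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Consumer settings named in the warning, forwarded via the "kafka." prefix.
        "kafka.max.poll.interval.ms": str(max_poll_interval_ms),
        "kafka.max.poll.records": str(max_poll_records),
    }
    if max_offsets_per_trigger is not None:
        # Spark-level cap on the number of offsets read per micro-batch.
        opts["maxOffsetsPerTrigger"] = str(max_offsets_per_trigger)
    return opts


# Example with placeholder values; in PySpark the dict would be passed as
#   spark.readStream.format("kafka").options(**opts).load()
opts = kafka_source_options("broker1:9092", "events", 600000, 500)
```

Whether either change helps depends on why the poll loop is slow; if each batch is waiting on a single long-running task, a longer timeout may only mask the symptom.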





On Thu, Jan 21, 2021 at 5:18 PM Jungtaek Lim <kabhwan.opensource@gmail.com>
wrote:

> I'm not sure how many people could even guess at possible reasons; I'd say
> there's not enough information. No driver/executor logs, no
> job/stage/executor information, no code.
>
> On Thu, Jan 21, 2021 at 8:25 PM Jacek Laskowski <jacek@japila.pl> wrote:
>
>> Hi,
>>
>> I'd look at the stages and jobs, as it's possible that the only running task
>> is the last remaining one in a stage of a job. Just guessing...
>>
>> Regards,
>> Jacek Laskowski
>> ----
>> https://about.me/JacekLaskowski
>> "The Internals Of" Online Books <https://books.japila.pl/>
>> Follow me on https://twitter.com/jaceklaskowski
>>
>>
>>
>> On Thu, Jan 21, 2021 at 12:19 PM Eric Beabes <mailinglists19@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> My Spark Structured Streaming application was performing well for quite
>>> some time, but today it suddenly slowed down. In the Spark UI I noticed
>>> that 'No. of Active Tasks' is 1 even though 64 cores are available
>>> (please see the attached image).
>>>
>>> I don't believe this is a data skew issue related to how the data is
>>> partitioned. What could be the reason for this? Please advise. Thanks.
>>>
>>>
>>>
>>
>>
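The job/stage information asked for in the replies, and the guess that the single active task is the last one left in a stage, can both be checked through Spark's monitoring REST API, which the driver UI serves (port 4040 by default) under `/api/v1`. A minimal sketch, assuming the UI is reachable from where the script runs; `ui_url`, `app_id` and both function names are placeholders (the application ID is available as `spark.sparkContext.applicationId`).

```python
import json
import urllib.request
from typing import Dict, List


def fetch_active_stages(ui_url: str, app_id: str) -> List[Dict]:
    """Fetch the application's active stages from the monitoring REST API."""
    url = f"{ui_url}/api/v1/applications/{app_id}/stages?status=active"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def single_task_stages(stages: List[Dict]) -> List[Dict]:
    """Summarize active stages that have exactly one task still running.

    Many completed tasks plus one running task fits the "last task in a stage"
    pattern; that task's duration and input size on the stage page in the UI
    then show whether it is a straggler.
    """
    return [
        {
            "stageId": s["stageId"],
            "name": s.get("name", ""),
            "done": s["numCompleteTasks"],
            "total": s["numTasks"],
        }
        for s in stages
        if s.get("status") == "ACTIVE" and s.get("numActiveTasks") == 1
    ]


# Example (placeholder URL and ID):
#   for s in single_task_stages(fetch_active_stages("http://driver-host:4040", "app-...")):
#       print(s)
```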

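The original message rules out data skew without showing the per-partition numbers. One simple check is the ratio of the largest partition's row count to the median partition's row count. A sketch, assuming per-partition row counts have already been collected (the function `skew_ratio` is hypothetical).

```python
from statistics import median
from typing import Sequence


def skew_ratio(partition_counts: Sequence[int]) -> float:
    """Ratio of the largest partition's row count to the median partition's.

    Close to 1.0 means rows are spread evenly; a large value means one
    partition, and therefore one task, carries most of the work.
    """
    if not partition_counts:
        raise ValueError("need at least one partition count")
    mid = median(partition_counts)
    if mid == 0:
        # Median partition is empty: any non-empty partition is unbounded skew.
        return float("inf") if max(partition_counts) > 0 else 1.0
    return max(partition_counts) / mid
```

In PySpark, one way to get such counts for a micro-batch is `batch_df.groupBy(spark_partition_id()).count()` inside a `foreachBatch` function; empty partitions do not appear in that result, so they would need to be added as zeros.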