spark-user mailing list archives

From Srinivas V <srini....@gmail.com>
Subject structured streaming with mapGroupWithState
Date Thu, 12 Mar 2020 00:08:12 GMT
Is anyone using this combination in production? I am planning to use it for a
use case with 15,000 events per second from a few Kafka topics. Though the
events are big, I would only need to take the businessId, frequency, and first
and last event timestamps and save these in the mapGroupsWithState state. I
need to keep them for a window of, say, 20 minutes and then push them to an
output Kafka topic. The total memory of the state will not be more than, say,
50 MB, as I have a limited number of businessIds, say 1 million.
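
For concreteness, the per-key state described above could look roughly like the following plain-Python sketch. It does not use Spark at all; in a real job this logic would live inside the function passed to mapGroupsWithState, with the state held in GroupState and expiry driven by a state timeout. The names BusinessState, update_state, expire and WINDOW_MS are hypothetical, and starting the 20-minute window at a key's first event is an assumption, not something stated above.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Assumption: the 20-minute retention window, in milliseconds.
WINDOW_MS = 20 * 60 * 1000


@dataclass
class BusinessState:
    """Per-businessId summary: event count plus first and last event timestamps."""
    business_id: str
    frequency: int
    first_ts_ms: int
    last_ts_ms: int


def update_state(
    business_id: str,
    event_ts_ms: Iterable[int],
    state: Optional[BusinessState],
) -> Optional[BusinessState]:
    """Fold a batch of event timestamps for one key into its state (None if the key is new)."""
    for ts in event_ts_ms:
        if state is None:
            state = BusinessState(business_id, 1, ts, ts)
        else:
            state.frequency += 1
            state.first_ts_ms = min(state.first_ts_ms, ts)
            state.last_ts_ms = max(state.last_ts_ms, ts)
    return state


def expire(store: Dict[str, BusinessState], now_ms: int) -> List[BusinessState]:
    """Remove and return every state whose first event is at least WINDOW_MS old."""
    due = [s for s in store.values() if now_ms - s.first_ts_ms >= WINDOW_MS]
    for s in due:
        del store[s.business_id]
    return due


# Two micro-batches for the same key, then a flush once the window has passed.
store: Dict[str, BusinessState] = {}
store["b1"] = update_state("b1", [1_000, 5_000], store.get("b1"))
store["b1"] = update_state("b1", [3_000], store.get("b1"))
flushed = expire(store, now_ms=1_000 + WINDOW_MS)
```

Only the small summary is kept per key, never the full event, which is what keeps the state size proportional to the number of businessIds rather than to the event volume.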
Questions:
1. Could you share any issues you have faced, or that I may face?
2. How do I debug if I am unable to keep up with the inflow of events and the
lag is increasing constantly?

Regards
Sri
