spark-user mailing list archives

From kant kodali <kanth...@gmail.com>
Subject What happens if I can't fit data into memory while doing stream-stream join.
Date Fri, 23 Feb 2018 10:45:46 GMT
Hi All,

I am experimenting with Spark 2.3.0's stream-stream join feature to see if I
can leverage it to replace some of our existing services.

Imagine I have 3 worker nodes, with *each node* having 16GB RAM and 100GB
SSD. My input dataset, which is in Kafka, is about 250GB per day. Now I want
to do a stream-stream join across 8 data frames with a watermark set to a
24-hour tumbling window, so I need to hold state for 24 hours and then I can
clear all the data.
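
To make this concrete, here is a minimal two-stream sketch of what I have in
mind (the broker address, topic names, join key, and checkpoint path are all
placeholders; in reality I would chain this across 8 streams):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

val spark = SparkSession.builder.appName("stream-stream-join").getOrCreate()

// Stream A from Kafka; topic name and schema are made up for illustration.
val streamA = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "topicA")
  .load()
  .selectExpr("CAST(key AS STRING) AS idA", "timestamp AS tsA")
  .withWatermark("tsA", "24 hours")  // state older than ~24h becomes eligible for cleanup

// Stream B, watermarked the same way.
val streamB = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "topicB")
  .load()
  .selectExpr("CAST(key AS STRING) AS idB", "timestamp AS tsB")
  .withWatermark("tsB", "24 hours")

// Inner join on key plus a time-range condition, so the engine can bound
// how much state it has to keep for each side.
val joined = streamA.join(
  streamB,
  expr("idA = idB AND tsB BETWEEN tsA - INTERVAL 24 HOURS AND tsA + INTERVAL 24 HOURS"))

val query = joined.writeStream
  .outputMode("append")
  .format("console")
  .option("checkpointLocation", "/tmp/checkpoints/join")  // placeholder path
  .start()

query.awaitTermination()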

Questions:

1) What happens if I can't fit the data into memory while doing a
stream-stream join?
2) What Storage Level should I choose here for near-optimal performance?
3) Any other suggestions?

Thanks!
