spark-dev mailing list archives

From Gerard Maas <>
Subject [Structured Streaming] OOM on ConsoleSink with large inputs
Date Fri, 11 Aug 2017 22:00:35 GMT

While investigating another issue, I came across an OOM error when using
the Console Sink with any source whose input can be larger than the available
driver memory. In my case, I was using the File source, and there was a 14 GB
file in the monitored directory.

I traced the issue back to a `df.collect` in the Console Sink code, which pulls
every row of the batch into driver memory even though only a few rows are shown.
I created a JIRA for it:
and a PR is available:
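To illustrate why a `collect` inside a sink that only displays a handful of rows is risky, here is a minimal, Spark-free Scala sketch. It is not the Console Sink code or the fix in the PR. `LazyBatch`, `showAllThenTruncate` and `showBounded` are hypothetical names, and a lazily generated `Iterator` stands in for a large micro-batch. The collect-style path materializes the whole batch before trimming it. The take-style path only pulls the rows it will print.

```scala
object ConsoleSinkSketch {
  // Hypothetical stand-in for a large micro-batch: rows are produced lazily,
  // and `pulled` records how many rows the "driver" actually materialized.
  final class LazyBatch(totalRows: Int) {
    var pulled: Int = 0
    def iterator: Iterator[String] =
      Iterator.range(0, totalRows).map { i => pulled += 1; s"row-$i" }
  }

  // collect-style: the whole batch lands in memory before we keep numRows rows.
  def showAllThenTruncate(batch: LazyBatch, numRows: Int): Seq[String] = {
    val all = batch.iterator.toVector // every row is materialized here
    all.take(numRows)
  }

  // take-style: only the rows that will be displayed are ever pulled.
  def showBounded(batch: LazyBatch, numRows: Int): Seq[String] =
    batch.iterator.take(numRows).toVector

  def main(args: Array[String]): Unit = {
    val big = new LazyBatch(1000000)
    showBounded(big, 20).foreach(println)
    println(s"rows pulled: ${big.pulled}")
  }
}
```

With the bounded version, memory use depends on the number of rows displayed rather than on the size of the input, which is the property a console-style debugging sink needs.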

I hope a committer can check it out.

-kr, Gerard.
