spark-user mailing list archives

From kant kodali <kanth...@gmail.com>
Subject Re: Why do we ever run out of memory in Spark Structured Streaming?
Date Wed, 05 Apr 2017 07:39:54 GMT
Hi!

I am talking about stateful operations like aggregations. Does this state
live on heap or off heap by default? I came across an article saying both
on-heap and off-heap storage are possible, but I am not sure what the
default is, or when Spark or Spark Structured Streaming decides to store
state off heap.

I don't even know what mapGroupsWithState does, since it's not part of Spark
2.1, which is what we currently use. Any pointers would be great.

Thanks!

On Tue, Apr 4, 2017 at 5:34 PM, Tathagata Das <tathagata.das1565@gmail.com>
wrote:

> Are you referring to the memory usage of stateful operations like
> aggregations, or of the new mapGroupsWithState?
> The current implementation of the internal state store (which maintains the
> stateful aggregates) keeps all of the data in the executor's memory. It does
> use an HDFS-compatible file system for checkpointing, but as of now the
> working state itself stays in executor memory. This is something we will
> improve in the future.
>
> That said, you can enable watermarking in your query, which will
> automatically clear old, unnecessary state and thus limit the total memory
> used for stateful operations.
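[Editor's note: a minimal sketch of the watermarking advice above, using the `withWatermark` API available since Spark 2.1. The DataFrame `events`, its event-time column `timestamp`, the grouping column `word`, and the `spark` session are assumed names for illustration.]

```scala
import org.apache.spark.sql.functions.window
import spark.implicits._

// Watermark: state for event-time windows more than 10 minutes behind the
// max observed event time becomes eligible for cleanup, bounding state size.
val windowedCounts = events
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window($"timestamp", "5 minutes"), $"word")
  .count()
```

Without the `withWatermark` call, the aggregation would keep state for every window ever seen, which is exactly the unbounded growth being discussed.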
> Furthermore, you can also monitor the size of the state and get alerts if
> the state is growing too large.
>
> Read more in the programming guide.
> Watermarking - http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking
> Monitoring - http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#monitoring-streaming-queries
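[Editor's note: a sketch of the monitoring advice, assuming a running StreamingQuery named `query`. `lastProgress.stateOperators` with `numRowsTotal` and `numRowsUpdated` is part of the StreamingQueryProgress API introduced in Spark 2.1.]

```scala
// Poll the most recent progress report for state-store size per stateful
// operator; an external alerting system could act on these numbers.
val progress = query.lastProgress
for (op <- progress.stateOperators) {
  println(s"state rows total: ${op.numRowsTotal}, updated this trigger: ${op.numRowsUpdated}")
}
```

For continuous monitoring rather than polling, a `StreamingQueryListener` registered on the SparkSession receives the same progress objects on every trigger.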
>
> In case you were referring to something else, please give us more context -
> what the query is, and what symptoms you are observing.
>
> On Tue, Apr 4, 2017 at 5:17 PM, kant kodali <kanth909@gmail.com> wrote:
>
>> Why do we ever run out of memory in Spark Structured Streaming, especially
>> when memory can always spill to disk? Until the disk is full, we shouldn't
>> run out of memory, should we? Sure, thrashing will happen more frequently
>> and degrade performance, but why would we ever run out of memory, even
>> when maintaining state for six months or a year?
>>
>> Thanks!
>>
>
>
