spark-user mailing list archives

From Tathagata Das <t...@databricks.com>
Subject Re: How to put an object in cache for ever in Streaming
Date Mon, 19 Oct 2015 21:16:18 GMT
That should also get cleaned up through the GC, though you may have to
explicitly run GC periodically for faster cleanup.

RDDs are by definition distributed across executors in partitions. When an
RDD is cached, its partitions are cached in memory across the executors.
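The unpersist-then-persist cycle suggested earlier in this thread (uncache the previous RDD each minute and cache the new one) can be sketched as follows. This is a minimal, self-contained stand-in, not Spark itself: the `persist`/`unpersist` methods and the `cachedIds` set here are hypothetical models of `RDD.persist()`, `RDD.unpersist()`, and the block manager's record of cached data.

```scala
import scala.collection.mutable

// A sketch of the "swap the cached dataset each interval" pattern.
// Only the most recently produced dataset stays cached; the previous
// one is explicitly dropped so the GC can reclaim it.
object CacheSwap {
  // Models the set of dataset ids currently held in cache.
  val cachedIds: mutable.Set[Int] = mutable.Set[Int]()

  def persist(id: Int): Unit = cachedIds += id
  def unpersist(id: Int): Unit = cachedIds -= id

  // The id of the dataset currently kept in cache, if any.
  var current: Option[Int] = None

  // Called once per batch interval with the newly built dataset's id.
  def swapIn(newId: Int): Unit = {
    current.foreach(unpersist) // drop the previous cached dataset
    persist(newId)             // cache the new one
    current = Some(newId)
  }
}
```

In real Spark code the same shape applies: hold a reference to the currently cached RDD, call `unpersist()` on it, `persist()` the replacement, and update the reference, so at most one copy occupies executor memory at a time.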

On Fri, Oct 16, 2015 at 6:15 PM, swetha kasireddy <swethakasireddy@gmail.com
> wrote:

> What about cleaning up the temp data that gets generated by shuffles? We
> have a lot of temp data generated by shuffles in the /tmp folder, which is
> why we are using a ttl. Also, if I keep an RDD in cache, is it available
> across all the executors or just the same executor?
>
> On Fri, Oct 16, 2015 at 5:49 PM, Tathagata Das <tdas@databricks.com>
> wrote:
>
>> Setting a ttl is no longer recommended, as Spark works with the Java GC
>> to clean up anything (RDDs, shuffles, broadcasts, etc.) that is no longer
>> referenced.
>>
>> So you can keep an RDD cached in Spark, and every minute uncache the
>> previous one, and cache a new one.
>>
>> TD
>>
>> On Fri, Oct 16, 2015 at 12:02 PM, swetha <swethakasireddy@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> How do we keep a changing object in cache forever in Streaming? I know
>>> that we can do rdd.cache, but I think .cache would be cleaned up if we
>>> set a ttl in Streaming. Our requirement is to have an object in memory.
>>> The object would be updated every minute based on the records that we
>>> get in our Streaming job.
>>>
>>> Currently I am keeping that in updateStateByKey. But my
>>> updateStateByKey is tracking the realtime session information as well.
>>> So my updateStateByKey has 4 arguments that track session information
>>> and this object that tracks the performance info separately. I was
>>> thinking it may be too much data to keep in updateStateByKey.
>>>
>>> Is it recommended to hold a lot of data using updateStateByKey?
>>>
>>>
>>> Thanks,
>>> Swetha
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-put-an-object-in-cache-for-ever-in-Streaming-tp25098.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>>
>>
>
