spark-dev mailing list archives

From Charoes <char...@gmail.com>
Subject Re: RDD object Out of scope.
Date Wed, 22 May 2019 00:09:50 GMT
If you have cached an RDD and still hold a reference to that RDD in your
code, then the RDD will NOT be cleaned up.
There is a ReferenceQueue in ContextCleaner, which is used to keep track of
references to RDDs, Broadcasts, Accumulators, etc.
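
To illustrate the idea, here is a rough Python sketch of the same pattern.
It is only an analogy, not Spark's code: FakeRDD, FakeCleaner, register and
drain are hypothetical names, and Python's weakref callbacks stand in for
the JVM ReferenceQueue. The cleaner holds only a weak reference to each
registered object, so it never keeps the object alive itself, and it queues
a cleanup task only once the program has dropped its last strong reference.

```python
import gc
import weakref
from collections import deque


class FakeRDD:
    """Hypothetical stand-in for a cached RDD; it only carries an id."""

    def __init__(self, rdd_id):
        self.rdd_id = rdd_id


class FakeCleaner:
    """Toy cleaner (an assumption, not Spark's ContextCleaner)."""

    def __init__(self):
        self.pending = deque()  # plays the role of the ReferenceQueue
        self._refs = {}         # keeps the weak reference objects alive
        self.cleaned = []       # ids whose cached data was "dropped"

    def register(self, rdd):
        rdd_id = rdd.rdd_id

        # Called by the interpreter once the object has been collected.
        # It captures only the id, never the object itself.
        def on_collected(_ref, rdd_id=rdd_id):
            self.pending.append(rdd_id)

        self._refs[rdd_id] = weakref.ref(rdd, on_collected)

    def drain(self):
        # Process every queued cleanup task.
        while self.pending:
            rdd_id = self.pending.popleft()
            del self._refs[rdd_id]
            self.cleaned.append(rdd_id)


cleaner = FakeCleaner()
rdd = FakeRDD(1)
cleaner.register(rdd)

# A strong reference is still held, so nothing is queued for cleanup.
gc.collect()
cleaner.drain()
held_result = list(cleaner.cleaned)

# Drop the last strong reference; now the cleanup task is queued.
del rdd
gc.collect()
cleaner.drain()
released_result = list(cleaner.cleaned)

print(held_result, released_result)
```

In the sketch, cleanup for id 1 is recorded only after `del rdd`, which
mirrors the point above: as long as your code holds a reference, the
cleaner has nothing to do.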

On Wed, May 22, 2019 at 1:07 AM Nasrulla Khan Haris
<Nasrulla.Khan@microsoft.com.invalid> wrote:

> Thanks for the reply, Wenchen. I am curious about what happens when an RDD
> goes out of scope when it is not cached.
>
>
>
> Nasrulla
>
>
>
> *From:* Wenchen Fan <cloud0fan@gmail.com>
> *Sent:* Tuesday, May 21, 2019 6:28 AM
> *To:* Nasrulla Khan Haris <Nasrulla.Khan@microsoft.com.invalid>
> *Cc:* dev@spark.apache.org
> *Subject:* Re: RDD object Out of scope.
>
>
>
> An RDD is kind of a pointer to the actual data. Unless it's cached, we
> don't need to clean up the RDD.
>
>
>
> On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris <
> Nasrulla.Khan@microsoft.com.invalid> wrote:
>
> Hi Spark developers,
>
>
>
> Can someone point out the code where RDD objects go out of scope? I
> found the ContextCleaner
> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L178>
> code, in which only persisted RDDs are cleaned up at regular intervals,
> and only if the RDD is registered for cleanup. I have not found where the
> destructor for an RDD object is invoked. I am trying to understand when
> RDD cleanup happens when the RDD is not persisted.
>
>
>
> Thanks in advance, appreciate your help.
>
> Nasrulla
>
>
>
>
