spark-dev mailing list archives

From Nasrulla Khan Haris <Nasrulla.K...@microsoft.com.INVALID>
Subject RE: RDD object Out of scope.
Date Wed, 22 May 2019 02:13:44 GMT
Thanks Sean, that makes sense. 

Regards,
Nasrulla

-----Original Message-----
From: Sean Owen <srowen@gmail.com> 
Sent: Tuesday, May 21, 2019 6:24 PM
To: Nasrulla Khan Haris <Nasrulla.Khan@microsoft.com>
Cc: dev@spark.apache.org
Subject: Re: RDD object Out of scope.

I'm not clear what you're asking. An RDD itself is just an object in the JVM. It will be garbage
collected if there are no references. What else would there be to clean up in your case? ContextCleaner
handles cleanup of persisted RDDs, etc.
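
For example (a rough sketch, not Spark internals; the object and helper names below are made up for illustration):

import org.apache.spark.SparkContext

object RddScopeSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "rdd-scope-sketch")

    // An uncached RDD is an ordinary driver-side JVM object. Once `squares`
    // goes out of scope here, normal garbage collection reclaims it; there is
    // no executor-side state to release.
    def transientWork(): Long = {
      val squares = sc.parallelize(1 to 1000).map(x => x.toLong * x)
      squares.count()
    }
    println(transientWork())

    // A persisted RDD does have executor-side state (cached blocks). Dropping
    // the reference lets ContextCleaner remove those blocks asynchronously;
    // calling unpersist() releases them eagerly.
    val cached = sc.parallelize(1 to 1000).cache()
    println(cached.count())
    cached.unpersist()

    sc.stop()
  }
}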

On Tue, May 21, 2019 at 7:39 PM Nasrulla Khan Haris <Nasrulla.Khan@microsoft.com.invalid>
wrote:
>
> I am trying to find the code that cleans up uncached RDDs.
>
>
>
> Thanks,
>
> Nasrulla
>
>
>
> From: Charoes <charoes@gmail.com>
> Sent: Tuesday, May 21, 2019 5:10 PM
> To: Nasrulla Khan Haris <Nasrulla.Khan@microsoft.com.invalid>
> Cc: Wenchen Fan <cloud0fan@gmail.com>; dev@spark.apache.org
> Subject: Re: RDD object Out of scope.
>
>
>
> If you cached an RDD and hold a reference to that RDD in your code, then your RDD will NOT be cleaned up.
>
> There is a ReferenceQueue in ContextCleaner, which is used to keep track of references to RDDs, Broadcasts, Accumulators, etc.
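>
> To make that concrete, here is a minimal sketch of the WeakReference + ReferenceQueue pattern ContextCleaner is built on. It is not Spark's actual code; the Resource and ResourceRef names are invented for illustration.
>
> import java.lang.ref.{ReferenceQueue, WeakReference}
>
> // Stand-in for an RDD-like resource identified by an id (illustrative only).
> final class Resource(val id: Int)
>
> object ReferenceQueueSketch {
>   def main(args: Array[String]): Unit = {
>     val queue = new ReferenceQueue[Resource]()
>
>     // Subclassing WeakReference keeps the id around after the referent is
>     // gone, similar in spirit to how ContextCleaner tracks cleanup tasks.
>     class ResourceRef(res: Resource, q: ReferenceQueue[Resource])
>         extends WeakReference[Resource](res, q) {
>       val id: Int = res.id
>     }
>
>     var resource: Resource = new Resource(42)
>     val ref = new ResourceRef(resource, queue) // must stay strongly reachable
>
>     // Drop the strong reference to the resource and encourage a GC cycle.
>     resource = null
>     System.gc()
>
>     // Once the referent is collected, the reference appears on the queue and
>     // a cleaner thread can release whatever external state belongs to that id.
>     queue.remove(1000) match {
>       case r: ResourceRef => println(s"clean up state for resource ${r.id}")
>       case _              => println("referent not collected yet; a real cleaner keeps polling")
>     }
>     println(s"weak ref cleared: ${ref.get() == null}")
>   }
> }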
>
>
>
> On Wed, May 22, 2019 at 1:07 AM Nasrulla Khan Haris <Nasrulla.Khan@microsoft.com.invalid>
wrote:
>
> Thanks for the reply, Wenchen. I am curious as to what happens when an RDD goes out of scope when it is not cached.
>
>
>
> Nasrulla
>
>
>
> From: Wenchen Fan <cloud0fan@gmail.com>
> Sent: Tuesday, May 21, 2019 6:28 AM
> To: Nasrulla Khan Haris <Nasrulla.Khan@microsoft.com.invalid>
> Cc: dev@spark.apache.org
> Subject: Re: RDD object Out of scope.
>
>
>
> An RDD is kind of a pointer to the actual data. Unless it's cached, we don't need to clean up the RDD.
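>
> In other words, an uncached RDD is just a lazy description of a computation. A small sketch of that idea (toy example, not Spark internals; names are made up):
>
> import org.apache.spark.SparkContext
>
> object LazyRddSketch {
>   def main(args: Array[String]): Unit = {
>     val sc = new SparkContext("local[*]", "lazy-rdd-sketch")
>
>     // Building the lineage only creates small driver-side objects; nothing is
>     // computed or materialized on the executors yet.
>     val lineage = sc.parallelize(1 to 1000000).map(_ * 2).filter(_ % 3 == 0)
>
>     // An action computes partitions on the fly and discards them afterwards,
>     // so an uncached RDD leaves nothing behind for Spark to clean up; the
>     // driver-side object is reclaimed by ordinary JVM garbage collection.
>     println(lineage.count())
>
>     sc.stop()
>   }
> }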
>
>
>
> On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris <Nasrulla.Khan@microsoft.com.invalid>
wrote:
>
> Hi Spark developers,
>
>
>
> Can someone point out the code where RDD objects go out of scope? I found the ContextCleaner code, in which only persisted RDDs are cleaned up at regular intervals if the RDD is registered for cleanup. I have not found where the destructor for the RDD object is invoked. I am trying to understand when RDD cleanup happens when the RDD is not persisted.
>
>
>
> Thanks in advance, appreciate your help.
>
> Nasrulla
>
>