spark-dev mailing list archives

From Nasrulla Khan Haris <>
Subject RE: RDD object Out of scope.
Date Wed, 22 May 2019 00:39:27 GMT
I am trying to find the code that cleans up uncached RDDs.


From: Charoes <>
Sent: Tuesday, May 21, 2019 5:10 PM
To: Nasrulla Khan Haris <>
Cc: Wenchen Fan <>;
Subject: Re: RDD object Out of scope.

If you cache an RDD and hold a reference to that RDD in your code, then the RDD will NOT
be cleaned up.
ContextCleaner has a ReferenceQueue, which it uses to track references to RDDs,
Broadcasts, Accumulators, etc.
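
To illustrate the weak-reference pattern mentioned above, here is a minimal Java sketch. It is not Spark code: CleanupTracker, TaskRef, register, and drain are hypothetical names, and only the java.lang.ref classes are real. In a real run the garbage collector enqueues the reference once the tracked object becomes unreachable; the demo calls Reference.enqueue() by hand to stand in for that step so the behavior is deterministic.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a cleaner; an illustration, not Spark's ContextCleaner.
class CleanupTracker {
    // Weak reference that remembers which id to clean once its referent is gone.
    static final class TaskRef extends WeakReference<Object> {
        final int id;

        TaskRef(Object referent, int id, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.id = id;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Strong references to the TaskRef objects, so they survive until processed.
    private final Set<TaskRef> pending = new HashSet<>();

    // Start tracking `owner`; the tracker itself holds it only weakly.
    TaskRef register(Object owner, int id) {
        TaskRef ref = new TaskRef(owner, id, queue);
        pending.add(ref);
        return ref;
    }

    // Poll the queue without blocking and return the ids whose owners are gone.
    List<Integer> drain() {
        List<Integer> cleaned = new ArrayList<>();
        Reference<?> r;
        while ((r = queue.poll()) != null) {
            TaskRef t = (TaskRef) r;
            pending.remove(t);
            cleaned.add(t.id);
        }
        return cleaned;
    }

    int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        CleanupTracker tracker = new CleanupTracker();
        Object held = new Object();
        TaskRef ref = tracker.register(held, 7);
        // While `held` is strongly reachable, nothing is queued for cleanup.
        System.out.println(tracker.drain());
        // Simulate what the garbage collector does after `held` becomes unreachable.
        ref.enqueue();
        System.out.println(tracker.drain());
    }
}
```

As long as user code keeps a strong reference to the tracked object, the queue stays empty and nothing is cleaned, which matches the behavior described for a cached RDD that is still referenced.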

On Wed, May 22, 2019 at 1:07 AM Nasrulla Khan Haris <<>>
Thanks for the reply, Wenchen. I am curious as to what happens when an RDD goes out of scope
when it is not cached.


From: Wenchen Fan <<>>
Sent: Tuesday, May 21, 2019 6:28 AM
To: Nasrulla Khan Haris <<>>
Subject: Re: RDD object Out of scope.

An RDD is a kind of pointer to the actual data. Unless it's cached, we don't need to clean up
the RDD.
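
A toy Java model may make the "pointer" idea concrete. This is only a sketch under stated assumptions: LazyDataset and its methods are hypothetical and are not Spark's RDD API. The handle stores a recipe for computing the data and keeps the data itself only if caching was requested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical toy model of a lazy dataset handle; not Spark's RDD class.
class LazyDataset<T> {
    private final Supplier<List<T>> recipe; // how to compute the data on demand
    private List<T> cached;                 // non-null only after cache() and collect()
    private boolean cacheRequested;

    LazyDataset(Supplier<List<T>> recipe) {
        this.recipe = recipe;
    }

    // Build a new handle whose recipe depends on this one; nothing is computed yet.
    <R> LazyDataset<R> map(Function<T, R> f) {
        return new LazyDataset<>(() -> {
            List<R> out = new ArrayList<>();
            for (T x : collect()) {
                out.add(f.apply(x));
            }
            return out;
        });
    }

    // Ask for the result to be kept after the next computation.
    LazyDataset<T> cache() {
        cacheRequested = true;
        return this;
    }

    // Run the recipe, or return the stored result if one exists.
    List<T> collect() {
        if (cached != null) {
            return cached;
        }
        List<T> data = recipe.get();
        if (cacheRequested) {
            cached = data;
        }
        return data;
    }

    boolean holdsData() {
        return cached != null;
    }
}
```

In this model an uncached handle owns nothing but its recipe, so when it goes out of scope the JVM garbage collector reclaims the small object and there is no stored data to release.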

On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris <<>>
Hi Spark developers,

Can someone point out the code where RDD objects go out of scope? I found the ContextCleaner
code, in which only persisted RDDs are cleaned up at regular intervals if the RDD is registered
for cleanup. I have not found where the destructor for an RDD object is invoked. I am trying to
understand when RDD cleanup happens when the RDD is not persisted.

Thanks in advance, appreciate your help.
