spark-dev mailing list archives

From Nasrulla Khan Haris <Nasrulla.K...@microsoft.com.INVALID>
Subject RE: RDD object Out of scope.
Date Wed, 22 May 2019 00:39:27 GMT
I am trying to find the code that cleans up uncached RDDs.

Thanks,
Nasrulla

From: Charoes <charoes@gmail.com>
Sent: Tuesday, May 21, 2019 5:10 PM
To: Nasrulla Khan Haris <Nasrulla.Khan@microsoft.com.invalid>
Cc: Wenchen Fan <cloud0fan@gmail.com>; dev@spark.apache.org
Subject: Re: RDD object Out of scope.

If you cache an RDD and hold a reference to that RDD in your code, then the RDD will NOT
be cleaned up.
ContextCleaner has a ReferenceQueue, which it uses to track references to RDDs,
Broadcasts, Accumulators, etc.
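
For readers unfamiliar with the pattern, here is a minimal Java sketch of tracking objects with weak references and a java.lang.ref.ReferenceQueue. It is not Spark's ContextCleaner code: the names CleanupTracker, CleanupRef, register, drain and pendingCount are hypothetical, and the cleanup action is an arbitrary Runnable standing in for whatever a real cleaner would release.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of weak-reference based cleanup; not Spark's ContextCleaner.
public class CleanupTracker {

    // A weak reference that remembers what to do once its referent is unreachable.
    // The task must not capture the tracked object, or it would never become unreachable.
    public static final class CleanupRef extends WeakReference<Object> {
        final Runnable cleanupTask;

        CleanupRef(Object referent, Runnable cleanupTask, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.cleanupTask = cleanupTask;
        }
    }

    // The garbage collector enqueues a CleanupRef here after its referent becomes unreachable.
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    // Hold the reference objects strongly so they survive until they are enqueued.
    private final Set<CleanupRef> pending = Collections.synchronizedSet(new HashSet<>());

    // Start tracking an object; the task runs after the object has been collected.
    public CleanupRef register(Object tracked, Runnable cleanupTask) {
        CleanupRef ref = new CleanupRef(tracked, cleanupTask, queue);
        pending.add(ref);
        return ref;
    }

    // Run the cleanup task for every reference enqueued so far; returns how many ran.
    public int drain() {
        int cleaned = 0;
        CleanupRef ref;
        while ((ref = (CleanupRef) queue.poll()) != null) {
            pending.remove(ref);
            ref.cleanupTask.run();
            cleaned++;
        }
        return cleaned;
    }

    // Number of tracked objects whose cleanup has not run yet.
    public int pendingCount() {
        return pending.size();
    }
}
```

As long as user code keeps a strong reference to the tracked object, its weak reference is never enqueued and drain() does nothing for it. A long-running cleaner would more likely block on queue.remove(timeout) in a daemon thread than poll as this sketch does.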

On Wed, May 22, 2019 at 1:07 AM Nasrulla Khan Haris <Nasrulla.Khan@microsoft.com.invalid>
wrote:
Thanks for the reply, Wenchen. I am curious as to what happens when an RDD goes out of scope
when it is not cached.

Nasrulla

From: Wenchen Fan <cloud0fan@gmail.com>
Sent: Tuesday, May 21, 2019 6:28 AM
To: Nasrulla Khan Haris <Nasrulla.Khan@microsoft.com.invalid>
Cc: dev@spark.apache.org
Subject: Re: RDD object Out of scope.

An RDD is kind of a pointer to the actual data. Unless it's cached, we don't need to clean up
the RDD.
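
To make the "pointer to the data" idea concrete, here is a toy Java model of a lazily evaluated dataset. It is only an illustration under stated assumptions and not Spark's RDD class: LazyDataset, recipe, holdsData and the in-memory List storage are all hypothetical. The object holds just a recipe for computing its elements, so an uncached instance owns no data that would need releasing.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

// Hypothetical toy model of a lazy dataset; not Spark's RDD implementation.
public class LazyDataset<T> {
    private final Supplier<List<T>> recipe; // how to compute the data, not the data itself
    private List<T> cached;                 // materialized data, kept only after cache()
    private boolean cacheRequested = false;

    public LazyDataset(Supplier<List<T>> recipe) {
        this.recipe = recipe;
    }

    // Builds a new recipe on top of this one; nothing is computed here.
    public <R> LazyDataset<R> map(Function<T, R> f) {
        return new LazyDataset<>(() -> collect().stream().map(f).collect(Collectors.toList()));
    }

    // Ask for the result of the next computation to be retained.
    public LazyDataset<T> cache() {
        cacheRequested = true;
        return this;
    }

    // Computes the elements, reusing the retained copy if one exists.
    public List<T> collect() {
        if (cached != null) {
            return cached;
        }
        List<T> result = recipe.get();
        if (cacheRequested) {
            cached = result;
        }
        return result;
    }

    // True only when materialized data is being held and would need cleanup.
    public boolean holdsData() {
        return cached != null;
    }
}
```

In this model an uncached dataset recomputes on every collect() and is reclaimed by ordinary garbage collection like any other small object. Only a cached one holds storage that something would have to release explicitly.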

On Tue, May 21, 2019 at 1:48 PM Nasrulla Khan Haris <Nasrulla.Khan@microsoft.com.invalid>
wrote:
Hi Spark developers,

Can someone point me to the code that handles RDD objects going out of scope? I found the
ContextCleaner<https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ContextCleaner.scala#L178>
code, in which only persisted RDDs are cleaned up at regular intervals if the RDD is registered
for cleanup. I have not found where the destructor for an RDD object is invoked. I am trying to
understand when RDD cleanup happens when the RDD is not persisted.

Thanks in advance, appreciate your help.
Nasrulla
