spark-issues mailing list archives

From "Apache Spark (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-23841) NodeIdCache should unpersist the last cached nodeIdsForInstances
Date Mon, 02 Apr 2018 04:09:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-23841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16421919#comment-16421919 ]

Apache Spark commented on SPARK-23841:
--------------------------------------

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/20956

> NodeIdCache should unpersist the last cached nodeIdsForInstances
> ----------------------------------------------------------------
>
>                 Key: SPARK-23841
>                 URL: https://issues.apache.org/jira/browse/SPARK-23841
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: zhengruifeng
>            Priority: Minor
>
> {{NodeIdCache}} forgets to unpersist the last cached intermediate dataset, so it is still listed by {{sc.getPersistentRDDs}} after training finishes:
>  
> {code:java}
> scala> import org.apache.spark.ml.classification._
> import org.apache.spark.ml.classification._
> scala> val df = spark.read.format("libsvm").load("/Users/zrf/Dev/OpenSource/spark/data/mllib/sample_libsvm_data.txt")
> 2018-04-02 11:48:25 WARN  LibSVMFileFormat:66 - 'numFeatures' option not specified, determining the number of features by going though the input. If you know the number in advance, please specify it via 'numFeatures' option to avoid the extra scan.
> 2018-04-02 11:48:31 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
> df: org.apache.spark.sql.DataFrame = [label: double, features: vector]
> scala> val rf = new RandomForestClassifier().setCacheNodeIds(true)
> rf: org.apache.spark.ml.classification.RandomForestClassifier = rfc_aab2b672546b
> scala> val rfm = rf.fit(df)
> rfm: org.apache.spark.ml.classification.RandomForestClassificationModel = RandomForestClassificationModel (uid=rfc_aab2b672546b) with 20 trees
> scala> sc.getPersistentRDDs
> res0: scala.collection.Map[Int,org.apache.spark.rdd.RDD[_]] = Map(56 -> MapPartitionsRDD[56] at map at NodeIdCache.scala:102)
> {code}
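
Until a fix lands, the leftover RDD shown in the transcript above can be released by hand. The following is only a sketch of a possible workaround, not the change proposed in the pull request. The helper name unpersistLeftoverNodeIdCaches is hypothetical, and matching on the string "NodeIdCache" assumes the RDD's creation site is reported the same way as in the res0 output above.

```scala
import org.apache.spark.SparkContext

// Hypothetical helper, not part of Spark: unpersists every RDD that is still
// registered as persistent and whose description mentions NodeIdCache.
// Assumption: the RDD's toString includes its creation site, as in the
// "MapPartitionsRDD[56] at map at NodeIdCache.scala:102" entry above.
def unpersistLeftoverNodeIdCaches(sc: SparkContext): Int = {
  val leftovers = sc.getPersistentRDDs.values
    .filter(_.toString.contains("NodeIdCache"))
    .toSeq
  // unpersist() also removes the RDD from the context's persistent-RDD registry.
  leftovers.foreach(_.unpersist())
  leftovers.size
}
```

In the spark-shell session above, this could be called as unpersistLeftoverNodeIdCaches(sc) once rf.fit(df) has returned; the return value is the number of RDDs that were unpersisted. Matching on a description string is fragile, so this is better treated as a diagnostic aid than as a permanent cleanup step.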



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


