spark-issues mailing list archives

From "Qiping Li (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-3272) Calculate prediction for nodes separately from calculating information gain for splits in decision tree
Date Thu, 28 Aug 2014 02:49:58 GMT
Qiping Li created SPARK-3272:
--------------------------------

             Summary: Calculate prediction for nodes separately from calculating information gain for splits in decision tree
                 Key: SPARK-3272
                 URL: https://issues.apache.org/jira/browse/SPARK-3272
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
    Affects Versions: 1.0.2
            Reporter: Qiping Li
             Fix For: 1.1.0


In the current implementation, the prediction for a node is calculated together with the
information gain stats for each possible split. However, the value to predict for a specific
node is fixed by the instances at that node, no matter what the splits are.
To save computation, we can calculate the prediction once first and then calculate the
information gain stats for each split.
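
To illustrate the idea, here is a minimal Python sketch, not the MLlib code. It assumes a
classification node with Gini impurity, and the names gini, node_prediction and split_gain
are hypothetical helpers made up for this example. The prediction is computed once from the
node's labels, and each candidate split only contributes its gain.

```python
from collections import Counter


def gini(counts):
    """Gini impurity of a label -> count mapping (0.0 for an empty side)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def node_prediction(labels):
    """Majority label at the node; depends only on the node's labels."""
    return Counter(labels).most_common(1)[0][0]


def split_gain(labels, go_left):
    """Impurity decrease of one candidate split, given a boolean mask."""
    parent = Counter(labels)
    left = Counter(l for l, g in zip(labels, go_left) if g)
    right = Counter(l for l, g in zip(labels, go_left) if not g)
    n = len(labels)
    n_left = sum(left.values())
    n_right = n - n_left
    return gini(parent) - (n_left / n) * gini(left) - (n_right / n) * gini(right)


# Toy data (hypothetical): four instances and two candidate splits.
labels = [1, 1, 1, 0]
candidate_splits = {
    "a": [True, True, True, False],
    "b": [True, True, False, False],
}

# Computed once per node, independent of the splits.
prediction = node_prediction(labels)

# Computed once per candidate split.
gains = {name: split_gain(labels, mask) for name, mask in candidate_splits.items()}
```

In this sketch node_prediction never looks at candidate_splits, which is the separation
proposed above.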

This is also necessary if we want to support a minimum instances per node parameter ([SPARK-2207|https://issues.apache.org/jira/browse/SPARK-2207]),
because when no split satisfies the minimum instances requirement, we don't use the information
gain of any split. There should still be a way to get the prediction value for the node.
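
A rough Python sketch of that case follows. It is an assumption-laden illustration rather
than the actual implementation: choose_split and min_instances_per_node are hypothetical
names, and the prediction is simply the majority label. Because the prediction is computed
before the splits are examined, it is still available when every split is rejected.

```python
from collections import Counter


def choose_split(labels, candidate_splits, min_instances_per_node):
    """Return (prediction, names of splits that satisfy the minimum size).

    Hypothetical helper: a split is valid only if both children would
    receive at least min_instances_per_node instances.
    """
    # The prediction depends only on the node's labels, so compute it first.
    prediction = Counter(labels).most_common(1)[0][0]

    valid = []
    for name, go_left in candidate_splits.items():
        n_left = sum(go_left)
        n_right = len(go_left) - n_left
        if min(n_left, n_right) >= min_instances_per_node:
            valid.append(name)
    return prediction, valid


# Toy data (hypothetical): split "a" is 3/1, split "b" is 2/2.
labels = [1, 1, 1, 0]
candidate_splits = {
    "a": [True, True, True, False],
    "b": [True, True, False, False],
}

pred_ok, valid_ok = choose_split(labels, candidate_splits, 2)
pred_none, valid_none = choose_split(labels, candidate_splits, 3)
```

With a minimum of 3 no split qualifies, so the node would become a leaf, yet pred_none
still holds its prediction.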



--
This message was sent by Atlassian JIRA
(v6.2#6252)
