spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-3272) Calculate prediction for nodes separately from calculating information gain for splits in decision tree
Date Thu, 28 Aug 2014 19:02:08 GMT

    [ https://issues.apache.org/jira/browse/SPARK-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14114166#comment-14114166
] 

Joseph K. Bradley commented on SPARK-3272:
------------------------------------------

With respect to [SPARK-2207], I think this JIRA may or may not be necessary for implementing
[SPARK-2207], depending on how the code is set up.  For [SPARK-2207], I imagined checking
the number of instances and the information gain when the Node is constructed in the main
loop (in the train() method).  If there are too few instances or too little information gain,
then the Node will be set as a leaf.  We could potentially avoid the aggregation for those
leaves, but I would consider that a separate issue ([SPARK-3158]).
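
A minimal sketch of the check described above, not the actual MLlib code. The names `Node`, `make_node`, `min_instances_per_node` and `min_info_gain` are hypothetical and chosen only for illustration; the sketch assumes the instance count and the best information gain for the node are already known when the node is constructed.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical stand-in for a tree node; not the MLlib Node class.
    node_id: int
    prediction: float
    is_leaf: bool


def make_node(node_id, prediction, num_instances, info_gain,
              min_instances_per_node, min_info_gain):
    """Build a node and mark it as a leaf when either stopping rule fires."""
    too_few_instances = num_instances < min_instances_per_node
    too_little_gain = info_gain < min_info_gain
    return Node(node_id, prediction,
                is_leaf=too_few_instances or too_little_gain)


# Three instances is below the (assumed) minimum of five, so this is a leaf.
small = make_node(1, 0.0, num_instances=3, info_gain=0.5,
                  min_instances_per_node=5, min_info_gain=0.01)
# Enough instances and enough gain, so this node stays internal.
internal = make_node(2, 1.0, num_instances=100, info_gain=0.2,
                     min_instances_per_node=5, min_info_gain=0.01)
```

Whether the comparisons should be strict is a design choice this sketch does not settle. Skipping the aggregation for nodes already known to be leaves is the separate optimization mentioned above and is not shown here.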

> Calculate prediction for nodes separately from calculating information gain for splits
in decision tree
> -------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-3272
>                 URL: https://issues.apache.org/jira/browse/SPARK-3272
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.0.2
>            Reporter: Qiping Li
>             Fix For: 1.1.0
>
>
> In the current implementation, the prediction for a node is calculated along with the
information gain stats for each possible split, even though the value to predict for a given
node is the same no matter what the splits are.
> To save computation, we can calculate the prediction first and then calculate the information
gain stats for each split.
> This is also necessary if we want to support a minimum-instances-per-node parameter ([SPARK-2207|https://issues.apache.org/jira/browse/SPARK-2207]),
because when no split satisfies the minimum-instances requirement, we don't use the information
gain of any split, yet there must still be a way to get the prediction value.
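
The idea in the description can be sketched as follows. This is an illustration under stated assumptions, not the MLlib implementation: it assumes a classification tree with entropy as the impurity measure, a single continuous feature, and a majority-vote prediction. The function names (`entropy`, `predict`, `best_split`) and the `min_instances_per_node` parameter are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(counts):
    """Entropy (in bits) of a label-count table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total)
                for c in counts.values() if c > 0)


def predict(counts):
    """Majority label; depends only on the node's labels, not on any split."""
    return max(counts, key=counts.get)


def best_split(rows, thresholds, min_instances_per_node):
    """rows: non-empty list of (feature_value, label) pairs.

    Returns (prediction, best), where best is (threshold, gain) or None
    when no candidate split leaves enough instances on both sides.
    """
    node_counts = Counter(label for _, label in rows)
    prediction = predict(node_counts)        # computed once, before any split
    parent_impurity = entropy(node_counts)
    best = None
    for threshold in thresholds:
        left = Counter(label for x, label in rows if x <= threshold)
        right = node_counts - left           # remaining labels go right
        n_left, n_right = sum(left.values()), sum(right.values())
        if min(n_left, n_right) < min_instances_per_node:
            continue                         # split violates the minimum
        n = n_left + n_right
        gain = (parent_impurity
                - (n_left / n) * entropy(left)
                - (n_right / n) * entropy(right))
        if best is None or gain > best[1]:
            best = (threshold, gain)
    return prediction, best


rows = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
pred_ok, split_ok = best_split(rows, [2.5], min_instances_per_node=2)
pred_none, split_none = best_split(rows, [2.5], min_instances_per_node=3)
```

In the second call the only candidate split leaves two instances on the left, which is below the assumed minimum of three, so no split is returned. The prediction is still available because it was never tied to a split.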



--
This message was sent by Atlassian JIRA
(v6.2#6252)


