spark-issues mailing list archives

From "Marco Gaido (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-28222) Feature importance outputs different values in GBT and Random Forest in 2.3.3 and 2.4 pyspark version
Date Tue, 02 Jul 2019 19:40:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-28222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877267#comment-16877267 ]

Marco Gaido commented on SPARK-28222:
-------------------------------------

Hmm, there has been a bug fix for this (see SPARK-26721), but it should be in 3.0 only AFAIK.
The question is: which is the right value? Can you compare it with other libs like sklearn?
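
A minimal sketch of how such a comparison could be set up on the sklearn side. Everything here is an assumption for illustration: scikit-learn is installed, the data is synthetic (`make_classification` with 6 features, standing in for the reporter's dataset), and the hyperparameters are arbitrary. For a meaningful comparison, the same training data and comparable tree settings would have to be used in both libraries.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic binary-classification data with 6 features.
# This is a stand-in, NOT the reporter's data.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=3, n_redundant=0, random_state=0
)

# Hypothetical model names mirroring the labels used in the issue description.
models = {
    "RandomForestClassifier_gini": RandomForestClassifier(
        n_estimators=20, criterion="gini", random_state=0
    ),
    "RandomForestClassifier_entropy": RandomForestClassifier(
        n_estimators=20, criterion="entropy", random_state=0
    ),
    "GradientBoostingClassifier": GradientBoostingClassifier(
        n_estimators=20, random_state=0
    ),
}

importances = {}
for name, model in models.items():
    model.fit(X, y)
    # feature_importances_ is normalized to sum to 1 across features.
    importances[name] = model.feature_importances_.tolist()
    print("MODEL", name, importances[name])
```

The printed vectors can then be set side by side with the `featureImportances` of the corresponding pyspark models trained on the same data.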

> Feature importance outputs different values in GBT and Random Forest in 2.3.3 and 2.4 pyspark version
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-28222
>                 URL: https://issues.apache.org/jira/browse/SPARK-28222
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3
>            Reporter: eneriwrt
>            Priority: Minor
>
> Feature importance values obtained in a binary classification project differ depending on whether version 2.3.3 or 2.4.0 is used. This happens with both Random Forest and GBT.
> As an example:
> *SPARK 2.4*
> MODEL RandomForestClassifier_gini [0.0, 0.4117930839002269, 0.06894132653061226, 0.15857667209786705, 0.2974447311021076, 0.06324418636918638]
> MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 0.06578883597468652, 0.17433924485055197, 0.31754597164210124, 0.055888697733790925]
> MODEL GradientBoostingClassifier [0.0, 0.7555555555555556, 0.24444444444444438, 0.0, 1.4602196686471875e-17, 0.0]
> *SPARK 2.3.3*
> MODEL RandomForestClassifier_gini [0.0, 0.40957086167800455, 0.06894132653061226, 0.16413222765342259, 0.2974447311021076, 0.05991085303585305]
> MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 0.06578883597468652, 0.18789704501922055, 0.30398817147343266, 0.055888697733790925]
> MODEL GradientBoostingClassifier [0.0, 0.7555555555555555, 0.24444444444444438, 0.0, 2.4326753518951276e-17, 0.0]
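
To see how large the reported discrepancy actually is, the vectors quoted above can be compared element by element. The sketch below uses only the numbers from the issue description; the helper name `max_abs_diff` and the dictionary layout are illustrative choices, not part of any Spark API.

```python
# Feature importances copied from the issue description (Spark 2.4).
spark_24 = {
    "RandomForestClassifier_gini": [0.0, 0.4117930839002269, 0.06894132653061226,
                                    0.15857667209786705, 0.2974447311021076,
                                    0.06324418636918638],
    "RandomForestClassifier_entropy": [0.0, 0.3864372497988694, 0.06578883597468652,
                                       0.17433924485055197, 0.31754597164210124,
                                       0.055888697733790925],
    "GradientBoostingClassifier": [0.0, 0.7555555555555556, 0.24444444444444438,
                                   0.0, 1.4602196686471875e-17, 0.0],
}

# Feature importances copied from the issue description (Spark 2.3.3).
spark_233 = {
    "RandomForestClassifier_gini": [0.0, 0.40957086167800455, 0.06894132653061226,
                                    0.16413222765342259, 0.2974447311021076,
                                    0.05991085303585305],
    "RandomForestClassifier_entropy": [0.0, 0.3864372497988694, 0.06578883597468652,
                                       0.18789704501922055, 0.30398817147343266,
                                       0.055888697733790925],
    "GradientBoostingClassifier": [0.0, 0.7555555555555555, 0.24444444444444438,
                                   0.0, 2.4326753518951276e-17, 0.0],
}


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


for name in spark_24:
    diff = max_abs_diff(spark_24[name], spark_233[name])
    print(f"{name}: sum(2.4)={sum(spark_24[name]):.6f} "
          f"sum(2.3.3)={sum(spark_233[name]):.6f} max|diff|={diff:.2e}")
```

On these numbers, every vector sums to 1 in both versions. The GBT vectors agree to floating-point precision, while the Random Forest vectors differ by at most roughly 0.006 (gini) and 0.014 (entropy).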



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


