spark-issues mailing list archives

From "Sean R. Owen (Jira)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-28222) Feature importance outputs different values in GBT and Random Forest in 2.3.3 and 2.4 pyspark version
Date Sat, 26 Oct 2019 23:12:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-28222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-28222.
----------------------------------
    Resolution: Duplicate

> Feature importance outputs different values in GBT and Random Forest in 2.3.3 and 2.4 pyspark version
> -----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-28222
>                 URL: https://issues.apache.org/jira/browse/SPARK-28222
>             Project: Spark
>          Issue Type: Bug
>          Components: ML
>    Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3
>            Reporter: eneriwrt
>            Priority: Minor
>
> The feature importance values obtained in a binary classification project differ depending
> on whether version 2.3.3 or 2.4.0 is used. This happens with both Random Forest and GBT.
> The values that match the sklearn output are the ones produced by version 2.3.3.
> As an example:
> *SPARK 2.4*
>  MODEL RandomForestClassifier_gini [0.0, 0.4117930839002269, 0.06894132653061226, 0.15857667209786705, 0.2974447311021076, 0.06324418636918638]
>  MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 0.06578883597468652, 0.17433924485055197, 0.31754597164210124, 0.055888697733790925]
>  MODEL GradientBoostingClassifier [0.0, 0.7555555555555556, 0.24444444444444438, 0.0, 1.4602196686471875e-17, 0.0]
> *SPARK 2.3.3*
>  MODEL RandomForestClassifier_gini [0.0, 0.40957086167800455, 0.06894132653061226, 0.16413222765342259, 0.2974447311021076, 0.05991085303585305]
>  MODEL RandomForestClassifier_entropy [0.0, 0.3864372497988694, 0.06578883597468652, 0.18789704501922055, 0.30398817147343266, 0.055888697733790925]
>  MODEL GradientBoostingClassifier [0.0, 0.7555555555555555, 0.24444444444444438, 0.0, 2.4326753518951276e-17, 0.0]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


