spark-issues mailing list archives

From "Seth Hendrickson (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-12326) Move GBT implementation from spark.mllib to spark.ml
Date Mon, 14 Dec 2015 20:57:46 GMT
Seth Hendrickson created SPARK-12326:
----------------------------------------

             Summary: Move GBT implementation from spark.mllib to spark.ml
                 Key: SPARK-12326
                 URL: https://issues.apache.org/jira/browse/SPARK-12326
             Project: Spark
          Issue Type: Improvement
          Components: ML, MLlib
            Reporter: Seth Hendrickson


Several improvements to gradient boosted trees (e.g. a rawPrediction column, feature importance)
are not possible until the GBT implementation is moved to spark.ml. This Jira covers moving the
current GBT implementation to spark.ml, in roughly the following steps:

1. Copy the implementation to spark.ml and change spark.ml classes to use that implementation.
Current tests will ensure that the implementations learn exactly the same models. 
2. Move the decision tree helper classes over to spark.ml (e.g. Impurity, InformationGainStats,
ImpurityStats, DTStatsAggregator). Since all tree implementations will eventually reside in
spark.ml, the helper classes should as well.
3. Remove the spark.mllib implementation, and make the spark.mllib APIs wrappers around the
spark.ml implementation. The spark.ml tests will again ensure that we do not change any behavior.
4. Move the unit tests to spark.ml, and change the spark.mllib unit tests to verify model
equivalence.

Steps 2, 3, and 4 should be in separate Jiras. 
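
As a rough illustration of steps 3 and 4, the sketch below shows the general pattern of keeping
an old API as a thin wrapper over a single new implementation, then checking model equivalence.
It is plain Python rather than Spark code, and every name in it (Stump, GBTModel, train_gbt,
LegacyGradientBoostedTrees) is hypothetical; the toy one-feature boosting loop only stands in
for the real GBT algorithm.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Stump:
    """Toy depth-1 regression tree on a single feature (hypothetical)."""
    threshold: float
    left: float
    right: float

    def predict(self, x: float) -> float:
        return self.left if x <= self.threshold else self.right


@dataclass(frozen=True)
class GBTModel:
    """Toy boosted ensemble; frozen dataclasses compare by value."""
    trees: Tuple[Stump, ...]
    learning_rate: float

    def predict(self, x: float) -> float:
        return sum(self.learning_rate * t.predict(x) for t in self.trees)


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def _fit_stump(xs: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Pick the threshold that minimizes squared error on the residuals."""
    best_err, best_stump = None, None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = _mean(left), _mean(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best_err is None or err < best_err:
            best_err, best_stump = err, Stump(t, lm, rm)
    return best_stump


def train_gbt(xs: Sequence[float], ys: Sequence[float],
              num_iterations: int = 3, learning_rate: float = 0.5) -> GBTModel:
    """Stand-in for the single 'new' implementation that does the real work."""
    preds = [0.0] * len(xs)
    trees = []
    for _ in range(num_iterations):
        # Each round fits a stump to the current residuals (squared loss).
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = _fit_stump(xs, residuals)
        trees.append(stump)
        preds = [p + learning_rate * stump.predict(x) for p, x in zip(preds, xs)]
    return GBTModel(tuple(trees), learning_rate)


class LegacyGradientBoostedTrees:
    """Stand-in for the 'old' API: same signature as before, no logic of its own."""

    @staticmethod
    def train(points: Sequence[Tuple[float, float]],
              num_iterations: int = 3, learning_rate: float = 0.5) -> GBTModel:
        # Convert the legacy input format, then delegate to the new implementation.
        xs = [x for x, _ in points]
        ys = [y for _, y in points]
        return train_gbt(xs, ys, num_iterations, learning_rate)


if __name__ == "__main__":
    points = [(1.0, 1.0), (2.0, 1.0), (3.0, 3.0), (4.0, 3.0)]
    legacy_model = LegacyGradientBoostedTrees.train(points)
    direct_model = train_gbt([x for x, _ in points], [y for _, y in points])
    # Model-equivalence check in the spirit of step 4.
    print("models equivalent:", legacy_model == direct_model)
```

Because the wrapper holds no training logic, an equivalence test against the new implementation
is enough to show the old entry point has not changed behavior.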




