spark-issues mailing list archives

From "Seth Hendrickson (JIRA)"
Subject [jira] [Commented] (SPARK-12326) Move GBT implementation from spark.mllib to spark.ml
Date Mon, 14 Dec 2015 21:02:46 GMT


Seth Hendrickson commented on SPARK-12326:

[~josephkb] Could you review the plan above? I couldn't find any other Jira for moving GBTs
to ML, and it seems like it would be good to get this done so we can move on to some other
improvements that are needed as well. Thanks!

> Move GBT implementation from spark.mllib to spark.ml
> ----------------------------------------------------
>                 Key: SPARK-12326
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>            Reporter: Seth Hendrickson
> Several improvements can be made to gradient boosted trees, but are not possible without
> moving the GBT implementation to spark.ml (e.g. rawPrediction column, feature importance).
> This Jira is for moving the current GBT implementation to spark.ml, which will have roughly
> the following steps:
> 1. Copy the implementation to spark.ml and change classes to use that implementation.
> Current tests will ensure that the implementations learn exactly the same models.
> 2. Move the decision tree helper classes over to spark.ml (e.g. Impurity, InformationGainStats,
> ImpurityStats, DTStatsAggregator, etc.). Since eventually all tree implementations will
> reside in spark.ml, the helper classes should as well.
> 3. Remove the spark.mllib implementation, and make the spark.mllib APIs wrappers around
> the spark.ml implementation. The tests will again ensure that we do not change any
> 4. Move the unit tests to spark.ml, and change the spark.mllib unit tests to verify model
> Steps 2, 3, and 4 should be in separate Jiras.
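
For illustration only, the wrapper arrangement described in step 3 could look roughly like the
sketch below. It is written in Python rather than Scala to keep it short, and every name in it
(train_new, NewModel, LegacyTrainer, LegacyModel) is hypothetical and not part of Spark. The
"boosting" loop is a toy that fits a constant per stage, not a real tree learner; the point is
only that the old API holds no training logic and delegates to the new implementation.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class NewModel:
    """Hypothetical stand-in for a model produced by the new implementation."""
    stage_values: List[float]  # one constant correction per boosting stage

    def predict(self) -> float:
        return sum(self.stage_values)


def train_new(labels: Sequence[float], num_iterations: int,
              learning_rate: float) -> NewModel:
    """Toy boosting loop: each stage fits the mean residual (not a real tree)."""
    prediction = 0.0
    stages = []
    for _ in range(num_iterations):
        residual_mean = sum(y - prediction for y in labels) / len(labels)
        step = learning_rate * residual_mean
        stages.append(step)
        prediction += step
    return NewModel(stages)


class LegacyModel:
    """Hypothetical old-API model that only wraps the new one."""

    def __init__(self, delegate: NewModel):
        self._delegate = delegate

    def predict(self) -> float:
        return self._delegate.predict()


class LegacyTrainer:
    """Old-API entry point kept for compatibility; it holds no training logic."""

    def __init__(self, num_iterations: int = 10, learning_rate: float = 0.1):
        self.num_iterations = num_iterations
        self.learning_rate = learning_rate

    def run(self, labels: Sequence[float]) -> LegacyModel:
        # Delegate all work to the new implementation.
        return LegacyModel(train_new(labels, self.num_iterations, self.learning_rate))
```

A test in the spirit of the plan would train through both entry points with the same settings
and check that the resulting predictions agree.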
