spark-issues mailing list archives

From "Manoj Kumar (JIRA)" <>
Subject [jira] [Commented] (SPARK-6113) Stabilize DecisionTree and ensembles APIs
Date Fri, 06 Mar 2015 20:33:38 GMT


Manoj Kumar commented on SPARK-6113:

Hi, Thanks for the ping!

I had a quick skim through the design doc, and I like the splitting of Classification
and Regression models (this should make many other things easier and more intuitive), the
use of run rather than train, and the moving of losses outside tree, since they are not
specific to trees.

Do you want me to help in any specific way, or to try fixing any blockers related to
merging with the old API? I would be privileged to. Right now I have just one open
PR related to ensembles.

> Stabilize DecisionTree and ensembles APIs
> -----------------------------------------
>                 Key: SPARK-6113
>                 URL:
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib, PySpark
>    Affects Versions: 1.4.0
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Critical
> *Issue*: The APIs for DecisionTree and ensembles (RandomForests and GradientBoostedTrees)
> have been experimental for a long time. The API has become very convoluted because trees
> and ensembles have many, many variants, some of which we have added incrementally without
> a long-term design.
> *Proposal*: This JIRA is for discussing changes required to finalize the APIs. After
> we discuss, I will make a PR to update the APIs and make them non-Experimental. This will
> require making many breaking changes; see the design doc for details.
> [Design doc |]:
> This outlines current issues and the proposed API.
