spark-issues mailing list archives

From "Joseph K. Bradley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-6113) Stabilize DecisionTree and ensembles APIs
Date Fri, 06 Mar 2015 02:21:38 GMT

    [ https://issues.apache.org/jira/browse/SPARK-6113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349811#comment-14349811 ]

Joseph K. Bradley commented on SPARK-6113:
------------------------------------------

Pinging [~MechCoder] since you've been working on tree ensembles.  Before long, I hope to start
refactoring the tree and ensemble APIs, which will require a little coordination.  Here's what
I'm planning:
1. I'll make a PR with the new API.  It will call into the existing tree & ensemble code without modifying it.
2. We merge or close the existing PRs against the old API.
3. I'll make a PR moving the code to the new API, making the old API a wrapper. (No new PRs should be opened during this step.)
4. Any new PRs will be made against the new API.

Note that, per the design doc, the old and new APIs will live in different namespaces:
* old: mllib.tree.*
* new: mllib.classification.* and mllib.regression.*
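
For anyone unsure what "making the old API a wrapper" in step 3 could look like, here is a minimal sketch of the pattern in plain Python. Every name in it (NewDecisionTreeClassifier, MajorityModel, OldDecisionTree, train_classifier) is hypothetical and made up for illustration; none of it is Spark's actual or proposed API, and the "training" is a placeholder that just predicts the majority label. See the design doc for the real proposal.

```python
# Hypothetical sketch of the wrapper pattern; these are NOT Spark classes.


class MajorityModel:
    """Placeholder model that always predicts a single label."""

    def __init__(self, label):
        self.label = label

    def predict(self, features):
        return self.label


class NewDecisionTreeClassifier:
    """Stand-in for an estimator living in the new namespace."""

    def __init__(self, max_depth=5):
        self.max_depth = max_depth

    def fit(self, rows):
        # Placeholder "training": pick the most frequent label.
        # Sorting first makes ties resolve to the smallest label.
        labels = [label for label, _features in rows]
        majority = max(sorted(set(labels)), key=labels.count)
        return MajorityModel(majority)


class OldDecisionTree:
    """Stand-in for the old entry point; it holds no logic of its own."""

    @staticmethod
    def train_classifier(rows, max_depth=5):
        # The old signature is kept, but all work is delegated to the new API.
        return NewDecisionTreeClassifier(max_depth=max_depth).fit(rows)


rows = [(1, [0.0]), (1, [1.0]), (0, [2.0])]
model = OldDecisionTree.train_classifier(rows, max_depth=3)
print(model.predict([0.5]))  # prints 1
```

The point of the pattern is that callers of the old entry point keep working while the implementation lives in one place, so new PRs only need to target the new API.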


> Stabilize DecisionTree and ensembles APIs
> -----------------------------------------
>
>                 Key: SPARK-6113
>                 URL: https://issues.apache.org/jira/browse/SPARK-6113
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib, PySpark
>    Affects Versions: 1.4.0
>            Reporter: Joseph K. Bradley
>            Assignee: Joseph K. Bradley
>            Priority: Critical
>
> *Issue*: The APIs for DecisionTree and ensembles (RandomForests and GradientBoostedTrees)
> have been experimental for a long time.  The API has become very convoluted because trees
> and ensembles have many, many variants, some of which we have added incrementally without
> a long-term design.
> *Proposal*: This JIRA is for discussing changes required to finalize the APIs.  After
> we discuss, I will make a PR to update the APIs and make them non-Experimental.  This will
> require making many breaking changes; see the design doc for details.
> [Design doc | https://docs.google.com/document/d/1rJ_DZinyDG3PkYkAKSsQlY0QgCeefn4hUv7GsPkzBP4]:
> This outlines current issues and the proposed API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

