spark-issues mailing list archives

From "Sean Owen (JIRA)" <>
Subject [jira] [Commented] (SPARK-1357) [MLLIB] Annotate developer and experimental API's
Date Wed, 09 Apr 2014 10:21:16 GMT


Sean Owen commented on SPARK-1357:

I know I'm late to this party, but I just had a look and wanted to throw out a few last-minute thoughts.

(Do you not want to just declare all of MLlib experimental? Is it really 1.0? That's a fairly
significant set of shackles to put on for a long time.)

OK, that aside, I have two suggestions for APIs to mark as experimental:

1. The ALS Rating object assumes users and items are Int. I suspect it will eventually be useful
to support String IDs, or at least to switch to Long.

2. Per the old MLLIB-29, I feel pretty certain that ClassificationModel can't keep returning only
RDD[Double], and will want to support returning a distribution over labels at some point. Similarly,
the input to it and to RegressionModel will likely have to change to something richer than Vector
to properly allow for categorical values. DecisionTreeModel has the same issue but is already
experimental (and doesn't seem to integrate with these APIs?).
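Both suggestions amount to widening a type that is currently fixed. The following is a minimal plain-Python sketch (no Spark dependency) of what the wider signatures could look like. It assumes a rating that is generic over its ID type, an indexer that maps arbitrary IDs to the dense Int indices a factorization routine would use internally, and a classifier whose primary output is a distribution over labels, with the single-label prediction derived from it. `Rating`, `IdIndexer`, `ProbabilisticClassifier` and their methods are hypothetical names for illustration, not MLlib API.

```python
import math
from dataclasses import dataclass
from typing import Dict, Generic, Hashable, List, TypeVar

ID = TypeVar("ID", bound=Hashable)


@dataclass(frozen=True)
class Rating(Generic[ID]):
    """Hypothetical rating whose user/item IDs may be any hashable type (int, str, ...)."""
    user: ID
    item: ID
    rating: float


class IdIndexer(Generic[ID]):
    """Hypothetical helper: assigns dense 0-based int indices to IDs in first-seen order."""

    def __init__(self) -> None:
        self._index: Dict[ID, int] = {}
        self._ids: List[ID] = []

    def index(self, key: ID) -> int:
        if key not in self._index:
            self._index[key] = len(self._ids)
            self._ids.append(key)
        return self._index[key]

    def lookup(self, idx: int) -> ID:
        # Map an internal index back to the caller's original ID.
        return self._ids[idx]


class ProbabilisticClassifier:
    """Hypothetical classifier returning a distribution over labels, not a single double."""

    def __init__(self, weights: Dict[float, List[float]]) -> None:
        # One linear weight vector per label; scores become probabilities via softmax.
        self.weights = weights

    def predict_distribution(self, features: List[float]) -> Dict[float, float]:
        scores = {label: sum(w * x for w, x in zip(ws, features))
                  for label, ws in self.weights.items()}
        top = max(scores.values())  # subtract the max score for numerical stability
        exps = {label: math.exp(s - top) for label, s in scores.items()}
        total = sum(exps.values())
        return {label: e / total for label, e in exps.items()}

    def predict(self, features: List[float]) -> float:
        # A single-label output is still recoverable as the most probable label.
        dist = self.predict_distribution(features)
        return max(dist, key=dist.get)


# Usage: String IDs at the API boundary, dense int indices internally.
users: IdIndexer[str] = IdIndexer()
items: IdIndexer[str] = IdIndexer()
raw = [Rating("alice", "book-42", 4.0), Rating("bob", "book-42", 2.5)]
indexed = [Rating(users.index(r.user), items.index(r.item), r.rating) for r in raw]
# indexed == [Rating(0, 0, 4.0), Rating(1, 0, 2.5)]
```

Keeping the int indexing as an internal detail is one way to widen the public ID type without changing the factorization code itself; whether MLlib would take that route is an open question.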

The point is not so much whether one agrees with these, but whether there is a non-trivial
chance of wanting to change something this year.
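Marking an API experimental is mainly a signal to callers that its signature may still change. This issue does that in Scala via annotations such as `@Experimental` and `@DeveloperApi`. A rough Python analogue, offered only as a sketch, is a decorator that tags a function and warns when it is called. The `experimental` decorator, the `ExperimentalWarning` class, and the `train_als` placeholder are hypothetical names, not part of PySpark.

```python
import functools
import warnings
from typing import Callable, TypeVar

F = TypeVar("F", bound=Callable)


class ExperimentalWarning(UserWarning):
    """Hypothetical warning category for APIs that may change between releases."""


def experimental(func: F) -> F:
    """Tag func as experimental and emit a warning each time it is called."""

    @functools.wraps(func)  # preserve name and docstring of the wrapped function
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__qualname__} is experimental and may change in a future release",
            ExperimentalWarning,
            stacklevel=2,  # point the warning at the caller, not at this wrapper
        )
        return func(*args, **kwargs)

    wrapper.__experimental__ = True  # lets tooling discover tagged APIs
    return wrapper  # type: ignore[return-value]


@experimental
def train_als(ratings, rank: int = 10):
    """Hypothetical placeholder for an API whose signature is expected to change."""
    return {"rank": rank, "n": len(ratings)}
```

Callers who accept the risk can silence the category with `warnings.filterwarnings("ignore", category=ExperimentalWarning)`.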

Other parts that I'm interested in personally look pretty strong. Humbly submitted.

> [MLLIB] Annotate developer and experimental API's
> -------------------------------------------------
>                 Key: SPARK-1357
>                 URL:
>             Project: Spark
>          Issue Type: Sub-task
>          Components: MLlib
>    Affects Versions: 1.0.0
>            Reporter: Patrick Wendell
>            Assignee: Xiangrui Meng
>             Fix For: 1.0.0

This message was sent by Atlassian JIRA
