spark-dev mailing list archives

From Joseph Bradley <jos...@databricks.com>
Subject Re: Make ML Developer APIs public (post-1.4)
Date Thu, 06 Aug 2015 20:37:52 GMT
Eron,

Thanks for sending out this list! We can make some of the critical ones
public for 1.5, but they will be marked DeveloperApi since they may require
changes in the future. I just created the JIRA
[https://issues.apache.org/jira/browse/SPARK-9704] and I'll send a PR soon.

Joseph

On Mon, Aug 3, 2015 at 4:51 PM, Eron Wright <ewright@live.com> wrote:

>
> Hello,
>
> In developing new *third-party* *pipeline components* for Spark ML 1.4
> (see dl4j-spark-ml), I encountered a few gaps in the earlier effort to make
> the ML Developer APIs public (SPARK-5995). I plan to file issues after
> we discuss them on this thread. Below is a list of types that are
> presently private but might best be made public.
>
>    1. *VectorUDT*. To define a relation with a vector field,
>    VectorUDT must be instantiated.
>    2. *SchemaUtils*. Third-party pipeline components need to check
>    column types and append columns.
>    3. *Identifiable trait*. The trait generates a unique identifier for
>    the associated pipeline component. It would be nice to have a
>    consistent format by reusing the trait.
>    4. *ProbabilisticClassifier*. Third-party components should leverage
>    the complex logic around computing only selected columns.
>    5. *Shared Params* (HasLabel, HasFeatures). This is covered in
>    SPARK-7146, but I am reiterating it here.
>
> Thanks,
> Eron Wright
>
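
For readers unfamiliar with items 2 and 3 of the quoted list, here is a minimal Python sketch of the kind of helpers being described: generating a unique identifier for a pipeline component, checking a column's type, and appending a column to a schema. It deliberately does not use Spark. The function names, the dict-based schema, and the identifier format are all assumptions made for illustration, not the Spark ML API.

```python
import uuid

# Hypothetical stand-ins for illustration only; these are NOT Spark classes.
# A "schema" here is a plain dict mapping column name -> type name.


def random_uid(prefix):
    """Return the prefix followed by '_' and 12 random hex characters."""
    return prefix + "_" + uuid.uuid4().hex[-12:]


def check_column_type(schema, col_name, expected_type):
    """Raise ValueError if the column is missing or has a different type."""
    if col_name not in schema:
        raise ValueError("Column %s does not exist." % col_name)
    actual_type = schema[col_name]
    if actual_type != expected_type:
        raise ValueError(
            "Column %s must be of type %s but was %s."
            % (col_name, expected_type, actual_type)
        )


def append_column(schema, col_name, data_type):
    """Return a new schema with the column added; the input is not mutated."""
    if col_name in schema:
        raise ValueError("Column %s already exists." % col_name)
    new_schema = dict(schema)
    new_schema[col_name] = data_type
    return new_schema
```

A third-party component could call `check_column_type` on its input column before transforming, then `append_column` to declare its output column, which is the pattern item 2 refers to. Returning a copy from `append_column` keeps the caller's schema unchanged, which is a design choice of this sketch.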
