spark-dev mailing list archives

From Eron Wright
Subject Make ML Developer APIs public (post-1.4)
Date Mon, 03 Aug 2015 23:51:04 GMT


In developing new third-party pipeline components for Spark ML 1.4 (see dl4j-spark-ml), I
encountered a few gaps in the earlier effort to make the ML Developer APIs public (SPARK-5995).
I plan to file issues after we discuss them on this thread. Below is a list of types that
are currently private but might best be made public.
VectorUDT. To define a relation with a vector field, VectorUDT must be instantiated.
SchemaUtils. Third-party pipeline components need to check column types and appending
Identifiable trait. The trait generates a unique identifier for the associated pipeline
component. Reusing the trait would give third-party components a consistent identifier format.
ProbabilisticClassifier. Third-party components should leverage the complex logic around
computing only selected columns.
Shared Params (HasLabel, HasFeatures). This is covered in SPARK-7146 but reiterating it
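
To illustrate the SchemaUtils and Identifiable items above, here is a minimal sketch of the kind
of helper a third-party component ends up duplicating while those types stay private. It is
plain Python with no Spark dependency. The function names, the dict-based schema (column name
mapped to a type name), and the identifier format (prefix plus a 12-character random suffix)
are all assumptions made for illustration, not the actual Spark signatures.

```python
import uuid


def random_uid(prefix):
    """Build an identifier of the assumed form '<prefix>_<12 hex chars>'."""
    return f"{prefix}_{uuid.uuid4().hex[-12:]}"


def check_column_type(schema, col_name, expected_type):
    """Raise ValueError if the column is missing or has a different type.

    `schema` is a hypothetical stand-in for a Spark schema: a dict that
    maps column names to type names.
    """
    if col_name not in schema:
        raise ValueError(f"Column {col_name} does not exist.")
    actual = schema[col_name]
    if actual != expected_type:
        raise ValueError(
            f"Column {col_name} must be of type {expected_type} but was {actual}."
        )


def append_column(schema, col_name, data_type):
    """Return a new schema with the column added, refusing to overwrite."""
    if col_name in schema:
        raise ValueError(f"Column {col_name} already exists.")
    new_schema = dict(schema)  # copy so the input schema is left untouched
    new_schema[col_name] = data_type
    return new_schema


if __name__ == "__main__":
    schema = {"features": "vector"}
    check_column_type(schema, "features", "vector")
    out_schema = append_column(schema, "prediction", "double")
    print(random_uid("myClassifier"), out_schema)
```

Every external component that carries its own copy of helpers like these can drift in error
messages and identifier format, which is the consistency argument for exposing the originals.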
Eron Wright
