spark-dev mailing list archives

From Tim Hunter <timhun...@databricks.com>
Subject Design document - MLlib's statistical package for DataFrames
Date Thu, 16 Feb 2017 21:00:04 GMT
Hello all,

I have been looking at some of the missing items for complete feature
parity between spark.ml and spark.mllib. Here is a proposal for
porting mllib.stats, the descriptive statistics package:

https://docs.google.com/document/d/1ELVpGV3EBjc2KQPLN9_9_Ge9gWchPZ6SGtDW5tTm_50/edit?usp=sharing

The umbrella ticket for this task is:
https://issues.apache.org/jira/browse/SPARK-4591

Please comment on the document. Also, if you want to work on one of
the algorithms, both the design doc and the umbrella ticket list
subtasks that you can assign to yourself.

The cutoff deadline for Spark 2.2 is rapidly approaching, and it would
be great if we could claim parity for this release!

Cheers

Tim