spark-issues mailing list archives

From "Timothy Hunter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-19634) Feature parity for descriptive statistics in MLlib
Date Mon, 13 Mar 2017 22:52:41 GMT

    [ https://issues.apache.org/jira/browse/SPARK-19634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923119#comment-15923119 ]

Timothy Hunter commented on SPARK-19634:
----------------------------------------

I was not able to finish it in time, but the bulk of the code is in this branch:

https://github.com/apache/spark/compare/master...thunterdb:19634?expand=1

Note that it currently includes a (non-working) UDAF and an incomplete TypedImperativeAggregate.
It turns out that the UDAF interface is not suited to this sort of aggregator, which I realized
quite late. I started to refactor my code to use TypedImperativeAggregate, but did not have
time to finish it. If someone wants to pick up this task, he or she is welcome to do it.

> Feature parity for descriptive statistics in MLlib
> --------------------------------------------------
>
>                 Key: SPARK-19634
>                 URL: https://issues.apache.org/jira/browse/SPARK-19634
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>    Affects Versions: 2.1.0
>            Reporter: Timothy Hunter
>            Assignee: Timothy Hunter
>
> This ticket tracks porting the functionality of spark.mllib.MultivariateOnlineSummarizer over to spark.ml.
> A design has been discussed in SPARK-19208. Here is a design doc:
> https://docs.google.com/document/d/1ELVpGV3EBjc2KQPLN9_9_Ge9gWchPZ6SGtDW5tTm_50/edit#





