spark-issues mailing list archives

From "Manoj Kumar (JIRA)" <>
Subject [jira] [Commented] (SPARK-6192) Enhance MLlib's Python API (GSoC 2015)
Date Mon, 09 Mar 2015 04:48:38 GMT


Manoj Kumar commented on SPARK-6192:

[~Manglano] [~leckie-chn] Hi, I am actually not a mentor but a student to whom Xiangrui has
preassigned this GSoC project (I've been working on the Spark codebase for a couple of months
now). The project idea came out of brainstorming across several pull requests. I would suggest
looking at various issues, which will help you gain familiarity with the API and put together
a project proposal. Hope that helps.

> Enhance MLlib's Python API (GSoC 2015)
> --------------------------------------
>                 Key: SPARK-6192
>                 URL:
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, MLlib, PySpark
>            Reporter: Xiangrui Meng
>            Assignee: Manoj Kumar
>              Labels: gsoc, gsoc2015, mentor
> This is an umbrella JIRA for [~MechCoder]'s GSoC 2015 project. The main theme is to enhance
> MLlib's Python API, to make it on par with the Scala/Java API. The main tasks are:
> 1. For all models in MLlib, provide save/load methods. This also
> includes save/load in Scala.
> 2. Python API for evaluation metrics.
> 3. Python API for streaming ML algorithms.
> 4. Python API for distributed linear algebra.
> 5. Simplify MLLibPythonAPI using DataFrames. Currently, we use
> customized serialization, making MLLibPythonAPI hard to maintain. It
> would be nice to use the DataFrames for serialization.
> I'll link the JIRAs for each of the tasks.
> Note that this doesn't mean all these JIRAs are pre-assigned to [~MechCoder]. The TODO
> list will be dynamic based on the backlog.
