spark-issues mailing list archives

From "Patrick Wendell (JIRA)" <>
Subject [jira] [Resolved] (SPARK-1390) Refactor RDD backed matrices
Date Wed, 09 Apr 2014 06:03:18 GMT


Patrick Wendell resolved SPARK-1390.

    Resolution: Fixed

> Refactor RDD backed matrices
> ----------------------------
>                 Key: SPARK-1390
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Blocker
>             Fix For: 1.0.0
> The current interfaces of RDD-backed matrices need refactoring for the v1.0 release. It
would be better to have a clear separation between local matrices and those backed by RDDs. Right
now, we have:
> 1. org.apache.spark.mllib.linalg.SparseMatrix, which is a wrapper over an RDD of matrix
entries, i.e., coordinate list format.
> 2. org.apache.spark.mllib.linalg.TallSkinnyDenseMatrix, which is a wrapper over RDD[Array[Double]],
i.e. row-oriented format.
> We will see a naming collision when we introduce a local SparseMatrix, and the name TallSkinnyDenseMatrix
is not exact if we switch to RDD[Vector] instead of RDD[Array[Double]]. It would be better
to have "RDD" in the type name to suggest that operations will trigger a job.
> The proposed names (all under org.apache.spark.mllib.linalg.rdd):
> 1. RDDMatrix: trait for matrices backed by one or more RDDs
> 2. CoordinateRDDMatrix: wrapper of RDD[RDDMatrixEntry]
> 3. RowRDDMatrix: wrapper of RDD[Vector] whose rows do not have special ordering
> 4. IndexedRowRDDMatrix: wrapper of RDD[(Long, Vector)] whose rows are associated with
> The proposal is subject to change, but it would be nice to make the changes before v1.0.
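
For illustration only, the proposed type layout might be sketched roughly as below. This is not the code that was merged for this issue. To keep the sketch runnable without Spark, `Dist` is a hypothetical stand-in for RDD and `Vec` for the MLlib Vector; the fields of RDDMatrixEntry and the members numRows, numCols and toRowRDDMatrix are also assumptions and do not appear in the proposal.

```scala
// Sketch only: a Spark-free approximation of the proposed type names.
object RddMatrixSketch {
  type Dist[A] = Seq[A]      // hypothetical stand-in for org.apache.spark.rdd.RDD[A]
  type Vec = Array[Double]   // hypothetical stand-in for an MLlib Vector

  // One entry of a matrix in coordinate list format (field names are assumed).
  final case class RDDMatrixEntry(i: Long, j: Long, value: Double)

  // 1. Trait for matrices backed by one or more distributed collections.
  trait RDDMatrix {
    def numRows: Long
    def numCols: Long
  }

  // 2. Wrapper over a collection of entries, i.e. coordinate list format.
  final class CoordinateRDDMatrix(val entries: Dist[RDDMatrixEntry]) extends RDDMatrix {
    def numRows: Long = if (entries.isEmpty) 0L else entries.map(_.i).max + 1
    def numCols: Long = if (entries.isEmpty) 0L else entries.map(_.j).max + 1
  }

  // 3. Wrapper over a collection of rows with no special ordering.
  final class RowRDDMatrix(val rows: Dist[Vec]) extends RDDMatrix {
    def numRows: Long = rows.size.toLong
    def numCols: Long = rows.headOption.map(_.length.toLong).getOrElse(0L)
  }

  // 4. Wrapper over a collection of (Long, row) pairs.
  final class IndexedRowRDDMatrix(val rows: Dist[(Long, Vec)]) extends RDDMatrix {
    def numRows: Long = if (rows.isEmpty) 0L else rows.map(_._1).max + 1
    def numCols: Long = rows.headOption.map(_._2.length.toLong).getOrElse(0L)

    // Dropping the Long keys yields a row matrix without ordering information.
    def toRowRDDMatrix: RowRDDMatrix = new RowRDDMatrix(rows.map(_._2))
  }
}
```

With a real RDD in place of `Dist`, members such as numRows would launch a job, which is the cost the "RDD" in each type name is meant to signal.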

This message was sent by Atlassian JIRA