flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3919) Distributed Linear Algebra: row-based matrix
Date Wed, 01 Jun 2016 06:54:12 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15309419#comment-15309419 ]

ASF GitHub Bot commented on FLINK-3919:
---------------------------------------

Github user chiwanpark commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1996#discussion_r65309988
  
    --- Diff: flink-libraries/flink-ml/src/main/scala/org/apache/flink/ml/math/distributed/DistributedRowMatrix.scala ---
    @@ -0,0 +1,179 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one
    + * or more contributor license agreements.  See the NOTICE file
    + * distributed with this work for additional information
    + * regarding copyright ownership.  The ASF licenses this file
    + * to you under the Apache License, Version 2.0 (the
    + * "License"); you may not use this file except in compliance
    + * with the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.flink.ml.math.distributed
    +
    +import breeze.linalg.{CSCMatrix => BreezeSparseMatrix, Matrix => BreezeMatrix, Vector => BreezeVector}
    +import org.apache.flink.api.scala._
    +import org.apache.flink.ml.math.Breeze._
    +import org.apache.flink.ml.math.{Matrix => FlinkMatrix, _}
    +
    +/**
    +  * Distributed row-major matrix representation.
    +  * @param numRowsOpt If None, will be calculated from the DataSet.
    +  * @param numColsOpt If None, will be calculated from the DataSet.
    +  */
    +class DistributedRowMatrix(data: DataSet[IndexedRow],
    +                           numRowsOpt: Option[Int] = None,
    +                           numColsOpt: Option[Int] = None)
    +    extends DistributedMatrix {
    +
    +  lazy val getNumRows: Int = numRowsOpt match {
    +    case Some(rows) => rows
    +    case None => numRows.collect().head
    +  }
    +
    +  lazy val getNumCols: Int = numColsOpt match {
    +    case Some(cols) => cols
    +    case None => numCols.collect().head
    +  }
    +
    +  lazy val numRows: DataSet[Int] = numRowsOpt match {
    +    case Some(rows) => data.getExecutionEnvironment.fromElements(rows)
    +    case None => data.max("rowIndex").map(_.rowIndex + 1)
    +  }
    +
    +  lazy val numCols: DataSet[Int] = numColsOpt match {
    +    case Some(cols) => data.getExecutionEnvironment.fromElements(cols)
    +    case None => data.first(1).map(_.values.size)
    +  }
    +
    +  val getRowData = data
    +
    +  /**
    +    * Collects the data in the form of a sequence of coordinates associated with their values.
    +    * @return
    +    */
    +  def toCOO: Seq[(Int, Int, Double)] = {
    +
    +    val localRows = data.collect()
    +
    +    for (IndexedRow(rowIndex, vector) <- localRows;
    +         (columnIndex, value) <- vector) yield (rowIndex, columnIndex, value)
    +  }
    +
    +  /**
    +    * Collects the data in the form of a SparseMatrix
    +    * @return
    +    */
    +  def toLocalSparseMatrix: SparseMatrix = {
    +    val localMatrix =
    +      SparseMatrix.fromCOO(this.getNumRows, this.getNumCols, this.toCOO)
    +    require(localMatrix.numRows == this.getNumRows)
    +    require(localMatrix.numCols == this.getNumCols)
    +    localMatrix
    +  }
    +
    +  //TODO: convert to dense representation on the distributed matrix and collect it afterward
    +  def toLocalDenseMatrix: DenseMatrix = this.toLocalSparseMatrix.toDenseMatrix
    +
    +  /**
    +    * Apply a high-order function to couple of rows
    +    * @param fun
    +    * @param other
    +    * @return
    +    */
    +  def byRowOperation(fun: (Vector, Vector) => Vector,
    +                     other: DistributedRowMatrix): DistributedRowMatrix = {
    +    val otherData = other.getRowData
    +    require(this.getNumCols == other.getNumCols)
    +    require(this.getNumRows == other.getNumRows)
    +
    +    val result = this.data
    +      .fullOuterJoin(otherData)
    +      .where("rowIndex")
    +      .equalTo("rowIndex")(
    +          (left: IndexedRow, right: IndexedRow) => {
    +            val row1 = Option(left).getOrElse(IndexedRow(
    +                    right.rowIndex,
    +                    SparseVector.fromCOO(right.values.size, List((0, 0.0)))))
    +            val row2 = Option(right).getOrElse(IndexedRow(
    +                    left.rowIndex,
    +                    SparseVector.fromCOO(left.values.size, List((0, 0.0)))))
    --- End diff --
    
    I would like to rewrite this block as follows to avoid creating an unnecessary `IndexedRow` object:
    
    ```scala
    val row1 = Option(left) match {
      case Some(row: IndexedRow) => row
      case None =>
        IndexedRow(right.rowIndex, SparseVector.fromCOO(right.values.size, List((0, 0.0))))
    }
    val row2 = Option(right) match {
      case Some(row: IndexedRow) => row
      case None =>
        IndexedRow(left.rowIndex, SparseVector.fromCOO(left.values.size, List((0, 0.0))))
    }
    ```
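
    For illustration only, and not part of the pull request: a minimal, Flink-free sketch of the null handling in the suggestion above. A full outer join passes `null` for the side that has no matching key, so each side is wrapped in `Option` and replaced by an all-zero row when it is missing. `Row` and `orZero` are hypothetical stand-ins for `IndexedRow` and the inline match, and a plain `Vector[Double]` is assumed in place of `SparseVector`.

    ```scala
    object FullOuterJoinSketch {
      // Hypothetical stand-in for IndexedRow, so the sketch runs without Flink.
      case class Row(rowIndex: Int, values: Vector[Double])

      // Returns `row` when it is non-null; otherwise builds an all-zero row
      // that takes its index and width from `other`.
      // `other` is only dereferenced in the None branch.
      def orZero(row: Row, other: Row): Row = Option(row) match {
        case Some(r) => r
        case None    => Row(other.rowIndex, Vector.fill(other.values.size)(0.0))
      }

      def main(args: Array[String]): Unit = {
        val left: Row = null                  // no match on the left side
        val right = Row(3, Vector(1.0, 2.0))

        val row1 = orZero(left, right)        // zero row with index 3, width 2
        val row2 = orZero(right, left)        // right itself, unchanged

        println(row1)
        println(row2)
      }
    }
    ```

    The sketch assumes at least one side is non-null, which holds for rows produced by a full outer join.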
     


> Distributed Linear Algebra: row-based matrix
> --------------------------------------------
>
>                 Key: FLINK-3919
>                 URL: https://issues.apache.org/jira/browse/FLINK-3919
>             Project: Flink
>          Issue Type: New Feature
>          Components: Machine Learning Library
>            Reporter: Simone Robutti
>            Assignee: Simone Robutti
>
> Distributed matrix implementation as a DataSet of IndexedRow and related operations



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
