spark-issues mailing list archives

From "Nick Pentreath (JIRA)" <>
Subject [jira] [Commented] (SPARK-6567) Large linear model parallelism via a join and reduceByKey
Date Fri, 24 Feb 2017 08:05:44 GMT


Nick Pentreath commented on SPARK-6567:

This JIRA has been around for a while without any movement. It generally seems that
"vector-free" versions of algorithms such as L-BFGS will be more efficient.

Shall we close this (unless there are major objections)?

> Large linear model parallelism via a join and reduceByKey
> ---------------------------------------------------------
>                 Key: SPARK-6567
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>            Reporter: Reza Zadeh
>         Attachments: model-parallelism.pptx
> To train a linear model, each training point in the training set needs its dot product
> computed against the model, per iteration. If the model is large (too large to fit in memory
> on a single machine), then SPARK-4590 proposes using a parameter server.
> There is an easier way to achieve this without parameter servers. In particular, if the
> data is held as a BlockMatrix and the model as an RDD, then each block can be joined with
> the relevant part of the model, followed by a reduceByKey to compute the dot products.
> This obviates the need for a parameter server, at least for linear models. However, it's
> unclear how it compares performance-wise to parameter servers.
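
The quoted scheme can be illustrated in miniature. The sketch below is a plain-Python analogue (not Spark code): the training data is held as column-partitioned blocks, the model as per-block weight slices, each data block is paired with the model slice for its columns (the "join"), and the per-row partial dot products are then summed by row key (the "reduceByKey"). All names here are hypothetical; a real implementation would use `BlockMatrix` and RDD operations.

```python
# Pure-Python sketch of the join + reduceByKey idea from the issue.
# In Spark, data_blocks and model_blocks would be RDDs and the two loops
# below would be rdd.join(...) followed by rdd.reduceByKey(...).
from collections import defaultdict


def blocked_dot_products(data_blocks, model_blocks, rows_per_block):
    """data_blocks:  {(row_block, col_block): list of rows (dense submatrix)}
    model_blocks: {col_block: slice of the model weight vector}
    Returns {global_row_index: full dot product of that point with the model}."""
    # "Join": pair each data block with the model slice covering its columns,
    # emitting (row_index, partial_dot) pairs.
    partials = []
    for (rb, cb), block in data_blocks.items():
        w = model_blocks[cb]  # model slice matching this block's columns
        for i, row in enumerate(block):
            key = rb * rows_per_block + i
            partials.append((key, sum(x * wj for x, wj in zip(row, w))))

    # "reduceByKey": sum the partial dot products per training point.
    dots = defaultdict(float)
    for key, partial in partials:
        dots[key] += partial
    return dict(dots)


# One training point [1, 1, 1, 1] split into two column blocks,
# model [1, 2, 3, 4] split the same way:
data = {(0, 0): [[1.0, 1.0]], (0, 1): [[1.0, 1.0]]}
model = {0: [1.0, 2.0], 1: [3.0, 4.0]}
print(blocked_dot_products(data, model, rows_per_block=1))  # {0: 10.0}
```

Because the partial products are combined by key, no single machine ever needs the whole model in memory, which is the point of the proposal.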

This message was sent by Atlassian JIRA

