spark-issues mailing list archives

From "Tor Myklebust (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-1580) ALS: Estimate communication and computation costs given a partitioner
Date Wed, 23 Apr 2014 02:10:15 GMT
Tor Myklebust created SPARK-1580:
------------------------------------

             Summary: ALS: Estimate communication and computation costs given a partitioner
                 Key: SPARK-1580
                 URL: https://issues.apache.org/jira/browse/SPARK-1580
             Project: Spark
          Issue Type: Improvement
          Components: MLlib
            Reporter: Tor Myklebust
            Priority: Minor


It would be nice to be able to estimate the amount of work needed to solve an ALS problem.
The chief components of this "work" are computation time (time spent forming and solving
the least squares problems) and communication cost (the number of bytes sent across the
network).  Communication cost depends heavily on how the users and products are partitioned.
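
The computation component can be estimated from rating counts alone. Below is a minimal sketch in Python, not Spark code. The function name `estimate_flops`, the `partition` callback, and the cost model itself are assumptions made for illustration: each rating is taken to contribute one rank x rank outer product to a user's normal equations, and each user is taken to need one Cholesky-style solve.

```python
from collections import Counter


def estimate_flops(ratings, rank, num_blocks, partition=lambda user, n: user % n):
    """Rough flop count per user block for one pass that updates all users.

    ratings:   iterable of (user, product) pairs
    rank:      number of latent features
    partition: hypothetical function mapping (user, num_blocks) to a block index

    Assumed cost model (illustrative only):
      forming the normal equations costs about rank^2 flops per rating,
      solving them costs about rank^3 / 3 flops per user.
    """
    ratings_per_user = Counter(user for user, _ in ratings)
    flops = [0.0] * num_blocks
    for user, n_ratings in ratings_per_user.items():
        block = partition(user, num_blocks)
        flops[block] += n_ratings * rank ** 2 + rank ** 3 / 3
    return flops


# Toy data: user 0 has two ratings, users 1 and 2 have one each.
toy_ratings = [(0, 0), (0, 1), (1, 0), (2, 1)]
per_block = estimate_flops(toy_ratings, rank=3, num_blocks=2)
```

The spread between the largest and smallest entries of `per_block` gives a rough idea of how evenly a partitioner balances the least squares work.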

We currently do not try to cluster users or products so that fewer feature vectors need to
be communicated.  This estimate is intended as a first step toward that end: we ought to be
able to tell whether one partitioning is better than another.
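
As a rough illustration of comparing two partitionings, the sketch below counts how many product feature vectors must be shipped to user blocks in one pass. It is plain Python, not Spark code, and rests on stated assumptions: a product's vector is sent once to every user block containing at least one user who rated it, every such copy crosses the network (co-location is ignored), and each feature is an 8-byte double. The names `estimate_bytes`, `clustered`, and `hashed` are hypothetical.

```python
def estimate_bytes(ratings, rank, user_block, bytes_per_value=8):
    """Estimated bytes sent to update all users once.

    ratings:    iterable of (user, product) pairs
    user_block: hypothetical function mapping a user id to its block index

    Assumption: one copy of a product's feature vector goes to each
    user block that holds at least one user who rated that product.
    """
    needed = {(product, user_block(user)) for user, product in ratings}
    return len(needed) * rank * bytes_per_value


# Toy data: users 0 and 1 rate products 0 and 1; users 2 and 3 rate products 2 and 3.
toy_ratings = [(u, p) for u in (0, 1) for p in (0, 1)]
toy_ratings += [(u, p) for u in (2, 3) for p in (2, 3)]

# Partitioning that keeps users with shared products together.
clustered = estimate_bytes(toy_ratings, rank=10, user_block=lambda u: u // 2)
# Partitioning that splits them across blocks.
hashed = estimate_bytes(toy_ratings, rank=10, user_block=lambda u: u % 2)
```

On this toy data the clustered partitioning needs half as many bytes as the hashed one, which is the kind of comparison the estimate is meant to support.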



--
This message was sent by Atlassian JIRA
(v6.2#6252)
