spark-user mailing list archives

From Hiroyuki Yamada <>
Subject Re: What is the most efficient and scalable way to get all the recommendation results from ALS model ?
Date Mon, 21 Mar 2016 04:22:39 GMT
Could anyone give me some advice, recommendations, or pointers to the usual way to do
this?

I am trying to get all (probably the top 100) product recommendations for each
user from a model (MatrixFactorizationModel),
but I haven't yet figured out how to do it efficiently.

So far,
calling the predict method (predictAll in PySpark) with the full user-product matrix
uses too much memory and cannot complete due to lack of memory,
while calling predict for each user (or for each small batch of users, say 100 at a time)
takes too much time to get all the recommendations.

I am using Spark 1.4.1 on a 5-node cluster with 8 GB of RAM per node.
I am only using a small data set so far: about 50,000 users and 5,000
products, with only about 100,000 ratings.


On Sat, Mar 19, 2016 at 7:58 PM, Hiroyuki Yamada <> wrote:

> Hi,
> I'm testing collaborative filtering with MLlib.
> Building a model with ALS.trainImplicit (or train) seems scalable as far as I
> have tested,
> but I'm wondering how I can get all the recommendation results efficiently.
> The predictAll method can produce all the results,
> but it needs the whole user-product matrix in memory as input.
> So if there are 1 million users and 1 million products, the number of
> elements is too large (1 million x 1 million),
> and the amount of memory needed to hold them is several TB even when each
> element is only 4 bytes,
> which is not a realistic amount of memory even now.
> # (1,000,000 x 1,000,000) x 4 B = about 4 TB
> We can, of course, use the predict method per user,
> but as far as I have tried, it is very slow for getting 1 million users' results.
> Am I missing something?
> Are there any other, better ways to get all the recommendation results in a
> scalable and efficient way?
> Best regards,
> Hiro
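The back-of-the-envelope estimate in the quoted message checks out; as a quick sanity check of the arithmetic:

```python
n_users = 1_000_000
n_products = 1_000_000
bytes_per_element = 4  # one 32-bit float score per (user, product) pair

total_bytes = n_users * n_products * bytes_per_element
print(total_bytes / 10**12)  # 4.0 (decimal terabytes)
```

This is why any approach that materializes the full user-product cross product cannot scale: the memory cost grows with the product of the two dimensions, while a top-N approach grows only with n_users times N.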
