spark-user mailing list archives

From Hiroyuki Yamada <mogwa...@gmail.com>
Subject Re: What is the most efficient and scalable way to get all the recommendation results from ALS model ?
Date Mon, 21 Mar 2016 04:22:39 GMT
Could anyone give me some advice, or describe the usual way to do
this?

I am trying to get all (probably the top 100) product recommendations for
each user from a model (MatrixFactorizationModel),
but I haven't yet figured out how to do it efficiently.

So far,
calling the predict method (predictAll in pyspark) with the full
user-product matrix uses too much memory and fails to complete because it
runs out of memory,
and
calling predict for each user (or for each batch of users, e.g. 100 users
or so) takes too much time to get all the recommendations.
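
To make the goal concrete, here is a minimal single-machine sketch of what
I mean by "top N per user": score one user against every product with a dot
product of latent factors and keep only the best N, so the full
user-product score matrix is never materialized. The names
top_k_for_user and recommend_all and the toy factor vectors are
hypothetical, made up for illustration; in a real job the factors would
presumably come from the model's userFeatures() and productFeatures(), and
this is not how MLlib itself is implemented.

```python
import heapq


def top_k_for_user(user_vec, product_factors, k):
    """Return the k (score, product_id) pairs with the highest dot product."""
    # Generator: scores are produced one at a time and never stored in full.
    scored = (
        (sum(u * p for u, p in zip(user_vec, vec)), pid)
        for pid, vec in product_factors.items()
    )
    # heapq.nlargest keeps at most k candidates in memory at any time.
    return heapq.nlargest(k, scored)


def recommend_all(user_factors, product_factors, k):
    """Compute top-k recommendations for every user, one user at a time."""
    return {
        uid: top_k_for_user(vec, product_factors, k)
        for uid, vec in user_factors.items()
    }


# Toy latent factors (hypothetical values, rank 2).
user_factors = {1: [1.0, 0.0], 2: [0.0, 1.0]}
product_factors = {10: [0.9, 0.1], 20: [0.2, 0.8], 30: [0.5, 0.5]}

recs = recommend_all(user_factors, product_factors, k=2)
for uid, items in sorted(recs.items()):
    print(uid, [pid for _, pid in items])
```

Per user, memory is bounded by N plus the product factors, which is the
property I am looking for in a distributed version.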

I am using Spark 1.4.1 on a 5-node cluster with 8 GB of RAM per node.
So far I have only used a small data set: about 50,000 users and 5,000
products, with only about 100,000 ratings.
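
As a rough size estimate for this data set (a sketch only: the 4 bytes per
score is the same assumption as in my earlier message below, and the real
per-pair overhead of (user, product) records inside Spark is not measured
here and is likely higher):

```python
users = 50000
products = 5000
bytes_per_score = 4  # assumption: one 4-byte score per (user, product) pair
top_k = 100

pairs = users * products               # every user-product combination
score_bytes = pairs * bytes_per_score  # raw score storage only
score_gb = score_bytes / 1000 ** 3     # decimal gigabytes

kept = users * top_k                   # scores actually needed (top 100 each)

print(pairs, score_gb, kept)
```

So the full matrix has 250 million pairs (about 1 GB of raw scores alone),
while only 5 million of those scores are actually wanted.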

Thanks.


On Sat, Mar 19, 2016 at 7:58 PM, Hiroyuki Yamada <mogwaing@gmail.com> wrote:

> Hi,
>
> I'm testing collaborative filtering with MLlib.
> Building a model with ALS.trainImplicit (or train) seems scalable as far
> as I have tested,
> but I'm wondering how I can get all the recommendation results efficiently.
>
> The predictAll method can get all the results,
> but it needs the whole user-product matrix in memory as input.
> So if there are 1 million users and 1 million products, the number of
> elements is too large (1 million x 1 million),
> and the memory needed to hold them is about 4 TB even when each element
> is only 4 bytes,
> which is not a realistic amount of memory even now.
>
> # (1000000 * 1000000) * 4 / 1000 / 1000 / 1000 / 1000 = 4 (TB)
>
> We can, of course, call the predict method per user,
> but, as far as I have tried, it is very slow to get results for 1 million
> users.
>
> Am I missing something?
> Is there a better way to get all the recommendation results in a
> scalable and efficient way?
>
> Best regards,
> Hiro
>
>
>
