spark-user mailing list archives

From Hiroyuki Yamada <mogwa...@gmail.com>
Subject What is the most efficient and scalable way to get all the recommendation results from ALS model ?
Date Sat, 19 Mar 2016 10:58:41 GMT
Hi,

I'm testing collaborative filtering with MLlib.
Building a model with ALS.trainImplicit (or ALS.train) seems scalable as far as
I have tested, but I'm wondering how I can get all the recommendation results
efficiently.

The predictAll method can return all the results,
but it needs the whole user-product matrix in memory as its input.
So with 1 million users and 1 million products, the number of
elements is far too large (1 million x 1 million = 10^12),
and the memory needed to hold them is several TB even when each
element is only 4 bytes,
which is not a realistic amount of memory even today.

# (1000000 * 1000000) * 4 / 1000 / 1000 / 1000 / 1000 => roughly 4 TB
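The estimate above can be re-run for other sizes with a few lines of Python. This is a minimal sketch: the function names are hypothetical. It assumes, as the estimate does, one 4-byte element per (user, product) pair and decimal terabytes (1 TB = 10^12 bytes). Real in-memory objects would need considerably more than 4 bytes each.

```python
# Back-of-the-envelope size of a full (user, product) input.
# Assumption: one element per pair, 4 bytes each by default, as in the estimate above.

def input_size_bytes(num_users: int, num_products: int, bytes_per_element: int = 4) -> int:
    """Bytes needed to hold one element for every (user, product) pair."""
    return num_users * num_products * bytes_per_element


def to_terabytes(num_bytes: int) -> float:
    """Convert bytes to decimal terabytes (1 TB = 10**12 bytes)."""
    return num_bytes / 1000 ** 4


if __name__ == "__main__":
    size = input_size_bytes(1_000_000, 1_000_000)
    print(size)                # 4000000000000
    print(to_terabytes(size))  # 4.0
```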

We can, of course, call the predict method per user,
but in my tests it is very slow to get the results for 1 million users.
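The per-user approach can be sketched in plain Python to show where the cost comes from. This is not the MLlib API. It is a toy sketch that assumes hypothetical rank-2 factor vectors and relies on the fact that a matrix factorization model scores a (user, product) pair as the dot product of their latent factor vectors. Scoring one user touches every product, so repeating it for all users costs roughly users x products x rank multiply-adds, plus per-call overhead.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical toy latent factors (rank 2); a trained ALS model would supply real ones.
user_factors: Dict[int, List[float]] = {
    0: [1.0, 0.0],
    1: [0.5, 0.5],
}
product_factors: Dict[int, List[float]] = {
    10: [0.9, 0.1],
    11: [0.2, 0.8],
    12: [0.5, 0.5],
}


def predict(user: int, product: int) -> float:
    """Predicted score: dot product of the user and product factor vectors."""
    return sum(u * p for u, p in zip(user_factors[user], product_factors[product]))


def top_k_for_user(user: int, k: int) -> List[Tuple[int, float]]:
    """Score every product for one user and keep the k highest-scoring ones."""
    scores = ((product, predict(user, product)) for product in product_factors)
    return heapq.nlargest(k, scores, key=lambda item: item[1])


def work_for_all_users(num_users: int, num_products: int, rank: int) -> int:
    """Multiply-adds needed to score every product for every user, one user at a time."""
    return num_users * num_products * rank


if __name__ == "__main__":
    print(top_k_for_user(0, 2))  # [(10, 0.9), (12, 0.5)]
    # With a hypothetical rank of 10: 10**13 multiply-adds for 1M users x 1M products.
    print(work_for_all_users(1_000_000, 1_000_000, 10))
```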

Am I missing something?
Is there a better way to get all the recommendation results in a
scalable and efficient way?

Best regards,
Hiro
