spark-user mailing list archives

From Sean Owen <>
Subject Re: Load whole ALS MatrixFactorizationModel into memory
Date Wed, 02 Nov 2016 17:39:53 GMT
You can cause the underlying RDDs in the model to be cached in memory. That
is necessary but not sufficient to make it fast; it should at least
eliminate most of the I/O. Making recommendations one at a time will never
scale to even moderate load this way: each request means scheduling an
entire job with multiple tasks. That's fine for the occasional query or
smallish data, but not for a thousand queries per second. For that I think
you'd have to build some custom scoring infrastructure. At least, that's
what I did, so I would say that.
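A minimal sketch of that caching step, assuming `sc` is an active SparkContext and the model path is illustrative; this pins the factor RDDs in memory so repeated calls stop re-reading from disk, though each call still schedules a Spark job:

```scala
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.storage.StorageLevel

// Load the saved model (path is illustrative).
val model = MatrixFactorizationModel.load(sc, "hdfs:///models/als-model")

// Pin both factor RDDs in memory.
model.userFeatures.persist(StorageLevel.MEMORY_ONLY)
model.productFeatures.persist(StorageLevel.MEMORY_ONLY)

// Force materialization now, so the first real query
// doesn't pay the disk-read cost.
model.userFeatures.count()
model.productFeatures.count()
```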

On Wed, Nov 2, 2016 at 4:54 PM Mikael Ståldal <> wrote:

> import org.apache.spark.mllib.recommendation.ALS
> import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
> I build a MatrixFactorizationModel with ALS.trainImplicit(), then I save
> it with its save method.
> Later, in another process on another machine, I load the model with
> MatrixFactorizationModel.load(). Now I want to make a lot of
> recommendProducts() calls on the loaded model, and I want them to be quick,
> without any I/O. However, they are slow and seem to do I/O each time.
> Is there any way to force loading the whole model into memory (that step
> can take some time and do I/O) and then be able to do recommendProducts()
> on it multiple times, quickly without I/O?
> --
> *Mikael Ståldal*
> Senior software developer
> *Magine TV*
> Grev Turegatan 3  | 114 46 Stockholm, Sweden  |
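One hedged sketch of the "custom scoring infrastructure" idea from the reply above: collect the factor matrices to the driver once, then answer each request with plain dot products, so no Spark job is scheduled per query. Names like `recommend` are illustrative, not part of the Spark API, and this assumes the factors fit in local memory (roughly rank × (numUsers + numProducts) doubles):

```scala
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel

val model = MatrixFactorizationModel.load(sc, "hdfs:///models/als-model")

// One-time collect of both factor matrices to the driver.
val userFactors: Map[Int, Array[Double]] =
  model.userFeatures.collect().toMap
val productFactors: Array[(Int, Array[Double])] =
  model.productFeatures.collect()

// Score locally: predicted rating is the dot product of the
// user's and product's latent-factor vectors.
def recommend(user: Int, k: Int): Seq[(Int, Double)] = {
  val uf = userFactors(user)
  productFactors
    .map { case (id, pf) =>
      (id, uf.zip(pf).map { case (a, b) => a * b }.sum)
    }
    .sortBy(-_._2)
    .take(k)
    .toSeq
}
```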
