spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nick Pentreath <nick.pentre...@gmail.com>
Subject Re: ALS.trainImplicit block sizes
Date Fri, 21 Oct 2016 07:15:34 GMT
How many nodes are you using in the cluster?



On Fri, 21 Oct 2016 at 08:58 Nikhil Mishra <nikhilmishra8080@gmail.com>
wrote:

> Thanks Nick.
>
> So we do partition U x I matrix into BxB matrices, each of size around U/B
> and I/B. Is that correct? Do you know whether a single block of the matrix
> is represented in memory as a full matrix or as sparse matrix? I ask this
> because my job has been failing for block sizes which should have worked.
>
> I have U = 85 million users, I = 250,000 items and when I specify block
> size 5,000, I get out of memory error, even though I am setting
> --executor-memory as 7g (on a linux EC2 which has 7.5g memory). Assuming
> each block has 17000 users and 50 items, eve if the block is internally
> represented as a full matrix, it should still occupy around 50MB space.
>
> Increasing block size to 20,000 also results in the same. So there is
> something I don't understand about how this is working.
>
> BTW, I am trying to find 50 latent factors (rank = 50).
>
> Do you have some insights as to how I should tweak things to get this
> working?
>
> Thanks,
> Nik
>
> On Thu, Oct 20, 2016 at 11:43 PM, Nick Pentreath <nick.pentreath@gmail.com
> > wrote:
>
> The blocks params will set both user and item blocks.
>
> Spark 2.0 supports user and item blocks for PySpark:
> http://spark.apache.org/docs/latest/api/python/pyspark.ml.html#module-pyspark.ml.recommendation
>
>
> On Fri, 21 Oct 2016 at 08:12 Nikhil Mishra <nikhilmishra8080@gmail.com>
> wrote:
>
> Hi,
>
> I have a question about the block size to be specified in
> ALS.trainImplicit() in pyspark (Spark 1.6.1). There is only one block size
> parameter to be specified. I want to know if that would result in
> partitioning both the users as well as the items axes.
>
> For example, I am using the following call to ALs.trainImplicit() in my
> code.
>
> ---------------
>
> RANK = 50
>
> ITERATIONS = 2
>
> BLOCKS = 1000
>
> ALPHA = 1.0
>
> model = ALS.trainImplicit(ratings, RANK, ITERATIONS, blocks=BLOCKS,
> alpha=ALPHA)
>
>
> ----------------
>
> Will this partition the users x items matrix into BLOCKS x BLOCKS number
> of matrices or will it partition only the users axis thereby resulting in
> BLOCKS number of matrices, each with columns = total number of unique items?
>
> Thanks,
> Nik
>
>
>

Mime
View raw message