spark-user mailing list archives

From Deepak Gopalakrishnan <dgk...@gmail.com>
Subject Re: Running ALS on comparatively large RDD
Date Fri, 11 Mar 2016 07:21:04 GMT
1. About 1 million users against a few thousand products, with around a
million ratings in total
2. Spark 1.6 on Amazon EMR
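
For context, the ID-encoding step described in the quoted message below can be
sketched roughly as follows. This is only an illustration in plain Python, not
the actual job: the names build_index and encode_ratings and the sample data
are hypothetical, and the real pipeline does this on RDDs rather than on local
lists.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

RawRating = Tuple[Hashable, Hashable, float]   # (user key, item key, rating)
EncodedRating = Tuple[int, int, float]         # (user id, item id, rating)


def build_index(keys: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Assign consecutive integers 0..n-1 to distinct keys, in first-seen order.

    Hypothetical helper: a local stand-in for an auto-increment encoding
    in the spirit of zipWithIndex.
    """
    index: Dict[Hashable, int] = {}
    for key in keys:
        if key not in index:
            index[key] = len(index)
    return index


def encode_ratings(
    raw: List[RawRating],
) -> Tuple[List[EncodedRating], Dict[Hashable, int], Dict[Hashable, int]]:
    """Replace user and item keys with integer ids; keep the rating value."""
    user_index = build_index(u for u, _, _ in raw)
    item_index = build_index(i for _, i, _ in raw)
    encoded = [(user_index[u], item_index[i], float(r)) for u, i, r in raw]
    return encoded, user_index, item_index


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        ("alice", "book-17", 4.0),
        ("bob", "book-17", 2.5),
        ("alice", "film-3", 5.0),
    ]
    encoded, users, items = encode_ratings(sample)
    print(encoded)  # [(0, 0, 4.0), (1, 0, 2.5), (0, 1, 5.0)]
```

On the cluster, triples like these would be wrapped as
pyspark.mllib.recommendation.Rating objects and passed to ALS.train together
with a rank and an iteration count. The sketch only shows the shape of the
input; it says nothing about why the job does not finish.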

On Fri, Mar 11, 2016 at 12:46 PM, Nick Pentreath <nick.pentreath@gmail.com>
wrote:

> Could you provide more details about:
> 1. Data set size (# ratings, # users and # products)
> 2. Spark cluster set up and version
>
> Thanks
>
> On Fri, 11 Mar 2016 at 05:53 Deepak Gopalakrishnan <dgkris@gmail.com>
> wrote:
>
>> Hello All,
>>
>> I've been running Spark's ALS on a dataset of users and rated items. I
>> first encode my users as integers using an auto-increment function (much
>> like zipWithIndex) and do the same for my items. I then create an RDD of
>> the ratings and feed it to ALS.
>>
>> My issue is that the ALS algorithm never completes. Attached is a
>> screenshot of the stages window.
>>
>> Any help will be greatly appreciated.
>>
>> --
>> Regards,
>> *Deepak Gopalakrishnan*
>> *Mobile*:+918891509774
>> *Skype* : deepakgk87
>> http://myexps.blogspot.com
>
>


-- 
Regards,
*Deepak Gopalakrishnan*
*Mobile*:+918891509774
*Skype* : deepakgk87
http://myexps.blogspot.com
