spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: RMSE in MovieLensALS increases or stays stable as iterations increase.
Date Wed, 26 Nov 2014 17:53:00 GMT
I also modified the example to try 1, 5, 9, ... iterations as you did,
and also ran with the same default parameters. I used the
sample_movielens_data.txt file. Is that what you're using?

My result is:

Iteration 1 Test RMSE = 1.426079653593016 Train RMSE = 1.5013155094216357
Iteration 5 Test RMSE = 1.405598012724468 Train RMSE = 1.4847078708333596
Iteration 9 Test RMSE = 1.4055990901261632 Train RMSE = 1.484713206769993
Iteration 13 Test RMSE = 1.4055990999738366 Train RMSE = 1.4847132332994588
Iteration 17 Test RMSE = 1.40559910003368 Train RMSE = 1.48471323345531
Iteration 21 Test RMSE = 1.4055991000342158 Train RMSE = 1.4847132334567061
Iteration 25 Test RMSE = 1.4055991000342174 Train RMSE = 1.4847132334567108

Train error is consistently higher than test error, which could indicate
underfitting. With a higher rank (rank=50), train error falls below test
error, which looks more reasonable:

Iteration 1 Test RMSE = 1.5981883186995312 Train RMSE = 1.4841671360432005
Iteration 5 Test RMSE = 1.5745145659678204 Train RMSE = 1.4672341345080382
Iteration 9 Test RMSE = 1.5745147110505406 Train RMSE = 1.4672385714907996
Iteration 13 Test RMSE = 1.5745147108258577 Train RMSE = 1.4672385929631868
Iteration 17 Test RMSE = 1.5745147108246424 Train RMSE = 1.4672385930428344
Iteration 21 Test RMSE = 1.5745147108246367 Train RMSE = 1.4672385930431973
Iteration 25 Test RMSE = 1.5745147108246367 Train RMSE = 1.467238593043199

I'm not sure what the difference is. I looked at your modifications
and they seem very similar. Is it the data you're using?
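
For anyone comparing numbers, here is a rough, Spark-free sketch of the kind
of sweep described above: retrain for 1, 5, 9, ... iterations and report test
and train RMSE each time. The names IterationSweep, Predictor, rmse and sweep
are made up for illustration, and the Rating case class is only a stand-in
for the MLlib one; this is not the code from the MovieLensALS example.

```scala
object IterationSweep {
  // Hypothetical stand-ins so the sketch runs without Spark.
  final case class Rating(user: Int, product: Int, rating: Double)
  type Predictor = (Int, Int) => Double

  // Root-mean-square error of the predictor over the observed ratings.
  def rmse(predict: Predictor, data: Seq[Rating]): Double = {
    require(data.nonEmpty, "RMSE is undefined for an empty data set")
    val squaredErrors = data.map { r =>
      val err = predict(r.user, r.product) - r.rating
      err * err
    }
    math.sqrt(squaredErrors.sum / data.size)
  }

  // Retrain from scratch for each iteration count and return
  // (iterations, test RMSE, train RMSE) for every count.
  def sweep(
      train: Int => Predictor,
      training: Seq[Rating],
      test: Seq[Rating],
      iterationCounts: Seq[Int] = 1 to 25 by 4
  ): Seq[(Int, Double, Double)] =
    iterationCounts.map { n =>
      val model = train(n)
      (n, rmse(model, test), rmse(model, training))
    }
}
```

With Spark, I would expect `train` to wrap a call along the lines of
ALS.train(trainingRdd, rank, n, lambda) and the predictor to come from the
resulting model; computing both RMSE values with the same function on the
same split makes it easier to see whether two setups really differ.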


On Wed, Nov 26, 2014 at 3:34 PM, Kostas Kloudas <kkloudas@gmail.com> wrote:
> For the training I am using the code in the MovieLensALS example with
> trainImplicit set to false
> and for the training RMSE I use the
>
> val rmseTr = computeRmse(model, training, params.implicitPrefs).
>
> The computeRmse() method is provided in the MovieLensALS class.
>
>
> Thanks a lot,
> Kostas
>
>
>> On Nov 26, 2014, at 2:41 PM, Sean Owen <sowen@cloudera.com> wrote:
>>
>> How are you computing RMSE?
>> And how are you training the model -- not with trainImplicit, right?
>> I wonder if you are somehow optimizing something besides RMSE.
>>
>> On Wed, Nov 26, 2014 at 2:36 PM, Kostas Kloudas <kkloudas@gmail.com> wrote:
>>> Once again, the error increases, even on the training dataset. The
>>> results are:
>>>
>>> Running 1 iterations
>>> For 1 iter.: Test RMSE = 1.2447121194304893 Training RMSE = 1.2394166987104076 (34.751317636 s).
>>> Running 5 iterations
>>> For 5 iter.: Test RMSE = 1.3253957117600659 Training RMSE = 1.3206317416138509 (37.693118023000004 s).
>>> Running 9 iterations
>>> For 9 iter.: Test RMSE = 1.3255293380139364 Training RMSE = 1.3207661218210436 (41.046175661 s).
>>> Running 13 iterations
>>> For 13 iter.: Test RMSE = 1.3255295352665748 Training RMSE = 1.3207663201865092 (47.763619515 s).
>>> Running 17 iterations
>>> For 17 iter.: Test RMSE = 1.32552953555787 Training RMSE = 1.3207663204794406 (59.682361103000005 s).
>>> Running 21 iterations
>>> For 21 iter.: Test RMSE = 1.3255295355583026 Training RMSE = 1.3207663204798756 (57.210578232 s).
>>> Running 25 iterations
>>> For 25 iter.: Test RMSE = 1.325529535558303 Training RMSE = 1.3207663204798765 (65.785485882 s).
>>>
>>> Thanks a lot,
>>> Kostas
>>>
>>> On Nov 26, 2014, at 12:04 PM, Nick Pentreath <nick.pentreath@gmail.com>
>>> wrote:
>>>
>>> copying user group - I keep replying directly vs reply all :)
>>>
>>> On Wed, Nov 26, 2014 at 2:03 PM, Nick Pentreath <nick.pentreath@gmail.com>
>>> wrote:
>>>>
>>>> ALS is guaranteed not to increase its training objective (squared error
>>>> plus the regularization term) in each iteration, so RMSE on the training
>>>> set would normally be expected to decrease.
>>>>
>>>> This does not hold for the test set / cross validation. You would expect
>>>> the test set RMSE to stabilise as iterations increase, since the algorithm
>>>> converges - but not necessarily to decrease.
>>>>
>>>> On Wed, Nov 26, 2014 at 1:57 PM, Kostas Kloudas <kkloudas@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I am getting familiar with MLlib, and one thing I noticed is that
>>>>> running the MovieLensALS example on the MovieLens dataset for an
>>>>> increasing number of iterations does not decrease the RMSE.
>>>>>
>>>>> The results for a 0.6 training / 0.4 test split are below. With a
>>>>> training fraction of 0.8, the results are almost identical. Shouldn’t
>>>>> the error normally decrease, especially going from 1 to 5 iterations?
>>>>>
>>>>> Running 1 iterations
>>>>> Test RMSE for 1 iter. = 1.2452964343277886 (52.757125927000004 s).
>>>>> Running 5 iterations
>>>>> Test RMSE for 5 iter. = 1.3258973764470259 (61.183927666 s).
>>>>> Running 9 iterations
>>>>> Test RMSE for 9 iter. = 1.3260308117704385 (61.84948875800001 s).
>>>>> Running 13 iterations
>>>>> Test RMSE for 13 iter. = 1.3260310099809915 (73.799510125 s).
>>>>> Running 17 iterations
>>>>> Test RMSE for 17 iter. = 1.3260310102735398 (77.56512185300001 s).
>>>>> Running 21 iterations
>>>>> Test RMSE for 21 iter. = 1.3260310102739703 (79.607495074 s).
>>>>> Running 25 iterations
>>>>> Test RMSE for 25 iter. = 1.326031010273971 (88.631776301 s).
>>>>> Running 29 iterations
>>>>> Test RMSE for 29 iter. = 1.3260310102739712 (101.178383079 s).
>>>>>
>>>>> Thanks a lot,
>>>>> Kostas
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>
>>>>
>>>
>>>
>
