spark-user mailing list archives

From Xiangrui Meng <men...@gmail.com>
Subject Re: RMSE in MovieLensALS increases or stays stable as iterations increase.
Date Wed, 26 Nov 2014 19:53:48 GMT
The training RMSE may increase due to regularization. Squared loss
only represents part of the global loss. If you watch the sum of the
squared loss and the regularization, it should be non-increasing.
-Xiangrui
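
The point above can be checked on a toy problem. The following is a minimal sketch, not MLlib's implementation: it assumes rank-1 factors, a plain L2 penalty, and the loss

  sum over observed (i,j) of (r_ij - u_i * v_j)^2 + lambda * (sum_i u_i^2 + sum_j v_j^2)

MLlib may scale the regularization term differently. The names Rank1AlsDemo, Obs, solveSide and run are hypothetical and exist only for this illustration. Each half-step solves its side exactly with the other side fixed, so the regularized objective cannot go up, while the training RMSE alone carries no such guarantee.

```scala
object Rank1AlsDemo {
  // One observed rating: (user index, item index, rating value).
  final case class Obs(user: Int, item: Int, rating: Double)

  // Sum of squared errors over the observed ratings only.
  def squaredLoss(obs: Seq[Obs], u: Array[Double], v: Array[Double]): Double =
    obs.map(o => math.pow(o.rating - u(o.user) * v(o.item), 2)).sum

  // Squared loss plus a plain L2 penalty on both factor vectors (an assumption).
  def objective(obs: Seq[Obs], u: Array[Double], v: Array[Double],
                lambda: Double): Double =
    squaredLoss(obs, u, v) +
      lambda * (u.map(x => x * x).sum + v.map(x => x * x).sum)

  // Training RMSE: square root of the mean squared error.
  def rmse(obs: Seq[Obs], u: Array[Double], v: Array[Double]): Double =
    math.sqrt(squaredLoss(obs, u, v) / obs.size)

  // Exact ridge update for one side while the other side is held fixed:
  // x_k = sum(r * f) / (lambda + sum(f * f)) over the ratings touching k.
  private def solveSide(n: Int, obs: Seq[Obs], self: Obs => Int,
                        other: Obs => Int, fixed: Array[Double],
                        lambda: Double): Array[Double] = {
    val num = new Array[Double](n)
    val den = Array.fill(n)(lambda)
    obs.foreach { o =>
      val f = fixed(other(o))
      num(self(o)) += o.rating * f
      den(self(o)) += f * f
    }
    Array.tabulate(n)(i => num(i) / den(i))
  }

  // Returns (training RMSE, regularized objective) after each full iteration.
  // Requires lambda > 0 so that every denominator is positive.
  def run(obs: Seq[Obs], numUsers: Int, numItems: Int, lambda: Double,
          iterations: Int): Seq[(Double, Double)] = {
    var u = Array.fill(numUsers)(1.0)
    var v = Array.fill(numItems)(1.0)
    (1 to iterations).map { _ =>
      u = solveSide(numUsers, obs, _.user, _.item, v, lambda)
      v = solveSide(numItems, obs, _.item, _.user, u, lambda)
      (rmse(obs, u, v), objective(obs, u, v, lambda))
    }
  }
}
```

Calling Rank1AlsDemo.run on a handful of ratings and printing both columns side by side shows which of the two quantities is actually being minimized. Only the second column is guaranteed to be non-increasing.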

On Wed, Nov 26, 2014 at 9:53 AM, Sean Owen <sowen@cloudera.com> wrote:
> I also modified the example to try 1, 5, 9, ... iterations as you did,
> and also ran with the same default parameters. I used the
> sample_movielens_data.txt file. Is that what you're using?
>
> My result is:
>
> Iteration 1 Test RMSE = 1.426079653593016 Train RMSE = 1.5013155094216357
> Iteration 5 Test RMSE = 1.405598012724468 Train RMSE = 1.4847078708333596
> Iteration 9 Test RMSE = 1.4055990901261632 Train RMSE = 1.484713206769993
> Iteration 13 Test RMSE = 1.4055990999738366 Train RMSE = 1.4847132332994588
> Iteration 17 Test RMSE = 1.40559910003368 Train RMSE = 1.48471323345531
> Iteration 21 Test RMSE = 1.4055991000342158 Train RMSE = 1.4847132334567061
> Iteration 25 Test RMSE = 1.4055991000342174 Train RMSE = 1.4847132334567108
>
> Train error is consistently higher than test error, which could indicate
> underfitting. A higher rank (rank=50) gives a more reasonable result:
>
> Iteration 1 Test RMSE = 1.5981883186995312 Train RMSE = 1.4841671360432005
> Iteration 5 Test RMSE = 1.5745145659678204 Train RMSE = 1.4672341345080382
> Iteration 9 Test RMSE = 1.5745147110505406 Train RMSE = 1.4672385714907996
> Iteration 13 Test RMSE = 1.5745147108258577 Train RMSE = 1.4672385929631868
> Iteration 17 Test RMSE = 1.5745147108246424 Train RMSE = 1.4672385930428344
> Iteration 21 Test RMSE = 1.5745147108246367 Train RMSE = 1.4672385930431973
> Iteration 25 Test RMSE = 1.5745147108246367 Train RMSE = 1.467238593043199
>
> I'm not sure what the difference is. I looked at your modifications
> and they seem very similar. Is it the data you're using?
>
>
> On Wed, Nov 26, 2014 at 3:34 PM, Kostas Kloudas <kkloudas@gmail.com> wrote:
>> For the training I am using the code in the MovieLensALS example with
>> trainImplicit set to false
>> and for the training RMSE I use the
>>
>> val rmseTr = computeRmse(model, training, params.implicitPrefs).
>>
>> The computeRmse() method is provided in the MovieLensALS class.
>>
>>
>> Thanks a lot,
>> Kostas
>>
>>
>>> On Nov 26, 2014, at 2:41 PM, Sean Owen <sowen@cloudera.com> wrote:
>>>
>>> How are you computing RMSE?
>>> And how are you training the model? Not with trainImplicit, right?
>>> I wonder if you are somehow optimizing something besides RMSE.
>>>
>>> On Wed, Nov 26, 2014 at 2:36 PM, Kostas Kloudas <kkloudas@gmail.com> wrote:
>>>> Once again, the error even with the training dataset increases. The results
>>>> are:
>>>>
>>>> Running 1 iterations
>>>> For 1 iter.: Test RMSE  = 1.2447121194304893  Training RMSE =
>>>> 1.2394166987104076 (34.751317636 s).
>>>> Running 5 iterations
>>>> For 5 iter.: Test RMSE  = 1.3253957117600659  Training RMSE =
>>>> 1.3206317416138509 (37.693118023000004 s).
>>>> Running 9 iterations
>>>> For 9 iter.: Test RMSE  = 1.3255293380139364  Training RMSE =
>>>> 1.3207661218210436 (41.046175661 s).
>>>> Running 13 iterations
>>>> For 13 iter.: Test RMSE  = 1.3255295352665748  Training RMSE =
>>>> 1.3207663201865092 (47.763619515 s).
>>>> Running 17 iterations
>>>> For 17 iter.: Test RMSE  = 1.32552953555787  Training RMSE =
>>>> 1.3207663204794406 (59.682361103000005 s).
>>>> Running 21 iterations
>>>> For 21 iter.: Test RMSE  = 1.3255295355583026  Training RMSE =
>>>> 1.3207663204798756 (57.210578232 s).
>>>> Running 25 iterations
>>>> For 25 iter.: Test RMSE  = 1.325529535558303  Training RMSE =
>>>> 1.3207663204798765 (65.785485882 s).
>>>>
>>>> Thanks a lot,
>>>> Kostas
>>>>
>>>> On Nov 26, 2014, at 12:04 PM, Nick Pentreath <nick.pentreath@gmail.com>
>>>> wrote:
>>>>
>>>> copying user group - I keep replying directly vs reply all :)
>>>>
>>>> On Wed, Nov 26, 2014 at 2:03 PM, Nick Pentreath <nick.pentreath@gmail.com>
>>>> wrote:
>>>>>
>>>>> ALS will be guaranteed to decrease the squared error (therefore RMSE)
>>>>> in each iteration, on the training set.
>>>>>
>>>>> This does not hold for the test set / cross validation. You would expect
>>>>> the test set RMSE to stabilise as iterations increase, since the algorithm
>>>>> converges - but not necessarily to decrease.
>>>>>
>>>>> On Wed, Nov 26, 2014 at 1:57 PM, Kostas Kloudas <kkloudas@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I am getting familiar with MLlib, and one thing I noticed is that
>>>>>> running the MovieLensALS example on the MovieLens dataset for an
>>>>>> increasing number of iterations does not decrease the RMSE.
>>>>>>
>>>>>> The results for a 60% training set and a 40% test set are below. With
>>>>>> the training set at 80%, the results
>>>>>> are almost identical. Shouldn't it be normal to see a decreasing error?
>>>>>> Especially going from 1 to 5 iterations.
>>>>>>
>>>>>> Running 1 iterations
>>>>>> Test RMSE for 1 iter. = 1.2452964343277886 (52.757125927000004 s).
>>>>>> Running 5 iterations
>>>>>> Test RMSE for 5 iter. = 1.3258973764470259 (61.183927666 s).
>>>>>> Running 9 iterations
>>>>>> Test RMSE for 9 iter. = 1.3260308117704385 (61.84948875800001 s).
>>>>>> Running 13 iterations
>>>>>> Test RMSE for 13 iter. = 1.3260310099809915 (73.799510125 s).
>>>>>> Running 17 iterations
>>>>>> Test RMSE for 17 iter. = 1.3260310102735398 (77.56512185300001 s).
>>>>>> Running 21 iterations
>>>>>> Test RMSE for 21 iter. = 1.3260310102739703 (79.607495074 s).
>>>>>> Running 25 iterations
>>>>>> Test RMSE for 25 iter. = 1.326031010273971 (88.631776301 s).
>>>>>> Running 29 iterations
>>>>>> Test RMSE for 29 iter. = 1.3260310102739712 (101.178383079 s).
>>>>>>
>>>>>> Thanks  a lot,
>>>>>> Kostas
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>>
>>>>>
>>>>
>>>>
>>
>