spark-user mailing list archives

From Kostas Kloudas <kklou...@gmail.com>
Subject Re: RMSE in MovieLensALS increases or stays stable as iterations increase.
Date Thu, 27 Nov 2014 11:04:35 GMT
Thanks a lot for your time guys and your quick replies!

> On Nov 26, 2014, at 7:53 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
> 
> The training RMSE may increase due to regularization. Squared loss
> only represents part of the global loss. If you watch the sum of the
> squared loss and the regularization, it should be non-increasing.
> -Xiangrui
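
Xiangrui's point can be checked on a toy problem. The sketch below is plain Scala with no Spark dependency, and it is not the MLlib implementation. It assumes a rank-1 factorization, a plain L2 penalty with a hypothetical lambda of 1.0, and six made-up ratings; MLlib may scale its regularization differently. Each half-step exactly minimizes the regularized objective over one side, so the objective cannot increase, while the squared loss on its own carries no such guarantee.

```scala
object RegularizedAlsSketch {
  // Hypothetical toy data: (user, item, rating). Not taken from the MovieLens files.
  val ratings: Seq[(Int, Int, Double)] = Seq(
    (0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
    (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0))

  // Hypothetical regularization strength.
  val lambda = 1.0

  // Rank 1: every user and every item has a single scalar factor.
  type Factors = Map[Int, Double]

  // Sum of squared errors over the training ratings (what RMSE is derived from).
  def squaredLoss(x: Factors, y: Factors): Double =
    ratings.map { case (u, i, r) => math.pow(r - x(u) * y(i), 2) }.sum

  // Plain L2 penalty on all factors.
  def penalty(x: Factors, y: Factors): Double =
    lambda * (x.values.map(v => v * v).sum + y.values.map(v => v * v).sum)

  // The quantity that each alternating step actually minimizes.
  def objective(x: Factors, y: Factors): Double =
    squaredLoss(x, y) + penalty(x, y)

  // Closed-form ridge solution for one scalar factor,
  // given (fixed factor on the other side, rating) pairs.
  def solve(pairs: Seq[(Double, Double)]): Double =
    pairs.map { case (f, r) => f * r }.sum /
      (pairs.map { case (f, _) => f * f }.sum + lambda)

  // Re-solve every user factor with the item factors held fixed.
  def updateUsers(y: Factors): Factors =
    ratings.groupBy(_._1).map { case (u, rs) =>
      u -> solve(rs.map { case (_, i, r) => (y(i), r) })
    }

  // Re-solve every item factor with the user factors held fixed.
  def updateItems(x: Factors): Factors =
    ratings.groupBy(_._2).map { case (i, rs) =>
      i -> solve(rs.map { case (u, _, r) => (x(u), r) })
    }

  // Returns (squared loss, regularized objective) after each full iteration.
  def run(iterations: Int): Seq[(Double, Double)] = {
    var x: Factors = ratings.map(_._1).distinct.map(u => u -> 1.0).toMap
    var y: Factors = ratings.map(_._2).distinct.map(i => i -> 1.0).toMap
    (1 to iterations).map { _ =>
      x = updateUsers(y)
      y = updateItems(x)
      (squaredLoss(x, y), objective(x, y))
    }
  }

  def main(args: Array[String]): Unit =
    run(10).zipWithIndex.foreach { case ((loss, obj), k) =>
      println(f"iter ${k + 1}%2d  squared loss = $loss%.6f  objective = $obj%.6f")
    }
}
```

Comparing the two printed columns across iterations shows which quantity the algorithm is really driving down: the objective column never rises, whatever the squared-loss column does.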
> 
> On Wed, Nov 26, 2014 at 9:53 AM, Sean Owen <sowen@cloudera.com> wrote:
>> I also modified the example to try 1, 5, 9, ... iterations as you did,
>> and also ran with the same default parameters. I used the
>> sample_movielens_data.txt file. Is that what you're using?
>> 
>> My result is:
>> 
>> Iteration 1 Test RMSE = 1.426079653593016 Train RMSE = 1.5013155094216357
>> Iteration 5 Test RMSE = 1.405598012724468 Train RMSE = 1.4847078708333596
>> Iteration 9 Test RMSE = 1.4055990901261632 Train RMSE = 1.484713206769993
>> Iteration 13 Test RMSE = 1.4055990999738366 Train RMSE = 1.4847132332994588
>> Iteration 17 Test RMSE = 1.40559910003368 Train RMSE = 1.48471323345531
>> Iteration 21 Test RMSE = 1.4055991000342158 Train RMSE = 1.4847132334567061
>> Iteration 25 Test RMSE = 1.4055991000342174 Train RMSE = 1.4847132334567108
>> 
>> Train error is higher than test error, consistently, which could be
>> underfitting. A higher rank=50 gets a reasonable result:
>> 
>> Iteration 1 Test RMSE = 1.5981883186995312 Train RMSE = 1.4841671360432005
>> Iteration 5 Test RMSE = 1.5745145659678204 Train RMSE = 1.4672341345080382
>> Iteration 9 Test RMSE = 1.5745147110505406 Train RMSE = 1.4672385714907996
>> Iteration 13 Test RMSE = 1.5745147108258577 Train RMSE = 1.4672385929631868
>> Iteration 17 Test RMSE = 1.5745147108246424 Train RMSE = 1.4672385930428344
>> Iteration 21 Test RMSE = 1.5745147108246367 Train RMSE = 1.4672385930431973
>> Iteration 25 Test RMSE = 1.5745147108246367 Train RMSE = 1.467238593043199
>> 
>> I'm not sure what the difference is. I looked at your modifications
>> and they seem very similar. Is it the data you're using?
>> 
>> 
>> On Wed, Nov 26, 2014 at 3:34 PM, Kostas Kloudas <kkloudas@gmail.com> wrote:
>>> For the training I am using the code in the MovieLensALS example with
>>> trainImplicit set to false, and for the training RMSE I use:
>>> 
>>> val rmseTr = computeRmse(model, training, params.implicitPrefs)
>>> 
>>> The computeRmse() method is provided in the MovieLensALS class.
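
The body of computeRmse() is not quoted in this thread. As a reminder of what such a method has to compute, here is a minimal plain-Scala sketch of RMSE over (predicted, actual) pairs. The object and method names are hypothetical, and the sketch leaves out the Spark-specific work of joining model predictions with the ratings.

```scala
object RmseSketch {
  // Root mean squared error over (predicted, actual) rating pairs.
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "RMSE is undefined for an empty set of ratings")
    val sumOfSquares = pairs.map { case (predicted, actual) =>
      val diff = predicted - actual
      diff * diff
    }.sum
    math.sqrt(sumOfSquares / pairs.size)
  }
}
```

Note that nothing in this calculation involves the regularization term, so the reported RMSE tracks only part of what ALS minimizes.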
>>> 
>>> 
>>> Thanks a lot,
>>> Kostas
>>> 
>>> 
>>>> On Nov 26, 2014, at 2:41 PM, Sean Owen <sowen@cloudera.com> wrote:
>>>> 
>>>> How are you computing RMSE? And how are you training the model -- not
>>>> with trainImplicit, right?
>>>> I wonder if you are somehow optimizing something besides RMSE.
>>>> 
>>>> On Wed, Nov 26, 2014 at 2:36 PM, Kostas Kloudas <kkloudas@gmail.com> wrote:
>>>>> Once again, the error even with the training dataset increases. The results
>>>>> are:
>>>>> 
>>>>> Running 1 iterations
>>>>> For 1 iter.: Test RMSE  = 1.2447121194304893  Training RMSE =
>>>>> 1.2394166987104076 (34.751317636 s).
>>>>> Running 5 iterations
>>>>> For 5 iter.: Test RMSE  = 1.3253957117600659  Training RMSE =
>>>>> 1.3206317416138509 (37.693118023000004 s).
>>>>> Running 9 iterations
>>>>> For 9 iter.: Test RMSE  = 1.3255293380139364  Training RMSE =
>>>>> 1.3207661218210436 (41.046175661 s).
>>>>> Running 13 iterations
>>>>> For 13 iter.: Test RMSE  = 1.3255295352665748  Training RMSE =
>>>>> 1.3207663201865092 (47.763619515 s).
>>>>> Running 17 iterations
>>>>> For 17 iter.: Test RMSE  = 1.32552953555787  Training RMSE =
>>>>> 1.3207663204794406 (59.682361103000005 s).
>>>>> Running 21 iterations
>>>>> For 21 iter.: Test RMSE  = 1.3255295355583026  Training RMSE =
>>>>> 1.3207663204798756 (57.210578232 s).
>>>>> Running 25 iterations
>>>>> For 25 iter.: Test RMSE  = 1.325529535558303  Training RMSE =
>>>>> 1.3207663204798765 (65.785485882 s).
>>>>> 
>>>>> Thanks a lot,
>>>>> Kostas
>>>>> 
>>>>> On Nov 26, 2014, at 12:04 PM, Nick Pentreath <nick.pentreath@gmail.com>
>>>>> wrote:
>>>>> 
>>>>> copying user group - I keep replying directly vs reply all :)
>>>>> 
>>>>> On Wed, Nov 26, 2014 at 2:03 PM, Nick Pentreath <nick.pentreath@gmail.com>
>>>>> wrote:
>>>>>> 
>>>>>> ALS will be guaranteed to decrease the squared error (therefore RMSE)
>>>>>> in each iteration, on the training set.
>>>>>> 
>>>>>> This does not hold for the test set / cross validation. You would
>>>>>> expect the test set RMSE to stabilise as iterations increase, since
>>>>>> the algorithm converges - but not necessarily to decrease.
>>>>>> 
>>>>>> On Wed, Nov 26, 2014 at 1:57 PM, Kostas Kloudas <kkloudas@gmail.com>
>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I am getting familiar with MLlib, and one thing I noticed is that
>>>>>>> running the MovieLensALS example on the MovieLens dataset for an
>>>>>>> increasing number of iterations does not decrease the RMSE.
>>>>>>> 
>>>>>>> The results for 0.6% training set and 0.4% test are below. For
>>>>>>> training set to 0.8%, the results are almost identical. Shouldn’t
>>>>>>> it be normal to see a decreasing error? Especially going from 1 to
>>>>>>> 5 iterations.
>>>>>>> 
>>>>>>> Running 1 iterations
>>>>>>> Test RMSE for 1 iter. = 1.2452964343277886 (52.757125927000004 s).
>>>>>>> Running 5 iterations
>>>>>>> Test RMSE for 5 iter. = 1.3258973764470259 (61.183927666 s).
>>>>>>> Running 9 iterations
>>>>>>> Test RMSE for 9 iter. = 1.3260308117704385 (61.84948875800001 s).
>>>>>>> Running 13 iterations
>>>>>>> Test RMSE for 13 iter. = 1.3260310099809915 (73.799510125 s).
>>>>>>> Running 17 iterations
>>>>>>> Test RMSE for 17 iter. = 1.3260310102735398 (77.56512185300001 s).
>>>>>>> Running 21 iterations
>>>>>>> Test RMSE for 21 iter. = 1.3260310102739703 (79.607495074 s).
>>>>>>> Running 25 iterations
>>>>>>> Test RMSE for 25 iter. = 1.326031010273971 (88.631776301 s).
>>>>>>> Running 29 iterations
>>>>>>> Test RMSE for 29 iter. = 1.3260310102739712 (101.178383079 s).
>>>>>>> 
>>>>>>> Thanks a lot,
>>>>>>> Kostas
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>>>>>> For additional commands, e-mail: user-help@spark.apache.org
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>> 
>> 

