spark-user mailing list archives

From lisendong <lisend...@163.com>
Subject Re: different result from implicit ALS with explicit ALS
Date Mon, 30 Mar 2015 14:27:20 GMT
hi, xiangrui:
I found that the ALS in Spark 1.3.0 forgets to call checkpoint() in the explicit path:
the code is :
https://github.com/apache/spark/blob/db34690466d67f9c8ac6a145fddb5f7ea30a8d8d/mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala


the checkpoint is very important in my situation, because my task produces 1TB of shuffle
data in each iteration. If the shuffle data is not deleted between iterations (via checkpoint()),
the task ends up producing 30TB of data…


So I changed the ALS code and re-compiled it myself, but the checkpoint does not seem to take
effect, and the task still occupies 30TB of disk… (I only added two lines to ALS.scala):
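For illustration only, a minimal sketch of this kind of checkpointing (the directory path, interval check, and variable names here are assumptions, not the actual two-line patch). One common reason a hand-added checkpoint() appears to do nothing is that RDD.checkpoint() only takes effect if a checkpoint directory has been set on the SparkContext, and the RDD must then be materialized by an action:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Pure helper: checkpoint every `interval` iterations (assumed policy).
def shouldCheckpoint(iter: Int, interval: Int): Boolean =
  interval > 0 && iter % interval == 0

val sc = new SparkContext(
  new SparkConf().setAppName("als-checkpoint-demo").setMaster("local[2]"))
sc.setCheckpointDir("/tmp/als-checkpoints") // without this, checkpoint() never happens

val factors = sc.parallelize(1 to 100) // stand-in for a factor RDD
if (shouldCheckpoint(iter = 10, interval = 10)) {
  factors.checkpoint() // only marks the RDD; nothing is written yet
  factors.count()      // an action triggers the checkpoint, truncating the lineage
}
sc.stop()
```

Once the lineage is truncated, Spark can clean up the shuffle files from earlier iterations instead of accumulating them.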





and the driver’s log seems strange: why are the log lines printed all together...


thank you very much!


> On Feb 26, 2015, at 11:33 PM, 163 <lisendong@163.com> wrote:
> 
> Thank you very much for your opinion:)
> 
> In our case, it may be dangerous to treat un-observed items as negative interactions (although
> we could give them a small confidence, I still don't think they are credible...)
> 
> I will do more experiments and give you feedback:)
> 
> Thank you;)
> 
> 
>> On Feb 26, 2015, at 23:16, Sean Owen <sowen@cloudera.com> wrote:
>> 
>> I believe that's right, and is what I was getting at. Yes, the implicit
>> formulation ends up implicitly including every possible interaction in
>> its loss function, even unobserved ones. That could be the difference.
>> 
>> This is mostly an academic question though. In practice, you have
>> click-like data and should be using the implicit version for sure.
>> 
>> However you can give negative implicit feedback to the model. You
>> could consider no-click as a mild, observed, negative interaction.
>> That is: supply a small negative value for these cases. Unobserved
>> pairs are not part of the data set. I'd be careful about assuming the
>> lack of an action carries signal.
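A hedged sketch of that encoding (the -0.1 weight and the helper name are assumptions for illustration, not values from the thread):

```scala
// Observed clicks get a positive weight; observed non-clicks a small negative
// weight (-0.1 is an assumed magnitude); unobserved pairs are simply never
// added to the training set.
def implicitScore(clicked: Boolean): Double =
  if (clicked) 1.0 else -0.1

// These scores would then become Rating values for ALS.trainImplicit, e.g.:
//   val ratings = displayLog.map { case (u, i, clicked) =>
//     Rating(u, i, implicitScore(clicked)) }
//   val model = ALS.trainImplicit(ratings, 30, 30, 0.01, -1, 1.0)
```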
>> 
>>> On Thu, Feb 26, 2015 at 3:07 PM, 163 <lisendong@163.com> wrote:
>>> oh my god, I think I understood...
>>> In my case, there are three kinds of user-item pairs:
>>> 
>>> Display and click pair(positive pair)
>>> Display but no-click pair(negative pair)
>>> No-display pair(unobserved pair)
>>> 
>>> Explicit ALS only considers the first and second kinds,
>>> but implicit ALS considers all three kinds of pairs (and treats the third
>>> kind like the second, because their preference values are all zero and
>>> their confidences are all 1).
>>> 
>>> So the results are different, right?
>>> 
>>> Could you please give me some advice on which ALS I should use?
>>> If I use the implicit ALS, how do I distinguish the second and the third kinds
>>> of pairs:)
>>> 
>>> My opinion is that in my case, I should use the explicit ALS ...
>>> 
>>> Thank you so much
>>> 
>>> On Feb 26, 2015, at 22:41, Xiangrui Meng <meng@databricks.com> wrote:
>>> 
>>> Lisen, did you use all m-by-n pairs during training? Implicit model
>>> penalizes unobserved ratings, while explicit model doesn't. -Xiangrui
>>> 
>>>> On Feb 26, 2015 6:26 AM, "Sean Owen" <sowen@cloudera.com> wrote:
>>>> 
>>>> +user
>>>> 
>>>>> On Thu, Feb 26, 2015 at 2:26 PM, Sean Owen <sowen@cloudera.com> wrote:
>>>>> 
>>>>> I think I may have it backwards, and that you are correct to keep the 0
>>>>> elements in train() in order to try to reproduce the same result.
>>>>> 
>>>>> The second formulation is called 'weighted regularization' and is used
>>>>> for both implicit and explicit feedback, as far as I can see in the code.
>>>>> 
>>>>> Hm, I'm actually not clear on why these would produce different results.
>>>>> Different code paths are used, to be sure, but I'm not yet sure why they
>>>>> would give different results.
>>>>> 
>>>>> In general you wouldn't use train() for data like this though, and would
>>>>> never set alpha=0.
>>>>> 
>>>>>> On Thu, Feb 26, 2015 at 2:15 PM, lisendong <lisendong@163.com> wrote:
>>>>>> 
>>>>>> I want to confirm the loss function you use (sorry, I’m not so familiar
>>>>>> with Scala, so I did not understand the mllib source code)
>>>>>> 
>>>>>> According to the papers:
>>>>>> 
>>>>>> 
>>>>>> in your implicit feedback ALS, the loss function is (ICDM 2008):
>>>>>> 
>>>>>> in the explicit feedback ALS, the loss function is (Netflix 2008):
>>>>>> 
>>>>>> note that besides the difference in the confidence parameter Cui, the
>>>>>> regularization is also different. does your code also have this difference?
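From the two cited papers (the inline formula images do not survive in the archive), the loss functions can be written as:

```latex
% Implicit-feedback ALS (Hu, Koren, Volinsky, ICDM 2008):
\min_{x_*,\,y_*} \sum_{u,i} c_{ui}\,\bigl(p_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda \Bigl( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \Bigr),
\qquad c_{ui} = 1 + \alpha\, r_{ui}

% Explicit-feedback ALS with weighted-\lambda regularization
% (Zhou et al., Netflix/AAIM 2008):
\min_{x_*,\,y_*} \sum_{(u,i) \in K} \bigl(r_{ui} - x_u^{\top} y_i\bigr)^2
  + \lambda \Bigl( \sum_u n_u\,\|x_u\|^2 + \sum_i n_i\,\|y_i\|^2 \Bigr)
```

So besides the confidence weight c_ui, the regularization does differ: the weighted-λ version scales each factor's penalty by the number of ratings n_u (or n_i) it participates in.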
>>>>>> 
>>>>>> Best Regards,
>>>>>> Sendong Li
>>>>>> 
>>>>>> 
>>>>>>> On Feb 26, 2015, at 9:42 PM, lisendong <lisendong@163.com> wrote:
>>>>>>> 
>>>>>>> Hi meng, fotero, sowen:
>>>>>>> 
>>>>>>> I’m using ALS with Spark 1.0.0; the code should be:
>>>>>>> 
>>>>>>> https://github.com/apache/spark/blob/branch-1.0/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
>>>>>>> 
>>>>>>> I think the following two methods should produce the same (or nearly the
>>>>>>> same) result:
>>>>>>> 
>>>>>>> MatrixFactorizationModel model = ALS.train(ratings.rdd(), 30, 30, 0.01, -1, 1);
>>>>>>> 
>>>>>>> MatrixFactorizationModel model = ALS.trainImplicit(ratings.rdd(), 30, 30, 0.01, -1, 0, 1);
>>>>>>> 
>>>>>>> the data I used is a display log; the format is as follows:
>>>>>>> 
>>>>>>> user  item  if-click
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I use 1.0 as the score for a click pair, and 0 as the score for a non-click pair.
>>>>>>> 
>>>>>>> in the second method, the alpha is set to zero, so the confidence for
>>>>>>> positive and negative pairs is 1.0 in both cases (right?)
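That reading matches the confidence definition c_ui = 1 + alpha * r_ui from the implicit-feedback paper; a tiny check:

```scala
// With alpha = 0, a clicked pair (r = 1) and a non-clicked pair (r = 0)
// both get confidence 1.0, so the model cannot weight them differently.
def confidence(alpha: Double, rating: Double): Double = 1.0 + alpha * rating

val clicked    = confidence(alpha = 0.0, rating = 1.0) // 1.0
val nonClicked = confidence(alpha = 0.0, rating = 0.0) // 1.0: identical
```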
>>>>>>> 
>>>>>>> I think the two methods should produce similar results, but they do not:
>>>>>>> the second method’s result is very bad (the AUC of the first result is
>>>>>>> 0.7, but the AUC of the second is only 0.61).
>>>>>>> 
>>>>>>> 
>>>>>>> I could not understand why; could you help me?
>>>>>>> 
>>>>>>> 
>>>>>>> Thank you very much!
>>>>>>> 
>>>>>>> Best Regards,
>>>>>>> Sendong Li
>>>>>> 
>>>>>> 
>>>>> 
>>>> 

