spark-user mailing list archives

From Manish Tripathi <tr.man...@gmail.com>
Subject Re: Negative values of predictions in ALS.tranform
Date Thu, 15 Dec 2016 23:43:12 GMT
When you say "implicit ALS *is* factoring the 0/1 matrix", are you saying
that for the implicit-feedback algorithm we need to pass the input data as
the preference matrix, i.e. a matrix of 0s and 1s?

Then how is the confidence matrix calculated? It is basically
1 + alpha * count_matrix. If we don't pass the actual counts (views, etc.),
how does Spark calculate the confidence matrix?
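
For concreteness, the two quantities in question can be sketched as below.
This is a minimal illustration of the paper's definitions in plain Python,
not Spark's internal code; the counts are made-up values, and alpha=40.0 is
taken from the ALS call quoted further down.

```python
# Sketch of the paper's definitions; NOT Spark's internal implementation.
ALPHA = 40.0  # same alpha as in the ALS(...) call quoted below


def preference(count):
    """p_ui: 1 if the user interacted with the item at all, else 0."""
    return 1 if count > 0 else 0


def confidence(count, alpha=ALPHA):
    """c_ui = 1 + alpha * r_ui, where r_ui is the raw count."""
    return 1.0 + alpha * count


counts = [0, 1, 3]  # hypothetical view counts for one user
prefs = [preference(r) for r in counts]
confs = [confidence(r) for r in counts]

print(prefs)  # [0, 1, 1]
print(confs)  # [1.0, 41.0, 121.0]
```

Both lists are derived from the same counts, which is why the counts (and
not only a 0/1 matrix) would be needed to build the confidence matrix.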

My understanding was that the input data for an ALS model fit with
implicitPrefs=True is the actual count matrix of views/purchases. Am I going
wrong here? If so, how is Spark calculating the confidence matrix if it
doesn't have the actual count data?

The original paper on which the Spark algorithm is based needs the actual
count data to create the confidence matrix, and it also needs the 0/1 matrix,
since the objective function uses both the confidence matrix and the 0/1
matrix to find the user and item factors.
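
A rough sketch of that objective, as described in the paper, is below. It is
plain Python for illustration only: the function name, the toy counts, and
the factor values are all hypothetical, and nothing here is claimed to match
how Spark evaluates the loss internally.

```python
# Confidence-weighted squared error plus L2 regularisation (illustrative).
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def implicit_als_loss(counts, user_factors, item_factors, alpha, reg):
    """Sum over all (u, i) of c_ui * (p_ui - x_u . y_i)^2, plus reg terms."""
    loss = 0.0
    for u, row in enumerate(counts):
        for i, r in enumerate(row):
            p = 1.0 if r > 0 else 0.0   # preference (0/1 matrix entry)
            c = 1.0 + alpha * r         # confidence from the raw count
            err = p - dot(user_factors[u], item_factors[i])
            loss += c * err * err
    # L2 penalty on every user and item factor vector
    loss += reg * (sum(dot(x, x) for x in user_factors)
                   + sum(dot(y, y) for y in item_factors))
    return loss


counts = [[2, 0], [0, 1]]        # hypothetical user x item view counts
X = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical user factors
Y = [[1.0, 0.0], [0.0, 1.0]]     # hypothetical item factors

# These factors reproduce the 0/1 matrix exactly, so the unregularised
# loss is zero.
loss = implicit_als_loss(counts, X, Y, alpha=40.0, reg=0.0)
```

Note that the target inside the squared error is the 0/1 preference, while
the count only enters through the weight c. The dot product itself is
unconstrained, so a fitted prediction can fall below 0 or above 1.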

On Thu, Dec 15, 2016 at 3:38 PM, Sean Owen <sowen@cloudera.com> wrote:

> No, you can't interpret the output as probabilities at all. In particular
> they may be negative. It is not predicting rating but interaction. Negative
> means very strongly not predicted to interact. No, implicit ALS *is*
> factoring the 0/1 matrix.
>
> On Thu, Dec 15, 2016, 23:31 Manish Tripathi <tr.manish@gmail.com> wrote:
>
>> OK. So we can loosely interpret the output as probabilities even though
>> it is not modeling probabilities? I ask so that I can use it with a
>> binary classification evaluator.
>>
>> As I understand the algorithm, the predicted matrix is basically the
>> product of the user factor matrix and the item factor matrix.
>>
>> But in what circumstances can the predicted ratings be negative? I can
>> understand that if the individual user factor vector and item factor
>> vector have negative terms, the dot product can be negative. But does a
>> negative value make any practical sense? As per the algorithm, the dot
>> product is the predicted rating, so the rating shouldn't be negative for
>> it to make sense. Also, is a rating between 0 and 1 a normalised rating?
>> Typically we expect a rating to be any real value, like 2.3 or 4.5.
>>
>> Also, please note that for implicit-feedback ALS we don't feed a 0/1
>> matrix. We feed the count matrix (discrete count values), and I am
>> assuming Spark internally converts it into a preference matrix (1/0) and
>> a confidence matrix = 1 + alpha * count_matrix.
>>
>> On Thu, Dec 15, 2016 at 2:56 PM, Sean Owen <sowen@cloudera.com> wrote:
>>
>> No, ALS is not modeling probabilities. The outputs are reconstructions of
>> a 0/1 matrix. Most values will be in [0,1], but it's possible to get
>> values outside that range.
>>
>> On Thu, Dec 15, 2016 at 10:21 PM Manish Tripathi <tr.manish@gmail.com>
>> wrote:
>>
>> Hi
>>
>> I ran the ALS model for implicit feedback. Then I used the model's
>> .transform method to predict the ratings for the original dataset. My
>> dataset is of the form (user, item, rating).
>>
>> I see something like below:
>>
>> predictions.show(5,truncate=False)
>>
>>
>> Why is the last prediction value negative? Isn't the transform method
>> giving the prediction (probability) of seeing the rating as 1? I had count
>> data for the rating (implicit feedback), and for the validation dataset I
>> binarized the rating (1 if > 0 else 0). My training data has positive
>> ratings (each is basically the count of views of a video).
>>
>> I used following to train:
>>
>> als = ALS(rank=x, maxIter=15, regParam=y, implicitPrefs=True, alpha=40.0)
>> model = als.fit(self.train)
>>
>> What does negative prediction mean here and is it ok to have that?
