I created a JIRA for this:
https://issues.apache.org/jira/browse/SPARK-6637. Since we don't have
a clear answer about how the scaling should be handled, maybe the best
solution for now is to switch back to the 1.2 scaling.

Xiangrui
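
For reference, the two scalings compared in this thread can be written side by side for one user's implicit subproblem. This is only a sketch under assumed notation: n is the total number of items, n_i is the number of items user i has interacted with, and c_ij and p_ij are the confidence and preference terms of the implicit formulation quoted below.

```latex
\min_{u_i} \; \sum_{j=1}^{n} c_{ij} \left( p_{ij} - u_i^T v_j \right)^2
  + \lambda X \, \| u_i \|_2^2,
\qquad
X = n_i \ \text{(1.2 scaling)}
\quad \text{or} \quad
X = n \ \text{(1.3 scaling)}
```

Since n_i <= n, the same lambda penalizes u_i at least as heavily under the 1.3 scaling, which is consistent with the smaller factors reported in the original message.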
On Tue, Mar 31, 2015 at 2:50 PM, Sean Owen <sowen@cloudera.com> wrote:
> Ah yeah I take your point. The squared error term is over the whole
> user-item matrix, technically, in the implicit case. I suppose I am
> used to assuming that the 0 terms in this matrix are weighted so much
> less (because alpha is usually largeish) that they're almost not
> there, but they are. So I had just used the explicit formulation.
>
> I suppose the result is kind of scale invariant, but not exactly. I
> had not prioritized this property since I had generally built models
> on the full data set and not a sample, and had assumed that lambda
> would need to be retuned over time as the input grew anyway.
>
> So, basically I don't know anything more than you do, sorry!
>
> On Tue, Mar 31, 2015 at 10:41 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
>> Hey Sean,
>>
>> That is true for the explicit model, but not for the implicit one. The
>> ALS-WR paper doesn't cover the implicit model. In the implicit
>> formulation, a subproblem (for v_j) is:
>>
>> min_{v_j} \sum_i c_ij (p_ij - u_i^T v_j)^2 + lambda * X * \|v_j\|_2^2
>>
>> This is a sum over all i, not just the users who rated item j. In
>> this case, if we set X=m_j, the number of observed ratings for item j,
>> it is not really scale-invariant. We have #users user vectors in the
>> least squares problem but only penalize by lambda * #ratings. I was
>> suggesting using lambda * m (m being the number of users) directly for
>> the implicit model, to match the number of vectors in the least squares
>> problem. Well, this is my theory. I haven't found any published work
>> about it.
>>
>> Best,
>> Xiangrui
>>
>> On Tue, Mar 31, 2015 at 5:17 AM, Sean Owen <sowen@cloudera.com> wrote:
>>> I had always understood the formulation to be the first option you
>>> describe. Lambda is scaled by the number of items the user has rated /
>>> interacted with. I think the goal is to avoid fitting the tastes of
>>> prolific users disproportionately just because they have many ratings
>>> to fit. This is what's described in the ALS-WR paper we link to on the
>>> Spark web site, in equation 5
>>> (http://www.grappa.univ-lille3.fr/~mary/cours/stats/centrale/reco/paper/MatrixFactorizationALS.pdf)
>>>
>>> I think this also gets you the scale-invariance? For every additional
>>> rating from user i to product j, you add one new term to the
>>> squared-error sum, (r_ij - u_i . m_j)^2, but also, you'd increase the
>>> regularization term by lambda * (u_i^2 + m_j^2). They are at least
>>> both increasing about linearly as ratings increase. If the
>>> regularization term is multiplied by the total number of users and
>>> products in the model, then it's fixed.
>>>
>>> I might misunderstand you and/or be speaking about something slightly
>>> different when it comes to invariance. But FWIW I had always
>>> understood the regularization to be multiplied by the number of
>>> explicit ratings.
>>>
>>> On Mon, Mar 30, 2015 at 5:51 PM, Xiangrui Meng <mengxr@gmail.com> wrote:
>>>> Okay, I didn't realize that I changed the behavior of lambda in 1.3
>>>> to make it "scale-invariant", but it is worth discussing whether this
>>>> is a good change. In 1.2, we multiply lambda by the number of ratings in
>>>> each subproblem. This makes it "scale-invariant" for explicit
>>>> feedback. However, in the implicit feedback model, a user's subproblem
>>>> contains all item factors. Then the question is whether we should
>>>> multiply lambda by the number of explicit ratings from this user or by
>>>> the total number of items. We used the former in 1.2 but changed to
>>>> the latter in 1.3. So you should try a smaller lambda to get a similar
>>>> result in 1.3.
>>>>
>>>> Sean and Shuo, which approach do you prefer? Do you know any existing
>>>> work discussing this?
>>>>
>>>> Best,
>>>> Xiangrui
>>>>
>>>>
>>>> On Fri, Mar 27, 2015 at 11:27 AM, Xiangrui Meng <mengxr@gmail.com> wrote:
>>>>> This sounds like a bug ... Did you try a different lambda? It would be
>>>>> great if you could share your dataset or reproduce this issue on a
>>>>> public dataset. Thanks! Xiangrui
>>>>>
>>>>> On Thu, Mar 26, 2015 at 7:56 AM, Ravi Mody <rmody999@gmail.com> wrote:
>>>>>> After upgrading to 1.3.0, ALS.trainImplicit() has been returning vastly
>>>>>> smaller factors (and hence scores). For example, the first few products'
>>>>>> factor values in 1.2.0 are (0.04821, 0.00674, 0.0325). In 1.3.0, the
>>>>>> first few factor values are (2.535456E-8, 1.690301E-8, 6.99245E-8). This
>>>>>> difference of several orders of magnitude is consistent throughout both
>>>>>> the user and product factors. The recommendations from 1.2.0 are
>>>>>> subjectively much better than in 1.3.0. 1.3.0 trains significantly
>>>>>> faster than 1.2.0, and uses less memory.
>>>>>>
>>>>>> My first thought is that there is too much regularization in the 1.3.0
>>>>>> results, but I'm using the same lambda parameter value. This is a
>>>>>> snippet of my Scala code:
>>>>>> .....
>>>>>> val rank = 75
>>>>>> val numIterations = 15
>>>>>> val alpha = 10
>>>>>> val lambda = 0.01
>>>>>> val model = ALS.trainImplicit(train_data, rank, numIterations,
>>>>>> lambda=lambda, alpha=alpha)
>>>>>> .....
>>>>>>
>>>>>> The code and input data are identical across both versions. Did anything
>>>>>> change between the two versions that I'm not aware of? I'd appreciate
>>>>>> any help!
>>>>>>
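
The direction of the effect discussed in this thread can be sketched with a rank-1 toy problem in plain Scala. This is not Spark's implementation: `RegScalingSketch`, `solveUserFactor`, and the toy data are hypothetical, and the sketch only solves one user's implicit subproblem in closed form, assuming confidence c_j = 1 + alpha * r_j and preference p_j = 1 when r_j > 0. It compares multiplying lambda by the user's number of ratings (the 1.2 behavior described above) with multiplying it by the total number of items (the 1.3 behavior).

```scala
object RegScalingSketch {
  // Closed-form rank-1 solution of one user's implicit-feedback subproblem:
  //   min_u  sum_j c_j * (p_j - u * v_j)^2 + lambda * x * u^2
  // with c_j = 1 + alpha * r_j and p_j = 1 if r_j > 0, else 0.
  // `x` is the regularization multiplier under discussion.
  def solveUserFactor(
      ratings: Seq[Double],
      itemFactors: Seq[Double],
      alpha: Double,
      lambda: Double,
      x: Double): Double = {
    require(ratings.length == itemFactors.length, "one item factor per rating slot")
    val terms = ratings.zip(itemFactors).map { case (r, v) =>
      val c = 1.0 + alpha * r
      val p = if (r > 0) 1.0 else 0.0
      (c * p * v, c * v * v)
    }
    val numerator = terms.map(_._1).sum
    val denominator = terms.map(_._2).sum + lambda * x
    numerator / denominator
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical toy data: 1000 items, the user interacted with 5 of them.
    val numItems = 1000
    val ratings = Seq.tabulate(numItems)(j => if (j < 5) 1.0 else 0.0)
    val itemFactors = Seq.fill(numItems)(0.05)
    val alpha = 10.0
    val lambda = 0.01
    val numObserved = ratings.count(_ > 0).toDouble

    // 1.2-style scaling: lambda * (number of ratings from this user).
    val uByRatings = solveUserFactor(ratings, itemFactors, alpha, lambda, numObserved)
    // 1.3-style scaling: lambda * (total number of items).
    val uByItems = solveUserFactor(ratings, itemFactors, alpha, lambda, numItems.toDouble)

    println(s"lambda * #ratings: u = $uByRatings")
    println(s"lambda * #items:   u = $uByItems")
  }
}
```

With the same lambda, the second scaling adds a larger penalty to the denominator and so yields a smaller factor. The toy numbers are not meant to reproduce the magnitudes reported above, only the direction of the change.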

To unsubscribe, email: user-unsubscribe@spark.apache.org
For additional commands, email: user-help@spark.apache.org
