Yeah, I think the idea of confidence is a bit different from what I am looking for in using implicit factorization for document clustering.

The broadcasted Gram matrix, \sum w_i w_i' or \sum h_j h_j', also includes the terms for the r_ij that are observed. So I might be fine using the broadcasted Gram matrix and taking the linear term as \sum (-r_ij w_i) or \sum (-r_ij h_j).
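
Written out as a rough sketch, for the update of w_i with H held fixed. This assumes that unobserved entries are treated as r_ij = 0 with unit weight, which is my assumption for the document model and not what the current implicit ALS code does:

```latex
% Assumption: every (i, j) pair enters the loss with weight 1,
% and r_{ij} = 0 wherever no count was observed.
\begin{aligned}
f(w_i) &= \sum_{j} \left( r_{ij} - w_i^\top h_j \right)^2 \\
       &= w_i^\top \Big( \sum_{j} h_j h_j^\top \Big) w_i
          - 2 \Big( \sum_{j:\, r_{ij} \neq 0} r_{ij} h_j \Big)^\top w_i
          + \sum_{j} r_{ij}^2
\end{aligned}

% In the form \tfrac{1}{2} w^\top Q w + q^\top w, this gives
Q = \sum_{j} h_j h_j^\top, \qquad q = -\sum_{j:\, r_{ij} \neq 0} r_{ij} h_j

% so the normal equations are
\Big( \sum_{j} h_j h_j^\top \Big) w_i = \sum_{j:\, r_{ij} \neq 0} r_{ij} h_j
```

Q is the same for every i, which is why the broadcast Gram matrix can be reused as is, and only the linear term needs a pass over the observed entries. The update of h_j is symmetric, with \sum_i w_i w_i' and -\sum_i r_ij w_i.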

I will think about it further, but in the current implicit formulation with confidence, it looks like I am really factorizing a 0/1 matrix with weights 1 + alpha*rating. That is a bit different from the LSA model.
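
A minimal sketch of why the implicit update behaves like a weighted fit to a 0/1 target rather than to the rating itself. `NormalEq` and `ImplicitVsExplicit` are hypothetical stand-ins, not Spark's classes. The sketch assumes that `ls.add(a, b, c)` accumulates `c * a * a'` into the quadratic term and `c * b * a` into the linear term, and the first `add` call stands in for this factor's share of the broadcast Y'Y:

```scala
// Hypothetical stand-in for a normal-equation accumulator (not Spark's class).
// Assumption: add(a, b, c) does AtA += c * a * a' and Atb += c * b * a.
final class NormalEq(val k: Int) {
  val ata: Array[Array[Double]] = Array.fill(k, k)(0.0)
  val atb: Array[Double] = Array.fill(k)(0.0)

  def add(a: Array[Double], b: Double, c: Double = 1.0): this.type = {
    require(a.length == k)
    for (p <- 0 until k; q <- 0 until k) ata(p)(q) += c * a(p) * a(q)
    for (p <- 0 until k) atb(p) += c * b * a(p)
    this
  }
}

object ImplicitVsExplicit {
  // Contribution of one observed entry under the implicit update.
  def implicitTerms(y: Array[Double], rating: Double, alpha: Double): NormalEq = {
    val ls = new NormalEq(y.length)
    // Stand-in for the part of the broadcast Y'Y that comes from this factor.
    ls.add(y, 0.0, 1.0)
    val c1 = alpha * math.abs(rating)
    // Same call shape as the snippet quoted in this thread.
    if (rating > 0) ls.add(y, (c1 + 1.0) / c1, c1)
    ls
  }

  // Contribution of the same entry under the explicit update.
  def explicitTerms(y: Array[Double], rating: Double): NormalEq =
    new NormalEq(y.length).add(y, rating, 1.0)

  def main(args: Array[String]): Unit = {
    val y = Array(1.0, 2.0)
    val imp = implicitTerms(y, rating = 3.0, alpha = 1.0)
    val expl = explicitTerms(y, rating = 3.0)
    println(s"implicit: ata(0)(0)=${imp.ata(0)(0)} atb=${imp.atb.mkString(", ")}")
    println(s"explicit: ata(0)(0)=${expl.ata(0)(0)} atb=${expl.atb.mkString(", ")}")
  }
}
```

Under those assumptions, the implicit path ends up with (1 + c1) * y * y' in the quadratic term and (1 + c1) * y in the linear term, i.e. weight 1 + alpha*|rating| on a target of 1. The explicit path has weight 1 on a target equal to the rating, so the two do not coincide even when alpha = 1.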

On Sun, Jul 26, 2015 at 12:34 AM, Sean Owen <sowen@cloudera.com> wrote:

confidence = 1 + alpha * |rating| here (so, c1 means confidence - 1),
so alpha = 1 doesn't specially mean high confidence. The loss function
is computed over the whole input matrix, including all missing "0"
entries. These have a minimal confidence of 1 according to this
formula. alpha controls how much more confident you are in what the
entries that do exist in the input mean. So alpha = 1 is low-ish and
means you don't think the existence of ratings means a lot more than
their absence.

I think the explicit case is similar here, but not identical. The
cost function for the explicit case is not the same, which is the more
substantial difference between the two. There, ratings aren't inputs
to a confidence value that becomes a weight in the loss function
during the factorization of a 0/1 matrix. Instead, the rating matrix
is the thing being factorized directly.

On Sun, Jul 26, 2015 at 6:45 AM, Debasish Das <debasish.das83@gmail.com> wrote:

> Hi,
>
> Implicit factorization is important for us since it drives recommendation
> when modeling user click/no-click, and also topic modeling, where it handles
> 0 counts in document x word matrices through NMF and Sparse Coding.
>
> I am a bit confused by this code:
>
> val c1 = alpha * math.abs(rating)
> if (rating > 0) ls.add(srcFactor, (c1 + 1.0)/c1, c1)
>
> When alpha = 1.0 (high confidence) and rating > 0 (true for word
> counts), why does this formula not become the same as the explicit formula:
>
> ls.add(srcFactor, rating, 1.0)
>
> For modeling documents, I believe the implicit Y'Y needs to stay, but we need
> the explicit ls.add(srcFactor, rating, 1.0)
>
> I am still working through the confidence code. Please let me know if the idea
> of mapping implicit to handle 0 counts in the document-word matrix makes sense.
>
> Thanks.
> Deb
>