Hi Ted,
Thanks for your response. I thought that the mean of a sparse vector is
simply the mean of the "defined" elements? Why would the vectors become
dense, unless you mean that all the undefined elements (0?) will now
become (0 - m_x)?
Looking at the following example:
X = [5 - 4] and Y = [4 5 2].
Is m_x 4.5 or 3? Is m_y 11/3 or 6/2, because we ignore the "5" since its
counterpart in X is undefined?
Thanks again
Amit
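[For what it's worth, the arithmetic behind this question can be sketched in a few lines of Python; the dict-of-ratings representation is just an assumption for illustration, not Mahout's actual storage:]

```python
# X = [5, -, 4] and Y = [4, 5, 2] stored sparsely as {user index: rating};
# only the "defined" elements are kept.
X = {0: 5, 2: 4}
Y = {0: 4, 1: 5, 2: 2}

# m_x over X's defined elements vs. treating the missing element as 0.
m_x_defined = sum(X.values()) / len(X)  # 9/2 = 4.5
m_x_zero = sum(X.values()) / 3          # 9/3 = 3.0

# m_y over Y's defined elements vs. ignoring the "5" whose counterpart
# in X is undefined.
m_y_defined = sum(Y.values()) / len(Y)  # 11/3
shared = X.keys() & Y.keys()
m_y_shared = sum(Y[u] for u in shared) / len(shared)  # 6/2 = 3.0
```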
On Fri, Nov 29, 2013 at 9:58 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> Well, the best way to compute correlation using sparse vectors is to make
> sure you keep them sparse. To do that, you must avoid subtracting the mean,
> by expanding whatever formulae you are using. For instance, if you are
> computing
>
> (x - m_x) . (y - m_y)
>
> (here . means dot product)
>
> If you do this directly, then you lose all benefit of sparse vectors since
> subtracting the means makes each vector dense.
>
> What you should compute instead is this alternative form
>
> x . y - m_x e . y - m_y e . x + m_x m_y e . e
>
> (here e represents a vector full of 1's)
>
> The dot product here is sparse, and the expression m_x e . y can be computed
> (at least in Mahout) in map-reduce idiom as
>
> y.aggregate(Functions.PLUS, Functions.mult(m_x))
>
>
>
>
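[A minimal sketch of the expanded form above in plain Python, not Mahout's API; it assumes n is the total number of users, so e . e = n, and represents sparse vectors as {index: value} dicts:]

```python
# (x - m_x e) . (y - m_y e) expanded as
# x . y - m_x (e . y) - m_y (e . x) + m_x m_y (e . e),
# where e . x is just the sum of x's stored entries and e . e = n.
def centered_dot(x, y, n):
    dot = sum(v * y[i] for i, v in x.items() if i in y)  # sparse x . y
    sum_x, sum_y = sum(x.values()), sum(y.values())      # e . x, e . y
    m_x, m_y = sum_x / n, sum_y / n                      # means over n users
    return dot - m_x * sum_y - m_y * sum_x + m_x * m_y * n
```

[On x = {0: 5, 2: 4}, y = {0: 4, 1: 5, 2: 2} with n = 3 this gives -5, the same value you get by densifying to [5, 0, 4] and [4, 5, 2] and subtracting the means first, but it only ever touches the stored entries.]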
> On Fri, Nov 29, 2013 at 9:31 PM, Amit Nithian <anithian@gmail.com> wrote:
>
> > Okay, so I rethought my question and realized that the paper never really
> > talked about collaborative filtering, but just about how to calculate
> > item-item similarity in a scalable fashion. Perhaps this is the reason why
> > the common ratings aren't used? Because that's not a prerequisite for this
> > calculation?
> >
> > Although for my own clarity, I'd still like to get a better understanding
> > of what it means to calculate the correlation between sparse vectors where
> > you're normalizing each vector using a separate denominator.
> >
> > P.S. If my question(s) don't make sense, please let me know, as it's very
> > possible I am completely misunderstanding something :).
> >
> > Thanks again!
> > Amit
> >
> >
> > On Wed, Nov 27, 2013 at 8:23 AM, Amit Nithian <anithian@gmail.com> wrote:
> >
> > > Hey Sebastian,
> > >
> > > Thanks again. Actually I'm glad that I am talking to you as it's your
> > > paper and presentation I have questions with! :)
> > >
> > > So to clarify my question further, looking at this presentation (
> > > http://isabeldrost.de/hadoop/slides/collabMahout.pdf) you have the
> > > following user x item matrix:
> > >   M A I
> > > A 5 1 4
> > > B - 2 5
> > > P 4 3 2
> > >
> > > If I want to calculate the Pearson correlation between Matrix and
> > > Inception, I'd have the rating vectors:
> > > [5 - 4] vs [4 5 2].
> > >
> > > One of the steps in your paper is the normalization step, which
> > > subtracts the mean item rating from each value and essentially takes
> > > the L2 norm of this resulting vector (or in other words, the L2 norm
> > > of the mean-centered vector?)
> > >
> > > The question I have had is: what is the average rating for Matrix and
> > > Inception? I can see the following:
> > > Matrix: 4.5 (9/2), Inception: 3 (6/2), because you only consider shared
> > > ratings
> > > Matrix: 3 (9/3), Inception: 3.667 (11/3), assuming that the missing
> > > rating is 0
> > > Matrix: 4.5 (9/2), Inception: 3.667 (11/3), subtracting from the average
> > > of all non-zero ratings ==> This is what I believe the current
> > > implementation does.
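[The three candidate means can be checked in a few lines; this is a sketch, not Mahout code, and the user indices A=0, B=1, P=2 are just labels for the matrix above:]

```python
# Ratings for Matrix and Inception from the user x item matrix above;
# B never rated Matrix, so that entry is absent.
matrix = {0: 5, 2: 4}
inception = {0: 4, 1: 5, 2: 2}
n_users = 3
shared = matrix.keys() & inception.keys()

# 1st: shared (co-rated) ratings only
m1 = (sum(matrix[u] for u in shared) / len(shared),
      sum(inception[u] for u in shared) / len(shared))
# 2nd: missing rating treated as 0, divide by all users
m2 = (sum(matrix.values()) / n_users, sum(inception.values()) / n_users)
# 3rd: average over each item's own non-zero ratings
m3 = (sum(matrix.values()) / len(matrix),
      sum(inception.values()) / len(inception))
```

[m1 comes out as (4.5, 3.0), m2 as (3.0, 11/3) and m3 as (4.5, 11/3), matching the three options listed above.]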
> > >
> > > Unfortunately, neither of these yields the 0.47 listed in the
> > > presentation, but that's a separate issue. In my testing, I see that
> > > Mahout Taste (non-distributed) uses the 1st approach, while the
> > > distributed approach uses the 3rd.
> > >
> > > I am okay with #3; however, I just want to understand that this is the
> > > case and that it's okay. This is why I was asking about Pearson
> > > correlation between vectors of "different" lengths: the average rating
> > > is being computed using a denominator (number of users) that is
> > > different between the two (2 vs 3).
> > >
> > > I know you said that in practice people don't use Pearson to compute
> > > inferred ratings, but this is just for my complete understanding (and
> > > since it's the example used in your presentation). This same question
> > > applies to cosine, as you are taking an L2 norm of the vector as a
> > > preprocessing step, and including/excluding non-shared ratings may make
> > > a difference.
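[To make the cosine point concrete, a quick sketch using the Matrix/Inception vectors from earlier; dense lists are used here purely for readability:]

```python
import math

def cosine(x, y):
    # Plain cosine similarity between two equal-length dense vectors.
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm

# Non-shared rating treated as 0 vs. dropped entirely:
with_zero = cosine([5, 0, 4], [4, 5, 2])   # ~0.652
co_rated_only = cosine([5, 4], [4, 2])     # ~0.978
```

[So including or excluding the non-shared rating clearly changes the similarity.]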
> > >
> > > Thanks again!
> > > Amit
> > >
> > >
> > > On Wed, Nov 27, 2013 at 7:13 AM, Sebastian Schelter <
> > > ssc.open@googlemail.com> wrote:
> > >
> > >> Hi Amit,
> > >>
> > >> Yes, it gives different results. However, in practice most people
> > >> don't do rating prediction with the Pearson coefficient, but use
> > >> count-based measures like the log-likelihood ratio test.
> > >>
> > >> The distributed code doesn't look at vectors of different lengths, but
> > >> simply treats non-existent ratings as zero.
> > >>
> > >> sebastian
> > >>
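[For reference, the log-likelihood ratio test mentioned above can be sketched in plain Python. This follows the well-known 2x2 count/entropy form of the G^2 statistic; it is a sketch, not Mahout's actual LogLikelihood class:]

```python
import math

def xlogx(count):
    # Convention: 0 * log(0) = 0.
    return count * math.log(count) if count > 0 else 0.0

def entropy(*counts):
    # "Unnormalized" entropy of a set of counts, as used in G^2.
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    # k11: both events co-occur, k12/k21: one event only, k22: neither.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))
```

[Independent counts score 0; strongly co-occurring counts score high, which is what makes it useful as a count-based similarity.]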
> > >> On 27.11.2013 16:09, Amit Nithian wrote:
> > >> > Comparing this against the non-distributed (Taste) implementation
> > >> > gives different answers for item-item similarity, as of course the
> > >> > non-distributed one looks only at co-rated items. I was more
> > >> > wondering whether this difference mattered in practice or not.
> > >> >
> > >> > Also, I'm confused about how you can compute the Pearson similarity
> > >> > between two vectors of different length, which essentially is what
> > >> > is going on here, I think?
> > >> >
> > >> > Thanks again
> > >> > Amit
> > >> > On Nov 27, 2013 9:06 AM, "Sebastian Schelter" <
> > ssc.open@googlemail.com>
> > >> > wrote:
> > >> >
> > >> >> Yes, it is due to the parallel algorithm, which only looks at
> > >> >> co-ratings from a given user.
> > >> >>
> > >> >>
> > >> >> On 27.11.2013 15:02, Amit Nithian wrote:
> > >> >>> Thanks Sebastian! Is there a particular reason for that?
> > >> >>> On Nov 27, 2013 7:47 AM, "Sebastian Schelter" <
> > >> ssc.open@googlemail.com>
> > >> >>> wrote:
> > >> >>>
> > >> >>>> Hi Amit,
> > >> >>>>
> > >> >>>> You are right, the non-co-rated items are not filtered out in
> > >> >>>> the distributed implementation.
> > >> >>>>
> > >> >>>> sebastian
> > >> >>>>
> > >> >>>>
> > >> >>>> On 26.11.2013 20:51, Amit Nithian wrote:
> > >> >>>>> Hi all,
> > >> >>>>>
> > >> >>>>> Apologies if this is a repeat question, as I just joined the
> > >> >>>>> list, but I have a question about the way that metrics like
> > >> >>>>> cosine and Pearson are calculated in Hadoop "mode" (i.e., not
> > >> >>>>> Taste).
> > >> >>>>>
> > >> >>>>> As far as I understand, the vectors used for computing pairwise
> > >> >>>>> item similarity in Taste are based on the co-rated items;
> > >> >>>>> however, in the Hadoop implementation, I don't see this done.
> > >> >>>>>
> > >> >>>>> The implementation of the distributed item-item similarity
> > >> >>>>> comes from this paper:
> > >> >>>>> http://ssc.io/wp-content/uploads/2012/06/rec11-schelter.pdf
> > >> >>>>> I didn't see anything in this paper about filtering out those
> > >> >>>>> elements of the vectors that are not co-rated, and this can make
> > >> >>>>> a difference, especially when you normalize the ratings by
> > >> >>>>> dividing by the average item rating. In some cases, the number
> > >> >>>>> of users to divide by can be fewer, depending on the sparseness
> > >> >>>>> of the vector.
> > >> >>>>>
> > >> >>>>> Any clarity on this would be helpful.
> > >> >>>>>
> > >> >>>>> Thanks!
> > >> >>>>> Amit
> > >> >>>>>
> > >> >>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >> >>
> > >> >
> > >>
> > >>
> > >
> >
>
