Hi Ted,
Thanks, that is what I would have thought too, but I don't think the
Pearson similarity (in Hadoop mode) does this:
in
org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.PearsonCorrelationSimilarity
around line 31
double average = vector.norm(1) / vector.getNumNonZeroElements();
This looks like it takes the L1 norm (the sum of absolute values) and divides
it by the number of non-zero elements, which would make the average of my
[5 - 4] example 4.5.
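
To make the difference concrete, here is a minimal sketch in plain Java (not
Mahout code; the class and method names are hypothetical) that computes both
candidate means for X = [5 - 4]. It assumes the sparse vector is represented
only by its stored values plus its full length.

```java
public class SparseMeanSketch {

    // Mean over stored entries only: sum of absolute values (L1 norm)
    // divided by the number of stored, non-zero entries.
    static double meanOverStored(double[] storedValues) {
        double l1 = 0.0;
        for (double v : storedValues) {
            l1 += Math.abs(v);
        }
        return l1 / storedValues.length;
    }

    // Mean over the full vector: unstored entries count as zeros, so the
    // sum is divided by the vector's cardinality instead.
    static double meanOverAll(double[] storedValues, int cardinality) {
        double sum = 0.0;
        for (double v : storedValues) {
            sum += v;
        }
        return sum / cardinality;
    }

    public static void main(String[] args) {
        double[] x = {5.0, 4.0};                 // X = [5 - 4], middle entry not stored
        System.out.println(meanOverStored(x));   // prints 4.5
        System.out.println(meanOverAll(x, 3));   // prints 3.0
    }
}
```

The first method mirrors what the quoted line appears to do; the second is the
mean you described.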
Thanks again
Amit
On Fri, Nov 29, 2013 at 10:34 PM, Ted Dunning wrote:
> On Fri, Nov 29, 2013 at 10:16 PM, Amit Nithian wrote:
>
> > Hi Ted,
> >
> > Thanks for your response. I thought that the mean of a sparse vector is
> > simply the mean of the "defined" elements? Why would the vectors become
> > dense unless you're meaning that all the undefined elements (0?) now will
> > be (0-m_x)?
> >
>
> Yes. Just so. All those zero elements become non-zero and the vector is
> thus no longer sparse.
>
>
> >
> > Looking at the following example:
> > X = [5 - 4] and Y= [4 5 2].
> >
> > is m_x 4.5 or 3?
>
>
> 3.
>
> This is because the elements of X are really 5, 0, and 4. The zero is just
> not stored, but it still is the value of that element.
>
>
> > Is m_y 11/3, or 6/2 because we ignore the "5" since its
> > counterpart in X is undefined?
> >
>
> 11/3
>
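
For reference, a small sketch of the arithmetic in the quoted exchange, again
in plain Java rather than Mahout code and with hypothetical names. It treats
the missing element of X as a stored zero, then checks that centering turns
every element non-zero and that m_y comes out as 11/3.

```java
public class CenteringSketch {

    // Arithmetic mean over every element, zeros included.
    static double mean(double[] v) {
        double sum = 0.0;
        for (double e : v) {
            sum += e;
        }
        return sum / v.length;
    }

    // Subtract the mean from every element, including the zeros.
    static double[] center(double[] v) {
        double m = mean(v);
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = v[i] - m;
        }
        return out;
    }

    // Number of elements a sparse representation would have to store.
    static int countNonZero(double[] v) {
        int n = 0;
        for (double e : v) {
            if (e != 0.0) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        double[] x = {5.0, 0.0, 4.0};   // X = [5 - 4] with the gap read as 0
        double[] y = {4.0, 5.0, 2.0};
        double[] xc = center(x);        // {2.0, -3.0, 1.0}
        System.out.println(countNonZero(x));   // prints 2
        System.out.println(countNonZero(xc));  // prints 3
        System.out.println(mean(y));           // 11/3
    }
}
```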