The problem with the usual preference count is that big hit items can be overwhelmingly popular.
If you want to know which ones the most people saw and are likely to have an opinion about,
then this seems a good measure. But these hugely popular items may not differentiate taste.
So we calculate the “important” taste indicators with LLR. The benefit of the similarity
matrix is that it attempts to model the “important” co-occurrences.
Hugely popular items have an effect here: they really say nothing about similarity
of taste. Everyone likes motherhood and apple pie, so it doesn’t say much about us if we
both do too. This is usually accounted for with something like TF-IDF, so I suppose another weighted
popularity measure would be to run the preference matrix through TF-IDF to down-weight non-differentiating
preferences.
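A minimal sketch of the TF-IDF idea above, in Python (the thread has no code of its own, so the language is an editorial choice). It assumes preferences are held as a dict of user -> {item: weight}, treats each user as a "document" and each item as a "term", and scores each item by its summed weight times IDF = log(n_users / users who preferred it). The function name and data layout are hypothetical, not Mahout's API.

```python
import math

def tfidf_weighted_popularity(prefs):
    """Hypothetical helper: TF-IDF-style popularity per item.

    prefs: dict of user -> {item: preference weight} (use 1.0 for boolean data).
    Each user is treated as a 'document' and each item as a 'term', so
    IDF(item) = log(n_users / number of users with a non-zero preference).
    """
    n_users = len(prefs)
    df = {}      # item -> number of users with a non-zero preference
    totals = {}  # item -> summed preference weight (the 'TF' part)
    for row in prefs.values():
        for item, w in row.items():
            if w != 0:
                df[item] = df.get(item, 0) + 1
                totals[item] = totals.get(item, 0.0) + w
    # An item every user prefers gets IDF = log(1) = 0.
    return {item: totals[item] * math.log(n_users / df[item]) for item in totals}

# Toy data (made up): everyone likes apple pie, so it should not differentiate.
prefs = {
    "u1": {"apple_pie": 1.0, "jazz": 1.0},
    "u2": {"apple_pie": 1.0, "jazz": 1.0, "noise_rock": 1.0},
    "u3": {"apple_pie": 1.0},
    "u4": {"apple_pie": 1.0, "noise_rock": 1.0},
}
scores = tfidf_weighted_popularity(prefs)
# apple_pie: 4 * log(4/4) = 0.0; jazz and noise_rock: 2 * log(4/2)
```

For boolean data the score is count * log(N / count), which peaks at count = N/e instead of growing with count, so this is a strong de-weighting; a smoothed IDF variant would soften it.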
On Feb 6, 2014, at 7:14 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
If you look at the indicator matrix (co-occurrence reduced by LLR), you will
usually have asymmetry due to limitations on the number of indicators per
row.
This will give you some interesting results when you look at the column
sums. I wouldn't call it popularity, but it is an interesting measure.
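To see the asymmetry described above, here is a small Python sketch under stated assumptions: boolean preferences as a dict of user -> set of items, LLR computed as Dunning's G² statistic on the 2×2 co-occurrence table, each row truncated to its top `max_per_row` indicators, and the "column sum" taken as the number of rows an item survives in. The names are hypothetical; this is not Mahout's implementation and skips details such as downsampling.

```python
import math
from itertools import combinations

def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    # Unnormalized entropy used in the G^2 (log-likelihood ratio) formulation.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp round-off below zero

def indicator_column_sums(user_items, max_per_row):
    """Hypothetical helper. user_items: dict user -> set of items.
    Returns item -> number of indicator-matrix rows the item appears in."""
    n = len(user_items)
    users_of = {}
    for user, items in user_items.items():
        for item in items:
            users_of.setdefault(item, set()).add(user)
    scores = {item: {} for item in users_of}
    for a, b in combinations(sorted(users_of), 2):
        k11 = len(users_of[a] & users_of[b])  # users who have both
        if k11 == 0:
            continue  # only co-occurring pairs are scored
        k12 = len(users_of[a]) - k11          # a without b
        k21 = len(users_of[b]) - k11          # b without a
        k22 = n - k11 - k12 - k21             # neither
        s = llr(k11, k12, k21, k22)
        scores[a][b] = s  # LLR itself is symmetric...
        scores[b][a] = s
    col_sums = {item: 0 for item in users_of}
    for row in scores.values():
        # ...but keeping only the top max_per_row entries per row is not.
        top = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))[:max_per_row]
        for col_item, s in top:
            if s > 0:
                col_sums[col_item] += 1
    return col_sums

user_items = {"u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"a", "c"}, "u4": {"d"}}
sums = indicator_column_sums(user_items, max_per_row=1)
# sums["a"] == 2 and sums["c"] == 0 on this toy data
```

With one indicator per row, item c keeps a as its indicator while a's single slot goes to b, so c's row adds to a's column sum but a's row adds nothing to c's.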
On Thu, Feb 6, 2014 at 2:15 PM, Sean Owen <srowen@gmail.com> wrote:
> I have always defined popularity as just the number of ratings/prefs,
> yes. You could rank on some kind of 'net promoter score' (good
> ratings minus bad ratings), though that becomes more like 'most
> liked'.
>
> How do you get popularity from similarity? Similarity to what?
> Ranking by sum of similarities seems more like a measure of how much
> the item is the 'centroid' of all items. Not necessarily most popular
> but 'least eccentric'.
>
>
> On Thu, Feb 6, 2014 at 7:41 AM, Tevfik Aytekin <tevfik.aytekin@gmail.com>
> wrote:
>> Well, I think what you are suggesting is to define popularity as being
>> similar to other items. So in this way the most popular items will be
>> those which are most similar to all other items, like the centroids in
>> k-means.
>>
>> I would first check the correlation between this definition and the
>> standard one (that is, the definition of popularity as having the
>> highest number of ratings). But my intuition is that they are
>> different things. For example, an item might lie at the center in the
>> similarity space but it might not be a popular item. However, there
>> might still be some correlation; it would be interesting to check it.
>>
>> Hope it helps.
>>
>> On Wed, Feb 5, 2014 at 3:27 AM, Pat Ferrel <pat@occamsmachete.com>
>> wrote:
>>> Trying to come up with a relative measure of popularity for items in a
>>> recommender, something that could be used to rank items.
>>>
>>> The user-item preference matrix would be the obvious thought. Just
>>> add up the number of preferences per item. Maybe transpose the preference
>>> matrix (the temp DRM created by the recommender), then for each row vector
>>> (now that a row = item) count the non-zero preferences. This
>>> corresponds to the number of preferences and would give one measure of
>>> popularity. In the case where the preferences are not boolean, you'd sum the
>>> weights.
>>>
>>> However, it might be a better idea to look at the item-item similarity
>>> matrix. It doesn't need to be transposed and contains the "important"
>>> similarities, as calculated by LLR for example. Here similarity means
>>> similarity in which users preferred each item. So summing the non-zero
>>> weights would perhaps give an even better relative "popularity" measure.
>>> For the same reason, clustering the similarity matrix would yield
>>> "important" clusters.
>>>
>>> Anyone have intuition about this?
>>>
>>> I started to think about this because transposing the user-item matrix
>>> seems to yield a format that cannot be sent directly into clustering.
>
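The two popularity measures from the original question (non-zero counts per item after transposing, or summed weights for non-boolean data, versus summed item-item similarity weights) and the correlation check suggested in the reply can be sketched together in Python. Assumptions: preferences as user -> {item: weight}, a similarity matrix as item -> {item: weight} whose values in the example are made up rather than computed, and Spearman rank correlation as the comparison statistic, which is one reasonable choice rather than anything prescribed in the thread. All function names are hypothetical.

```python
import math

def transpose(prefs):
    """user -> {item: weight}  ==>  item -> {user: weight} (rows become items)."""
    items = {}
    for user, row in prefs.items():
        for item, w in row.items():
            items.setdefault(item, {})[user] = w
    return items

def preference_count(item_rows):
    # Number of non-zero preferences per item (boolean popularity).
    return {i: sum(1 for w in row.values() if w != 0) for i, row in item_rows.items()}

def weighted_popularity(item_rows):
    # For non-boolean preferences: sum of the weights per item.
    return {i: sum(row.values()) for i, row in item_rows.items()}

def similarity_row_sums(sim):
    # Sum of the non-zero item-item similarity weights in each row.
    return {i: sum(w for w in row.values() if w != 0) for i, row in sim.items()}

def average_ranks(values):
    # 1-based ranks; tied values share the average of their positions.
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    # Pearson correlation of the ranks; undefined if either input is constant.
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

prefs = {"u1": {"a": 5.0, "b": 3.0}, "u2": {"a": 4.0, "c": 1.0}, "u3": {"a": 2.0, "b": 0.0}}
item_rows = transpose(prefs)
counts = preference_count(item_rows)       # {'a': 3, 'b': 1, 'c': 1}
weights = weighted_popularity(item_rows)   # {'a': 11.0, 'b': 3.0, 'c': 1.0}

# Hypothetical similarity values, chosen only to illustrate the comparison.
sim = {"a": {"b": 0.5, "c": 0.5}, "b": {"a": 0.5, "c": 0.9}, "c": {"a": 0.5, "b": 0.9}}
centrality = similarity_row_sums(sim)
items = sorted(counts)
rho = spearman([counts[i] for i in items], [centrality[i] for i in items])  # -1.0 here
```

The -1.0 comes from a three-item toy example and says nothing about real data; it only shows that a "least eccentric" ranking and a preference-count ranking can disagree.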
