mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Mahout performance issues
Date Fri, 02 Dec 2011 18:13:21 GMT
On Fri, Dec 2, 2011 at 10:09 AM, Daniel Zohar <dissoman@gmail.com> wrote:

> And how do you propose to kill these items? I mean, we should still keep
> all the user-item associations, shouldn't we?
>

No.  These items bring no information whatsoever.


> If it's that popular, how would we recommend items for users who have
> interacted only with that one item?
>

Two answers:

- it will appear on the most popular page.  Recommendations are for telling
people what they might be interested in that is *different* from the
population at large.

- if a person has only interacted with a single item that is vastly
popular, we have no useful information about this person.  Indeed, the
result of the recommendations will be almost the same as the most popular
page.  Better to admit that we know nothing here and move on.
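
To make the kill list concrete, here is a minimal sketch in Python (not Mahout code). It assumes interactions arrive as (user, item) pairs and that "uninformative" means an item touched by more than some fraction of all users. The function names, the sample data, and the 0.9 threshold are all hypothetical, chosen only for illustration; a real cutoff would have to be tuned on your own data.

```python
from collections import defaultdict


def build_kill_list(interactions, max_user_fraction=0.5):
    """Return the set of items touched by more than max_user_fraction of users.

    The threshold is an illustrative assumption, not a recommended value.
    """
    users_by_item = defaultdict(set)
    all_users = set()
    for user, item in interactions:
        users_by_item[item].add(user)
        all_users.add(user)
    if not all_users:
        return set()
    limit = max_user_fraction * len(all_users)
    return {item for item, users in users_by_item.items() if len(users) > limit}


def drop_killed_items(interactions, kill_list):
    """Remove every interaction whose item is on the kill list."""
    return [(user, item) for user, item in interactions if item not in kill_list]


# Hypothetical data: everybody saw the intro video, so it carries no signal.
interactions = [
    ("alice", "intro_video"), ("alice", "jazz_clip"),
    ("bob", "intro_video"), ("bob", "cooking_show"),
    ("carol", "intro_video"),
    ("dave", "intro_video"), ("dave", "jazz_clip"),
]

kill_list = build_kill_list(interactions, max_user_fraction=0.9)
filtered = drop_killed_items(interactions, kill_list)
```

In this sketch, carol drops out of the filtered data entirely because her only interaction was with the killed item. That is the intended outcome: we know nothing useful about her, so she gets the most popular page rather than a personalized list.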


>
> On Fri, Dec 2, 2011 at 8:07 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > Since touching them adds nothing but cost, not touching them is
> > better.  Kill the item!
> >
> > In practical terms, we had this problem at Veoh.  Everybody got the same
> > intro video.  It provided no information.  Likewise at Musicmatch,
> > everybody got the same startup noise during the splash screen.  It added
> > no information.  Both of these cases would kill performance in lots of
> > recommendation engines because a vast number of users would get sucked
> > into computations where it made no difference at all.
> >
> > Better to kill these items.
> >
> > On Fri, Dec 2, 2011 at 10:03 AM, Sean Owen <srowen@gmail.com> wrote:
> >
> > > Yes, but those users will bring no more candidate items to consider,
> > > and the apparent bottleneck is not touching those users, but later
> > > computing all those similarities. That's my argument.
> > >
> > > On Fri, Dec 2, 2011 at 5:56 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > >
> > > > Actually, if these users' single item is a fantastically popular item,
> > > > then all of those users will be roped into the computation (with no
> > > > effect).
> > > >
> > > > Sean's argument would be correct if the users were each interacting
> > > > with some item that is way out in the low-frequency tail.  By Murphy,
> > > > this won't be the case.
> > > >
> > > > Better to dump the uninformative items using a kill list.
> > > >
> > >
> >
>
