mahout-user mailing list archives

From Johannes Schulte <johannes.schu...@gmail.com>
Subject Re: Mix of Content Based and Collaborative Filtering
Date Mon, 05 Nov 2012 20:06:37 GMT
Ted,

do you really mean payloads? I ask because I consider them part of the index:
they are stored per position and can be accessed during scoring.

How would you otherwise incorporate the similarities into an index? With a
faked term frequency?

I have always felt that payloads are a very natural and fast way of storing
large item-to-item relationships together with additional content. You don't
have to load everything into memory or use something like a database, as you
do with the current Mahout DataModel. Instead you get the caching goodness of
the Lucene mmap directories without having to worry about heap. At least
we're seeing sub-millisecond response times this way...
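
To make the idea concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: the `Doc` class, the `similarTo` map (standing in for the per-item payloads) and the two boost parameters are all hypothetical names chosen for illustration. It only shows how a query like "contains x and is similar to items a,b,c" could blend a text match with stored item-to-item similarities, and how changing the boosts fades between the two signals.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PayloadScoringSketch {

    /** One indexed item: its terms plus precomputed similarities to other items
     *  (a hypothetical stand-in for similarities stored as payloads). */
    static final class Doc {
        final String id;
        final Set<String> terms;
        final Map<String, Double> similarTo;

        Doc(String id, Set<String> terms, Map<String, Double> similarTo) {
            this.id = id;
            this.terms = terms;
            this.similarTo = similarTo;
        }
    }

    /** score = textBoost * [doc contains term] + simBoost * sum of stored similarities to the liked items. */
    static double score(Doc d, String term, List<String> liked,
                        double textBoost, double simBoost) {
        double text = d.terms.contains(term) ? 1.0 : 0.0;
        double sim = 0.0;
        for (String item : liked) {
            sim += d.similarTo.getOrDefault(item, 0.0);
        }
        return textBoost * text + simBoost * sim;
    }

    /** Returns document ids ordered by descending score. */
    static List<String> rank(List<Doc> docs, String term, List<String> liked,
                             double textBoost, double simBoost) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparingDouble(
                (Doc d) -> score(d, term, liked, textBoost, simBoost)).reversed());
        List<String> ids = new ArrayList<>();
        for (Doc d : sorted) {
            ids.add(d.id);
        }
        return ids;
    }
}
```

With equal boosts, an item that is strongly similar to a, b and c can outrank items that merely contain the term; lowering `simBoost` shifts the ranking back toward the text match. Which weights work best is exactly the part that probably has to be learned rather than set by hand.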

Cheers,

Johannes

On Mon, Nov 5, 2012 at 5:05 PM, Ted Dunning <ted.dunning@gmail.com> wrote:

> I think that payloads are a bad idea here.  My rationale is that you really
> want to index these signals if at all possible.
>
> Also, payloads (as of a while ago) were not accessed very efficiently.
>  This can massively slow down scoring.
>
>
> On Mon, Nov 5, 2012 at 7:01 AM, shubham srivastava <shubham.k@gmail.com> wrote:
>
> > http://sujitpal.blogspot.in/2011/01/payloads-with-solr.html
> >
> > > On Fri, Nov 2, 2012 at 12:13 PM, Johannes Schulte <johannes.schulte@gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I can also encourage going the simple way with a Solr or Lucene index. It
> > > > gives you almost unlimited possibilities when you want to include new
> > > > "relevance signals" and, even more important, when you have business
> > > > requirements like filtering etc.
> > > >
> > > > I'm using a plain Lucene index to combine stuff. The pre-calculated
> > > > item-to-item similarities are stored as payload fields so the similarities
> > > > can be used in the scoring process. This way you can easily issue a query
> > > > like "contains x and is similar to items a,b,c".
> > > >
> > > > You can even boost different parts of the query to fade between the
> > > > signals. The only question is how much you can achieve "by hand". You
> > > > probably want to somehow learn which weights on the signals perform best.
> > > > Maybe this blog article by Netflix is a good start:
> > > >
> > > > http://techblog.netflix.com/2012/06/netflix-recommendations-beyond-5-stars.html
> > >
> > >
> > >
> > > Cheers,
> > > Johannes
> > >
> > >
> > > > On Fri, Nov 2, 2012 at 6:21 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
> > > >
> > > > > Speaking with no principles in hand at all, I find that it is possible to
> > > > > encode multiple item similarity matrices together in a Solr instance and
> > > > > then do very nice coordinated recommendations from multiple sources of
> > > > > information.
> > > > >
> > > > > Abusing a text retrieval engine this way has only a vague basis in theory,
> > > > > but it can be particularly nice from a practical point of view.
> > > >
> > > > On Thu, Nov 1, 2012 at 10:41 AM, Sean Owen <srowen@gmail.com> wrote:
> > > >
> > > > > > There is not a very direct way to do this in Mahout, but, you can piece
> > > > > > together a solution that reuses a lot of what Mahout has.
> > > > > >
> > > > > > It sounds like you should look at this as an item-item similarity-based
> > > > > > recommender to start. You have two sources of similarity. First is based
> > > > > > on interactions (no ratings); for this, you can use
> > > > > > LogLikelihoodSimilarity and an existing DataModel. This much is
> > > > > > straightforward.
> > > > > >
> > > > > > You can also make an ItemSimilarity based on item properties. There is no
> > > > > > pre-packaged solution for this. You can make up a similarity metric, or
> > > > > > export some similarities based on, say, descriptions, maybe from Solr yes.
> > > > > >
> > > > > > Then you can combine them. There's no great principled answer. You could
> > > > > > make an ItemSimilarity that just returns the product of these two
> > > > > > similarity measures (assuming they are both >= 0).
> > > > > >
> > > > > > And then the rest is a matter of using GenericItemBasedRecommender with
> > > > > > your hybrid ItemSimilarity.
> > > > >
> > > > > This isn't a distributed solution but is a good start.
> > > > >
> > > > > Sean
> > > > >
> > > > >
> > > > > > On Thu, Nov 1, 2012 at 5:33 PM, shubham srivastava <shubham.k@gmail.com> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I am looking into designing and implementing a recommendation engine
> > > > > > > with the use cases below. There are no explicit ratings given by users
> > > > > > > for the items they access.
> > > > > >
> > > > > > > 1. Items viewed by other users who viewed this particular item
> > > > > > >
> > > > > > > 2. Items booked by other users who viewed this particular item
> > > > > > >
> > > > > > > 3. Most-viewed item(s) viewed by other users who viewed this
> > > > > > > particular item
> > > > > >
> > > > > > > The idea behind this is the following:
> > > > > > >
> > > > > > > 1. I want to interpret user behavior, so that recommendations are
> > > > > > > based on other users' patterns. This falls into the bracket of CF
> > > > > > > (item-based or user-based similarities).
> > > > > > >
> > > > > > > 2. I want to exploit item-item similarity based on N attributes. The
> > > > > > > attributes can be, say, price, location, features (1...n) and so on.
> > > > > >
> > > > > > The recommendation should be a mix of both of the above.
> > > > > >
> > > > > > > A) For 1, given that I don't have explicit ratings, my initial thought
> > > > > > > was to infer ratings from what the user does with a product, e.g.:
> > > > > > >
> > > > > > > If he only views it, I give a rating of 1
> > > > > > > If he further sees the details, I give a rating of 2
> > > > > > > If he goes to the booking page, I give a rating of 3
> > > > > > > If he books it, I give a rating of 4, etc.
> > > > > > >
> > > > > > > Once I have these, I would go for a standard CF item-item similarity
> > > > > > > implemented through Mahout.
> > > > > >
> > > > > > > B) For 2, I was looking into our search framework (Solr) to provide
> > > > > > > this, i.e. Solr's MoreLikeThis feature. Carrot also seems to make it
> > > > > > > better, but I don't know how scalable that would be.
> > > > > > >
> > > > > > > The idea is to take an intersection of A and B to get started with. I
> > > > > > > also need to figure out the processing and latency side of getting the
> > > > > > > results.
> > > > > > >
> > > > > > > I guess the group's users must have solved a similar problem more
> > > > > > > efficiently and could advise better.
> > > > > > >
> > > > > > > Please let me know.
> > > > > >
> > > > > > Regards,
> > > > > > Shubham
> > > > > >
> > > > >
> > > >
> > >
> >
>
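
Sean's suggestion quoted above (a hybrid similarity that returns the product of an interaction-based and a property-based measure) can be sketched roughly as follows. This is plain Java under stated assumptions, not Mahout code: the `Similarity` interface is a hypothetical stand-in for the pairwise method of Mahout's ItemSimilarity, and the Jaccard overlap of attribute sets is just one made-up choice of property-based metric. As in the original suggestion, both measures are assumed to be >= 0.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class HybridSimilaritySketch {

    /** Hypothetical stand-in for a pairwise item similarity (not the Mahout interface). */
    interface Similarity {
        double itemSimilarity(long itemA, long itemB);
    }

    /** Combines two measures by multiplying them; assumes both are >= 0. */
    static Similarity product(Similarity first, Similarity second) {
        return (a, b) -> first.itemSimilarity(a, b) * second.itemSimilarity(a, b);
    }

    /** Property-based similarity: Jaccard overlap of the items' attribute sets. */
    static Similarity jaccard(Map<Long, Set<String>> attributes) {
        return (a, b) -> {
            Set<String> x = attributes.getOrDefault(a, Set.of());
            Set<String> y = attributes.getOrDefault(b, Set.of());
            if (x.isEmpty() && y.isEmpty()) {
                return 0.0; // no attributes known for either item
            }
            Set<String> intersection = new HashSet<>(x);
            intersection.retainAll(y);
            Set<String> union = new HashSet<>(x);
            union.addAll(y);
            return (double) intersection.size() / union.size();
        };
    }
}
```

In a real setup the first argument to `product` would be backed by the interaction-based measure (LogLikelihoodSimilarity in Sean's description), and the combined object would implement Mahout's ItemSimilarity so it can be handed to GenericItemBasedRecommender. One consequence of multiplying is that an item pair scores zero as soon as either signal is zero, which makes the product behave like an intersection of the two sources.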
