mahout-user mailing list archives

From "Nick Pentreath" <nick.pentre...@gmail.com>
Subject Re: Tweaking ALS models to filter out "highly related" items when an item has been purchased
Date Thu, 05 Sep 2013 19:26:22 GMT
Thanks for the comments - all useful. Seems as always a bit of experimentation is in order
to try the view-vs-purchase filtering, vs heuristic post reordering, vs potentially some metadata-based
approach.

    
      
        


One of our challenges is that we are trying to generalise as much as possible, since
we have a "recommender as a service" type of offering. So catering to edge cases is indeed
not the way to go. But potentially a heuristic-style approach that can be somewhat learned
from data/recommender performance, via split testing and offline testing, might be the way
to go.

      
        


    


On Thu, Sep 5, 2013 at 8:53 PM, Dmitriy Lyubimov <dlieu.7@gmail.com>
wrote:

> FWIW our marketing people call it "cross-sell" and "upsell", i.e.
> selling stuff from different categories vs. offering more behaviorally
> similar items to the currently browsed category, optimized to a specific
> target (revenue, sales event etc.). In either case, preexisting (or
> inferred from side data via clustering) labelling helps to discern
> between "upsell" and "cross-sell" scores.
> On Thu, Sep 5, 2013 at 11:22 AM, Dominik Hübner <contact@dhuebner.com> wrote:
>>> As far as implementation is concerned, I think that it is very important to
>>> not distort the basic recommendation algorithm with business rules like
>>> this.  It is much better to post-process the results to impose your will
>>> directly.  One exception to this is that I think it is reasonable to use
>>> ordered cooccurrence and also repeated cooccurrence for some hints
>>> here.  This lets you determine likely accessories (purchased after the main
>>> item, mostly) and also find razor-blades (highly repetitive purchases).
>>> You still have the problem of flooding with similar items.
>>
>> +1 for keeping business rules out of your recommendations. I think integrating too
>> many edge cases will never generalize for all users and debugging becomes nothing but a pain.
>>
>>> My approach in the past was to define heuristic definitions for "too
>>> similar" and do a pass over the sorted recommendation results giving each
>>> item that passes the too-similar criterion a penalty score.  When done with
>>> this, I re-sort the results and the duplicative content falls to the bottom
>>> of the recommendations.
>>>
>>
>> I was recently working on some recommendations for a fashion brand. Filtering out
>> too-similar items was indeed crucial. I observed a common pattern of users viewing products
>> that vary only in their color or other "minor" features. I think it ultimately depends on the
>> environment in which you are displaying your recommendations. If you are actually trying to
>> show related products, those really similar items (like color variations) might not be the
>> worst thing. Building some sort of product mash-up probably should be more diverse, just like
>> Ted mentioned with flooding the first few pages. But there they are again, those edge cases I
>> mentioned. Pre-sale recommendations might be less diverse than after-purchase recommendations.
>> It just depends on the domain you are working in, I guess.
>>
>> On Sep 5, 2013, at 7:38 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
>>
>>> I think that Dominik's comments are exactly on target.
>>>
>>> As far as implementation is concerned, I think that it is very important to
>>> not distort the basic recommendation algorithm with business rules like
>>> this.  It is much better to post-process the results to impose your will
>>> directly.  One exception to this is that I think it is reasonable to use
>>> ordered cooccurrence and also repeated cooccurrence for some hints
>>> here.  This lets you determine likely accessories (purchased after the main
>>> item, mostly) and also find razor-blades (highly repetitive purchases).
>>> You still have the problem of flooding with similar items.
>>>
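
One possible reading of the ordered and repeated cooccurrence hint above, as a minimal sketch. The `(user, timestamp, item)` event format and both function names are assumptions made for illustration; they are not Mahout APIs. Ordered pair counts point at likely accessories (bought after the main item), and a high repeat-purchase rate points at "razor-blade" items.

```python
from collections import Counter, defaultdict


def ordered_cooccurrence(purchases):
    """Count (earlier_item, later_item) pairs, at most once per user.

    `purchases` is an iterable of (user, timestamp, item) tuples,
    a hypothetical format chosen for this sketch.
    """
    by_user = defaultdict(list)
    for user, ts, item in purchases:
        by_user[user].append((ts, item))

    pair_counts = Counter()
    for events in by_user.values():
        events.sort()  # chronological order
        seen_pairs = set()
        for i, (ts_a, a) in enumerate(events):
            for ts_b, b in events[i + 1:]:
                # Only strictly later purchases of a different item count.
                if a != b and ts_b > ts_a:
                    seen_pairs.add((a, b))
        pair_counts.update(seen_pairs)
    return pair_counts


def repeat_purchase_rate(purchases):
    """Fraction of each item's buyers who bought it more than once."""
    per_user_item = Counter((user, item) for user, _, item in purchases)
    buyers = Counter()
    repeaters = Counter()
    for (_, item), n in per_user_item.items():
        buyers[item] += 1
        if n > 1:
            repeaters[item] += 1
    return {item: repeaters[item] / buyers[item] for item in buyers}
```

If the count for (A, B) is much larger than the count for (B, A), B is a candidate accessory for A. Raw counts like these would normally still need a significance test before being trusted.
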
>>> The diversity that you are talking about is a critical quality in
>>> recommendation results.  The basic intuition is that recommendation results
>>> are not individual recommendations, but are included in a portfolio of
>>> recommendations.  You need the diversity in this portfolio because if you
>>> are wrong about an item, the likelihood of being wrong about very similar
>>> items is high.  If you flood the first and second pages with these similar
>>> items, then you don't have room for the alternative items that might well
>>> be correct.
>>>
>>> My approach in the past was to define heuristic definitions for "too
>>> similar" and do a pass over the sorted recommendation results giving each
>>> item that passes the too-similar criterion a penalty score.  When done with
>>> this, I re-sort the results and the duplicative content falls to the bottom
>>> of the recommendations.
>>>
>>>
>>>
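
The penalty-and-re-sort pass described above could be sketched roughly as follows. This is an illustration under stated assumptions, not Ted's actual code: the "too similar" heuristic is passed in by the caller, the `threshold` and `penalty` values are arbitrary, and the multiplicative penalty assumes scores are positive.

```python
def rerank_with_penalty(ranked, similarity, threshold=0.8, penalty=0.5):
    """Demote items that are too similar to a higher-ranked item.

    ranked: list of (item, score) pairs sorted by descending score.
    similarity: function (item_a, item_b) -> float; the caller's
        "too similar" heuristic (an assumption of this sketch).
    threshold, penalty: illustrative values, not tuned.
    """
    kept = []      # higher-ranked items that were not penalised
    rescored = []
    for item, score in ranked:
        if any(similarity(item, k) >= threshold for k in kept):
            # Too similar to something already shown: apply the penalty.
            rescored.append((item, score * penalty))
        else:
            kept.append(item)
            rescored.append((item, score))
    # The sort is stable, so tied scores keep their original order.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)
```

Only unpenalised items are used as comparison points here, so one penalised near-duplicate does not in turn penalise items similar to it. Whether that is the right choice depends on the similarity heuristic.
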
>>> On Thu, Sep 5, 2013 at 1:15 AM, Dominik Hübner <contact@dhuebner.com> wrote:
>>>
>>>> Just a quick assumption, maybe I have not thought this through enough:
>>>>
>>>> 1. Users probably tend to compare products => similar VIEWS
>>>> 2. Users might also tend to PURCHASE accessory products, like the laptop
>>>> bag you mentioned
>>>>
>>>> Maybe you could filter out products that are similar based on product
>>>> views, but keep those that are similar based on purchases in your
>>>> recommendation set?
>>>>
>>>> Nevertheless, I guess this will depend strongly on the domain the
>>>> data comes from.
>>>>
>>>>
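
The view-versus-purchase idea above might look something like this minimal sketch. It assumes two separate item-item similarity functions, one built from views and one from purchases; the function name, both similarity callables, and the thresholds are hypothetical.

```python
def filter_view_similar(candidates, purchased, view_sim, purchase_sim,
                        view_threshold=0.7, purchase_threshold=0.3):
    """Drop candidates that look like substitutes for a purchased item.

    A candidate is dropped when it is view-similar to a purchased item
    (users compared the two) but not purchase-similar to any purchased
    item (users did not buy them together). Thresholds are illustrative.
    """
    kept = []
    for item in candidates:
        looks_like_substitute = any(
            view_sim(item, p) >= view_threshold for p in purchased)
        bought_together = any(
            purchase_sim(item, p) >= purchase_threshold for p in purchased)
        if looks_like_substitute and not bought_together:
            continue
        kept.append(item)
    return kept
```
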
>>>> On Sep 5, 2013, at 10:07 AM, Nick Pentreath <nick.pentreath@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all
>>>>>
>>>>> Say I have a set of ecommerce data (views, purchases etc). I've built my
>>>>> model using implicit feedback ALS. Now, I want to add a little bit of
>>>>> "smart filtering".
>>>>>
>>>>> Filtering based on not recommending something that has been purchased is
>>>>> straightforward, but I'd like to also filter so as not to recommend "highly
>>>>> similar" items to someone who has purchased an item.
>>>>>
>>>>> In other words, if someone has just purchased a laptop, then I'd like to
>>>>> not recommend other laptops. Ideally while still recommending "related"
>>>>> items such as laptop bags, mouse etc etc. (this is just an example).
>>>>>
>>>>> Now, I could filter based on metadata tags like "category", but assuming I
>>>>> don't always have that data, then simplistically I have the option of
>>>>> filtering out products based on those that have high cosine similarity to
>>>>> the purchased products. However, this risks filtering out "good" similar
>>>>> products (like the laptop bags) as well as the "bad" similar products.
>>>>>
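
A minimal sketch of the simplistic cosine-similarity filter described above, assuming the ALS item factors are available as a plain dict of item id to factor vector. The dict layout, function names, and threshold are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def drop_similar_to_purchased(candidates, purchased, item_factors,
                              threshold=0.9):
    """Keep candidates whose similarity to every purchased item is below threshold.

    item_factors: dict mapping item id -> factor vector (hypothetical layout).
    """
    return [
        c for c in candidates
        if all(cosine(item_factors[c], item_factors[p]) < threshold
               for p in purchased)
    ]
```

The risk mentioned above shows up in the threshold: set it too low and moderately similar accessories (the laptop bag) are dropped along with the near-duplicates.
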
>>>>> I'm experimenting with building a second variant of the model that
>>>>> effectively downweights "views" to near zero, hence leaving something sort
>>>>> of like a "purchased together" model variant. Then recommendations can be
>>>>> made using this model when a user purchases an item (or perhaps a re-scorer
>>>>> that is a weighted variant of model A and model B but that tends to weight
>>>>> model B - the purchased together model - higher)
>>>>>
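
The weighted re-scorer over model A and model B mentioned above could be sketched as a simple linear blend. This assumes each model's output is a dict of item id to score and that the two score scales are comparable; the function name and the default weight are hypothetical.

```python
def blended_scores(scores_a, scores_b, weight_b=0.8):
    """Blend per-item scores from two models, favouring model B.

    scores_a: item -> score from the general (views + purchases) model.
    scores_b: item -> score from the "purchased together" model.
    weight_b: share given to model B, in [0, 1]; illustrative default.
    Items missing from one model are scored 0.0 by that model.
    """
    if not 0.0 <= weight_b <= 1.0:
        raise ValueError("weight_b must be between 0 and 1")
    items = set(scores_a) | set(scores_b)
    blended = {
        item: (1.0 - weight_b) * scores_a.get(item, 0.0)
              + weight_b * scores_b.get(item, 0.0)
        for item in items
    }
    # Highest score first; ties broken by item id for a deterministic order.
    return sorted(blended.items(), key=lambda kv: (-kv[1], kv[0]))
```

If the two models produce scores on different scales, they would need normalising (or blending by rank) before a linear combination like this is meaningful.
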
>>>>> Are there other mechanisms to tweak the ALS model such that it tends
>>>>> towards recommending "related products" (but not "highly similar of the
>>>>> exact same narrow product type")?
>>>>>
>>>>> Any other ideas about how best to go about this?
>>>>>
>>>>> Many thanks
>>>>> Nick
>>>>
>>>>
>>