spark-user mailing list archives

From Debasish Das <debasish.da...@gmail.com>
Subject Re: Market Basket Analysis
Date Fri, 05 Dec 2014 23:35:29 GMT
Apriori can be thought of as post-processing on a product similarity
graph. I call it product similarity, but really you build one node per
product that keeps the distinct users who visited that product, and two
product nodes are connected by an edge if the intersection of their user
sets is non-empty (size > 0). You are assuming that if a user has not
visited a product, he is not going to visit it in the future; this graph
is not for prediction, it only records user visits.
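
A rough sketch of the graph construction described above, in plain Python
rather than GraphX so it runs anywhere. The visit records and the name
build_product_graph are made up for illustration; nothing here is an
existing Spark or GraphX API.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (user, product) visit records; not taken from the thread.
visits = [
    ("u1", "razor"), ("u1", "batteries"), ("u1", "shaving gel"),
    ("u2", "razor"), ("u2", "batteries"),
    ("u3", "shaving gel"), ("u3", "razor"),
    ("u4", "toothpaste"),
]

def build_product_graph(visits):
    """Return (nodes, edges).

    nodes maps product -> set of distinct users who visited it.
    edges maps a sorted (product, product) pair -> number of shared users,
    and only contains pairs whose user sets intersect.
    """
    nodes = defaultdict(set)
    for user, product in visits:
        nodes[product].add(user)

    edges = {}
    # All-pairs comparison: fine for a sketch, quadratic in product count.
    for a, b in combinations(sorted(nodes), 2):
        shared = len(nodes[a] & nodes[b])
        if shared > 0:
            edges[(a, b)] = shared
    return dict(nodes), edges

nodes, edges = build_product_graph(visits)
for pair, shared in sorted(edges.items()):
    print(pair, shared)
```

On real data you would probably generate candidate pairs per user instead
of comparing every pair of products, since most pairs share no users.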

Anyway, once you have built this graph on GraphX, you can do interesting
path-based analysis. Pick a product and trace its fanout to see which
other products people bought once they had bought this one, and so on. A
first stab at the analysis is to calculate the product similarities.
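
To make the fanout idea concrete, here is a small self-contained Python
sketch. It assumes purchases arrive as (user, product) pairs and uses
Jaccard overlap of the user sets purely as a simple stand-in similarity;
the thread does not prescribe a particular measure, and the names
users_by_product and fanout are hypothetical.

```python
from collections import defaultdict

# Hypothetical (user, product) purchase records; not taken from the thread.
purchases = [
    ("u1", "razor"), ("u1", "batteries"), ("u1", "shaving gel"),
    ("u2", "razor"), ("u2", "batteries"),
    ("u3", "shaving gel"), ("u3", "razor"),
    ("u4", "toothpaste"),
]

def users_by_product(records):
    """Map each product to the set of distinct users who bought it."""
    nodes = defaultdict(set)
    for user, product in records:
        nodes[product].add(user)
    return dict(nodes)

def fanout(product, nodes):
    """Neighbours of `product` in the product graph.

    Returns (other_product, shared_users, jaccard) tuples for every other
    product sharing at least one user, most shared users first.
    """
    base = nodes[product]
    result = []
    for other, users in nodes.items():
        if other == product:
            continue
        shared = len(base & users)
        if shared > 0:
            jaccard = shared / len(base | users)
            result.append((other, shared, jaccard))
    # Sort by shared-user count descending, then by name for stable output.
    return sorted(result, key=lambda r: (-r[1], r[0]))

nodes = users_by_product(purchases)
for other, shared, jaccard in fanout("razor", nodes):
    print(f"{other}: shared={shared} jaccard={jaccard:.2f}")
```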

You can also generate naturally occurring clusters of products, but then
you are partitioning the graph using spectral or other graph partitioners
like METIS. Even ad hoc analysis of the product graph will give a lot of
useful insights (hopefully deeper than Apriori).

On Fri, Dec 5, 2014 at 12:25 PM, Sean Owen <sowen@cloudera.com> wrote:

> I doubt Amazon uses Apriori for this, but who knows. Usually you want
> "also bought" functionality, which is a form of similar-item
> computation. But you don't want to favor items that are simply
> frequently purchased in general.
>
> You probably want to look at pairs of items that co-occur in purchase
> histories unusually frequently by looking at (log) likelihood ratios,
> which is a straightforward item similarity computation.
>
> On Fri, Dec 5, 2014 at 11:43 AM, Ashic Mahtab <ashic@live.com> wrote:
> > This can definitely be useful. "Frequently bought together" is something
> > Amazon does, though surprisingly, you don't get a discount. Perhaps it
> > can lead to offering (or avoiding!) deals on frequent itemsets.
> >
> > This is a good resource for frequent itemset implementations:
> > http://infolab.stanford.edu/~ullman/mmds/ch6.pdf
> >
> > ________________________________
> > From: rpujari@hortonworks.com
> > Date: Fri, 5 Dec 2014 10:31:17 -0600
> > Subject: Re: Market Basket Analysis
> > To: sowen@cloudera.com
> > CC: tgp@preferred.jp; user@spark.apache.org
> >
> >
> > This is a typical use case: "people who buy electric razors also tend
> > to buy batteries and shaving gel along with them". The goal is to build
> > a model which will look through POS records and find which product
> > categories have a higher likelihood of appearing together in a given
> > transaction.
> >
> > What would you recommend?
> >
> > On Fri, Dec 5, 2014 at 7:21 AM, Sean Owen <sowen@cloudera.com> wrote:
> >
> > Generally I don't think frequent-item-set algorithms are that useful.
> > They're simple and not probabilistic; they don't tell you what sets
> > occurred unusually frequently. Usually people ask for frequent item
> > set algos when they really mean they want to compute item similarity
> > or make recommendations. What's your use case?
> >
> > On Thu, Dec 4, 2014 at 8:23 PM, Rohit Pujari <rpujari@hortonworks.com>
> > wrote:
> >> Sure, I’m looking to perform frequent item set analysis on a POS data
> >> set. Apriori is a classic algorithm used for such tasks. Since an
> >> Apriori implementation is not part of MLlib yet (see
> >> https://issues.apache.org/jira/browse/SPARK-4001), what are some other
> >> options/algorithms I could use to perform a similar task? If there’s no
> >> spoon-to-spoon substitute, spoon-to-fork will suffice too.
> >>
> >> Hopefully this provides some clarification.
> >>
> >> Thanks,
> >> Rohit
> >>
> >>
> >>
> >> From: Tobias Pfeiffer <tgp@preferred.jp>
> >> Date: Thursday, December 4, 2014 at 7:20 PM
> >> To: Rohit Pujari <rpujari@hortonworks.com>
> >> Cc: "user@spark.apache.org" <user@spark.apache.org>
> >> Subject: Re: Market Basket Analysis
> >>
> >> Hi,
> >>
> >> On Thu, Dec 4, 2014 at 11:58 PM, Rohit Pujari <rpujari@hortonworks.com>
> >> wrote:
> >>>
> >>> I'd like to do market basket analysis using spark, what're my options?
> >>
> >>
> >> To do it or not to do it ;-)
> >>
> >> Seriously, could you elaborate a bit on what you want to know?
> >>
> >> Tobias
> >>
> >>
> >>
> >> CONFIDENTIALITY NOTICE
> >> NOTICE: This message is intended for the use of the individual or
> >> entity to which it is addressed and may contain information that is
> >> confidential, privileged and exempt from disclosure under applicable
> >> law. If the reader of this message is not the intended recipient, you
> >> are hereby notified that any printing, copying, dissemination,
> >> distribution, disclosure or forwarding of this communication is
> >> strictly prohibited. If you have received this communication in error,
> >> please contact the sender immediately and delete it from your system.
> >> Thank You.
> >
> >
> >
> >
> > --
> > Rohit Pujari
> > Solutions Engineer, Hortonworks
> > rpujari@hortonworks.com
> > 716-430-6899
> >
>

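For reference, a minimal sketch of the (log) likelihood ratio scoring that
Sean suggests in the quoted thread above, written in plain Python. It
assumes each transaction is a list of item names and uses the usual 2x2
contingency-table form of the log-likelihood ratio; the baskets and the
function names (llr, pair_llr) are made up for illustration and are not an
MLlib API.

```python
import math
from collections import Counter
from itertools import combinations

def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalised entropy of a list of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table.

    k11: baskets with both items, k12: first item only,
    k21: second item only, k22: neither item.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))

def pair_llr(baskets):
    """Score every co-occurring item pair by its log-likelihood ratio."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # dedupe and fix pair ordering
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    scores = {}
    for (a, b), both in pair_counts.items():
        only_a = item_counts[a] - both
        only_b = item_counts[b] - both
        neither = n - both - only_a - only_b
        scores[(a, b)] = llr(both, only_a, only_b, neither)
    return scores

# Hypothetical baskets: milk is in every basket, razor and batteries
# always appear together.
baskets = [
    ["milk", "razor", "batteries"],
    ["milk", "razor", "batteries"],
    ["milk", "bread"],
    ["milk", "bread"],
    ["milk"],
    ["milk", "razor", "batteries"],
]

scores = pair_llr(baskets)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

In this toy data, pairs involving milk score essentially zero even though
milk co-occurs with everything, which is the "don't favor items that are
simply frequently purchased" behaviour described in the thread. Note that
the ratio measures departure from independence in either direction, so a
strongly negative association can also score high.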