mahout-user mailing list archives

From Fernando Fernández <>
Subject Re: churn analysis
Date Thu, 25 Jul 2013 05:19:52 GMT
If you don't know where to start, I would recommend starting with something
more conventional than an HMM, which can be tricky to fully understand and
explain. A logistic regression model can perform very well if the predictors
are built with care. I also wouldn't start with Mahout unless it is a
requirement from a client (some clients are so thrilled about "big data"
that they want to use Mahout even though it's overkill for most predictive
analytics tasks). You will probably not need more than 100k-200k records
to build a pretty good model. An undersampling scheme can also help the
model (it's not necessary, but it won't hurt) and would get you down to that
sample size.
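A minimal sketch of random undersampling, assuming each record is a `(features, label)` pair with label 1 for a churner and 0 for a customer who stayed. The function name `undersample` and its `ratio` parameter (non-churners kept per churner) are hypothetical illustrations, not part of Mahout or any other library.

```python
import random

def undersample(records, ratio=1.0, seed=0):
    """Randomly drop majority-class (non-churn) records.

    records: list of (features, label) pairs; label 1 = churned, 0 = stayed.
    ratio:   how many non-churners to keep per churner (hypothetical knob).
    seed:    fixed seed so the sample is reproducible.
    """
    rng = random.Random(seed)
    churners = [r for r in records if r[1] == 1]
    stayers = [r for r in records if r[1] == 0]
    # Never ask for more stayers than actually exist.
    keep = min(len(stayers), int(len(churners) * ratio))
    sample = churners + rng.sample(stayers, keep)
    rng.shuffle(sample)  # avoid all churners sitting at the front
    return sample
```

For example, with 1,000 churners among 100,000 customers, `ratio=3.0` keeps every churner plus 3,000 randomly chosen stayers. Because the sample's churn rate is inflated, a model trained on it overstates the true churn probability; if stayers were kept with sampling rate β, a sample-based probability p_s maps back to β·p_s / (β·p_s − p_s + 1).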

If you do need to use Mahout, it includes an SGD implementation of logistic
regression.
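To show what "SGD for logistic regression" means, here is a small from-scratch sketch: one gradient step on the log-loss per training example, with examples visited in shuffled order. This is not Mahout's API (Mahout's SGD classifiers are Java classes). The function names and the `lr`, `epochs` and `l2` parameters are hypothetical choices made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_sgd_logistic(data, n_features, lr=0.1, epochs=50, l2=0.0, seed=0):
    """Fit logistic regression weights by stochastic gradient descent.

    data: list of (x, y), x a list of n_features floats, y in {0, 1}.
    Returns (weights, bias).
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a new random order each epoch
        for i in order:
            x, y = data[i]
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            err = p - y  # derivative of the log-loss w.r.t. the linear score
            for j in range(n_features):
                w[j] -= lr * (err * x[j] + l2 * w[j])  # optional L2 shrinkage
            b -= lr * err
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1 (churn)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

The per-example update is what makes SGD attractive for large data: each record is read and discarded, so the training set never has to fit in memory.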

The key point in building a good churn model, though, is how you build the
predictor variables; after that, any binary classification model would do
the job.
2013/7/24 <>

> I've not used Mahout to do it, but in the past colleagues have used HMMs to
> create a way of discovering customers who are in an "about to churn"
> state. This was used to populate a target list for winback intervention
> (they're about to churn, so contact them and offer something, or just help,
> to keep them). I tried the Mahout HMM earlier in the year, but got
> discouraged by some odd behaviour which I have still not managed to delve
> into.
> The problem that we saw with churn analysis in our domain was that most
> churners leave with no event on their account in the recent past.
> Essentially there are external factors generating churn across the
> whole population (competitor offers, demographics, economics), which means
> that the domain model is not accessible from the data. So, while a much
> better than "random" predictor can be built, it only barely covered its
> operating costs, and it is sufficiently far from a conclusive knockdown
> winner to allow homebrew.spreadsheet.witchcraft alternatives to pop up and
> be given air time by people not familiar with the idea that if you flip
> 1000 coins in the air at once, some of them are going to keep coming up
> heads for a bit. One way round this is "more data, better data", which is
> kinda where I came in with Mahout and HMMs.
> So, my suggestions would be:
> - look at your data; do your churners have events in an actionable period
> (this depends on your domain) that could be the basis of a signal? If
> enough of them fall into this category to power a business case based on
> intervention and win-back, you're on. If not, then more data, better data
> is needed.
> - are there strong correlations between the last event and churn? If so,
> use a decision tree or similar to separate churn prospects from stable
> customers. If you get a good predictor there's no need to do more; if not...
> - try an HMM; it could help you find groups of action sequences that
> lead to churning (repeated contacts, escalations, resorting to letter
> writing, etc.). But check that Mahout's implementation is sound and works
> for you (I am not confident that I did enough work to say that my problems
> weren't a case of "problem between screen and chair", so if you get things
> working then superduper!)
> Hope that helps you,
> Simon
> ________________________________________
> From: Sayed Seliman []
> Sent: 24 July 2013 21:37
> To:
> Subject: churn analysis
> Hi,
> What are your experiences building a churn analysis system with Mahout?
> What do you suggest implementing?
> Any success stories implementing a churn analysis system with Mahout?
> Thanks
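Simon's first suggestion (check whether churners have events in an actionable period before they leave) can be measured directly. The sketch below assumes a hypothetical mapping of churned customer ids to churn dates and a log of `(customer_id, event_date)` pairs. The 30-day default window is only a placeholder, since the right period depends on the domain.

```python
from datetime import date, timedelta

def churners_with_recent_events(churn_dates: dict, events, window_days=30):
    """Fraction of churners with at least one event shortly before churning.

    churn_dates: dict of customer_id -> date the customer left.
    events:      iterable of (customer_id, event_date) pairs.
    window_days: length of the actionable period (domain-specific assumption).
    """
    if not churn_dates:
        return 0.0
    with_signal = set()
    window = timedelta(days=window_days)
    for cid, d in events:
        left = churn_dates.get(cid)
        if left is None:  # not a churner, so irrelevant here
            continue
        if timedelta(0) <= left - d <= window:
            with_signal.add(cid)
    return len(with_signal) / len(churn_dates)
```

A low fraction is the situation Simon describes: most churners leave silently, so no event-based model can flag them in time, and "more data, better data" is needed before modelling pays off.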
