mahout-user mailing list archives

From Pat Ferrel <>
Subject Re: Number of features for ALS
Date Sun, 06 Apr 2014 17:37:00 GMT
> On Apr 6, 2014, at 2:48 AM, Niklas Ekvall <> wrote:
> Hi Pat and Ted!
> Yes, I agree about the rank and MAP. But in this case, what is a good
> initial guess for the parameters *number of features* and *lambda*?

20 or 30 features, depending on the variance in your data. More is theoretically better but
usually gives rapidly diminishing returns. I forget what lambdas we tried.

> Where can I find the best article about cooccurrence recommenders? And can
> one use this approach for different types of data, e.g., ratings, purchase
> histories or click histories?

Absolutely, but remember that the data you train on is what you are recommending. So if you
train on detail-views (click paths) the recommender will return items to look at, not necessarily
the same as items to purchase. If you train on what you want to recommend then all of the
above will work.

If you want to train on click-paths and recommend purchases, you probably need a cross-recommender,
which is another discussion altogether.
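
To make the idea concrete, here is a minimal sketch of cross-cooccurrence in Python. It is only an illustration: the function names and the user -> item-set input format are hypothetical, and it scores with raw counts, whereas a real system would normally weight cooccurrences with a statistical test such as log-likelihood ratio.

```python
from collections import Counter, defaultdict
from itertools import product


def cross_cooccurrence(purchases, clicks):
    """Count, for each clicked item, how often its clickers bought each item.

    Both arguments map user -> set of item ids (an assumed input format).
    """
    counts = defaultdict(Counter)
    for user, bought in purchases.items():
        # Pair every item the user clicked with every item the user bought.
        for clicked, purchased in product(clicks.get(user, ()), bought):
            counts[clicked][purchased] += 1
    return counts


def recommend_purchases(counts, user_clicks, already_bought=(), top_n=3):
    """Rank purchasable items by summed raw cross-cooccurrence counts."""
    scores = Counter()
    for clicked in user_clicks:
        scores.update(counts.get(clicked, {}))
    # Do not recommend what the user already owns.
    for item in already_bought:
        scores.pop(item, None)
    return [item for item, _ in scores.most_common(top_n)]
```

The point is that the model is indexed by the behavior you observe (clicks) but returns the behavior you want to recommend (purchases).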

> Best, Niklas
> 2014-03-31 7:53 GMT+02:00 Ted Dunning <>:
>> Yeah... what Pat said.
>> Off-line evaluations are difficult.  At most, they provide directional
>> guidance to be refined using live A/B testing.  Of course, A/B testing of
>> recommenders comes with a new set of tricky issues like different
>> recommenders learning from each other.
>> On Sun, Mar 30, 2014 at 4:54 PM, Pat Ferrel <> wrote:
>>> Seems like most people agree that ranking is more important than rating in
>>> most recommender deployments. RMSE was used for a long time with
>>> cross-validation (partly because it was the choice of Netflix during the
>>> competition) but it is really a measure of total rating error. In the past
>>> we've used mean-average-precision as a good measure of ranking quality. We
>>> chose hold-out tests based on time, so something like 10% of the most
>>> recent data was held out for cross-validation and we measured MAP@n for
>>> tuning parameters.
>>> For our data (ecommerce shopping data) most of the ALS tuning parameters
>>> had very little effect on MAP. However, cooccurrence recommenders performed
>>> much better using the same data. Unfortunately, comparing two algorithms
>>> with offline tests is of questionable value. Still, with nothing else to go
>>> on, we went with the cooccurrence recommender.
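
For reference, the time-based hold-out and MAP@n measurement described in the quoted message could be sketched roughly as follows. This is not the code that was actually used: the (user, item, timestamp) event format and all function names are assumptions, and AP@n is normalized by min(number of relevant items, n), which is one common convention among several.

```python
from collections import defaultdict


def time_based_split(events, holdout_fraction=0.1):
    """Hold out the most recent fraction of (user, item, timestamp) events."""
    ordered = sorted(events, key=lambda e: e[2])
    cut = len(ordered) - int(len(ordered) * holdout_fraction)
    return ordered[:cut], ordered[cut:]


def average_precision_at_n(recommended, relevant, n):
    """AP@n for one user; assumes `recommended` has no duplicate items."""
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    for k, item in enumerate(recommended[:n], start=1):
        if item in relevant:
            hits += 1
            score += hits / k  # precision at rank k, counted only on hits
    return score / min(len(relevant), n)


def map_at_n(recommendations, held_out, n):
    """Mean AP@n over users with at least one held-out event.

    `recommendations` maps user -> ranked list of item ids.
    """
    relevant_by_user = defaultdict(set)
    for user, item, _timestamp in held_out:
        relevant_by_user[user].add(item)
    if not relevant_by_user:
        return 0.0
    total = sum(
        average_precision_at_n(recommendations.get(user, []), items, n)
        for user, items in relevant_by_user.items()
    )
    return total / len(relevant_by_user)
```

Train on the first part of the split, generate ranked recommendations per user, and compare `map_at_n` across parameter settings; as noted above, treat the result as directional guidance only.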
