mahout-user mailing list archives

From Dmitriy Lyubimov
Subject Re: Some test results
Date Wed, 30 Dec 2015 19:57:59 GMT
On Dec 30, 2015 11:51 AM, "Pat Ferrel" wrote:

> As many of you know, Mahout-Samsara includes an interesting and important
> extension to cooccurrence similarity, which supports cross-cooccurrence and
> log-likelihood downsampling. This, when combined with a search engine,
> gives us a multimodal recommender. Some of us integrated Mahout with a DB
> and search engine to create what we call (humbly) the Universal Recommender.
> We just completed a tool that measures the effects of what we call
> secondary events or indicators using the Universal Recommender. It
> calculates a ranking-based precision metric called mean average
> precision (MAP@k). We took a dataset from the Rotten Tomatoes web site of
> “fresh” and “rotten” reviews and combined that with data about the genres,
> casts, directors, and writers of the various video items. This gave us the
> indicators below:
> like, video-id <== primary indicator
> dislike, video-id
> like-genre, genre-id
> dislike-genre, genre-id
> like-director, director-id
> dislike-director, director-id
> like-writer, writer-id
> dislike-writer, writer-id
> like-cast, cast-member-id
> dislike-cast, cast-member-id
> These aren’t necessarily what we would have chosen if we were designing
> something from scratch but are possible to gather from public data.
> We have only ~5000 mostly professional reviewers with ~250k video items in
> this dataset but have a larger one we are integrating. We are also writing
> a white paper and blog post with some deeper analysis. There are several
> tidbits of insight when you look deeper.
> The bottom line is that using most of the above indicators we were able to
> get a 26% increase in MAP@1 over using only “like”. This is important
> because the vast majority of recommenders can only really ingest one type
> of indicator.
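
For readers unfamiliar with the metric mentioned above, here is a minimal
sketch of how MAP@k is commonly computed. It is not the measurement tool
described in the message; the function names, the dict-based inputs, and the
normalization by min(k, number of relevant items) are assumptions made for
illustration, and other MAP@k variants normalize differently.

```python
from typing import Dict, Hashable, Sequence, Set


def average_precision_at_k(
    recommended: Sequence[Hashable], relevant: Set[Hashable], k: int
) -> float:
    """AP@k for one user: mean of precision values at each rank that hits.

    Assumes `recommended` is ordered best-first and contains no duplicates.
    """
    if not relevant or k <= 0:
        return 0.0
    hits = 0
    score = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank  # precision at this rank
    # Assumed normalization: best achievable number of hits within k.
    return score / min(k, len(relevant))


def mean_average_precision_at_k(
    recs_by_user: Dict[Hashable, Sequence[Hashable]],
    relevant_by_user: Dict[Hashable, Set[Hashable]],
    k: int,
) -> float:
    """MAP@k: AP@k averaged over users that have at least one relevant item."""
    users = [u for u, rel in relevant_by_user.items() if rel]
    if not users:
        return 0.0
    total = sum(
        average_precision_at_k(recs_by_user.get(u, []), relevant_by_user[u], k)
        for u in users
    )
    return total / len(users)


# Hypothetical toy data: ranked recommendations and held-out "like" items.
recs = {"u1": ["a", "b", "c"], "u2": ["x", "y", "z"]}
held_out = {"u1": {"a", "c"}, "u2": {"y"}}
map_at_1 = mean_average_precision_at_k(recs, held_out, 1)
map_at_3 = mean_average_precision_at_k(recs, held_out, 3)
```

With k = 1 the metric reduces to the fraction of users whose top
recommendation is a held-out item, which is the quantity a MAP@1 comparison
between indicator sets would track under these assumptions.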
