mahout-user mailing list archives

From Pat Ferrel <>
Subject Some test results
Date Wed, 30 Dec 2015 19:50:46 GMT
As many of you know, Mahout-Samsara includes an interesting and important extension to cooccurrence
similarity, which supports cross-occurrence and log-likelihood ratio (LLR) downsampling. This, when
combined with a search engine, gives us a multimodal recommender. Some of us integrated Mahout
with a DB and a search engine to create what we (humbly) call the Universal Recommender.
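For anyone unfamiliar with the downsampling step, here is a minimal sketch of the log-likelihood ratio test (Dunning's G² statistic, as used in Mahout's cooccurrence code); the 2x2 count names k11..k22 are the conventional ones and are my labeling, not taken from this post:

```python
import math

def xlogx(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return xlogx(sum(counts)) - sum(xlogx(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: users who interacted with both items A and B
    k12: users who interacted with A but not B
    k21: users who interacted with B but not A
    k22: users who interacted with neither
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    if row_entropy + col_entropy < mat_entropy:
        return 0.0  # guard against tiny negative values from rounding
    return 2.0 * (row_entropy + col_entropy - mat_entropy)
```

Item pairs (or cross-occurrence pairs) whose LLR score falls below a threshold are dropped, which is what makes the "downsampling" referred to above anomaly-driven rather than count-driven.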

We just completed a tool that measures the effect of what we call secondary events or indicators
using the Universal Recommender. It calculates a ranking-based precision metric, mean
average precision at k (MAP@k). We took a dataset of “fresh” and “rotten” reviews from the
Rotten Tomatoes web site and combined it with data about the genres, casts, directors, and
writers of the various video items. This gave us the indicators below:
like, video-id <== primary indicator
dislike, video-id
like-genre, genre-id
dislike-genre, genre-id
like-director, director-id
dislike-director, director-id
like-writer, writer-id
dislike-writer, writer-id
like-cast, cast-member-id
dislike-cast, cast-member-id
These aren’t necessarily what we would have chosen if we were designing something from scratch
but are possible to gather from public data.
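To make the metric concrete, MAP@k can be sketched as below; the function names and the min(|relevant|, k) normalization are one common convention, not necessarily the exact formulation the tool uses:

```python
def average_precision_at_k(recommended, relevant, k):
    """Average precision of the top-k recommendations against a held-out relevant set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / min(len(relevant), k)

def map_at_k(recs_by_user, relevant_by_user, k):
    """Mean of per-user average precision over all users with held-out data."""
    users = list(relevant_by_user)
    return sum(
        average_precision_at_k(recs_by_user.get(u, []), relevant_by_user[u], k)
        for u in users
    ) / len(users)
```

At k=1 this reduces to the fraction of users whose single top recommendation was a held-out “like”, which is why MAP@1 is a natural headline number for comparing indicator sets.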

This dataset has only ~5,000 mostly professional reviewers and ~250k video items, but we are
integrating a larger one. We are also writing a white paper and blog post with some deeper
analysis; there are several tidbits of insight when you look deeper.

The bottom line is that using most of the above indicators we were able to get a 26% increase
in MAP@1 over using only “like”. This is important because the vast majority of recommenders
can only really ingest one type of indicator.