mahout-user mailing list archives

From Pat Ferrel <>
Subject Solr+Mahout Recommender Demo Site
Date Sun, 06 Apr 2014 17:26:28 GMT
After having integrated several versions of the Mahout and Myrrix recommenders at fairly large
scale, I was interested in solving three problems that these did not directly provide for:
1) realtime queries for recs using data not yet incorporated into the training set. Myrrix
allows this but Mahout using the hadoop mr version does not.
2) cross-recommendations from two or more action types (say purchase and detail-view)
3) blending metadata and user preference data to return recs (for example category & user
preferences => recs)

Using Solr + Mahout provided an amazingly flexible and performant way to do this. Ted wrote
about his experience with this basic approach in his recent book. Take user preferences, run
them through RowSimilarityJob and you get an item-by-item similarity matrix. This is the core
of an item-based cooccurrence recommender. If you take the similarity matrix, and convert
it into a list of tokens per row, you have something Solr can index. If you then use a user’s
history as a query on the indexed data you get an ordered list of recommendations.
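
To make the idea above concrete, here is a minimal sketch in plain Python. It is not the Mahout or Solr code: raw cooccurrence counts stand in for RowSimilarityJob's similarity measure, a simple token-overlap count stands in for Solr's relevance ranking, and the function names, the top_k cutoff, and the sample data are all made up for illustration.

```python
from collections import defaultdict
from itertools import permutations


def similar_item_docs(prefs, top_k=50):
    """Build one 'document' per item: the IDs of the items it co-occurs with most.

    Assumption: plain cooccurrence counts replace the similarity score that
    RowSimilarityJob would compute.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for items in prefs.values():
        # Every ordered pair of distinct items liked by the same user co-occurs once.
        for a, b in permutations(set(items), 2):
            counts[a][b] += 1
    docs = {}
    for item, row in counts.items():
        # Highest count first, ties broken by item ID so the output is stable.
        ranked = sorted(row, key=lambda other: (-row[other], other))
        docs[item] = ranked[:top_k]  # this token list is what would be indexed
    return docs


def recommend(docs, history, n=10):
    """Use the user's history as a query against the per-item token lists.

    Assumption: the score is the number of history items found in an item's
    token list, a stand-in for a search engine's ranking.
    """
    history = set(history)
    scores = {}
    for item, tokens in docs.items():
        if item in history:
            continue  # do not recommend what the user already picked
        overlap = len(history.intersection(tokens))
        if overlap:
            scores[item] = overlap
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


# Hypothetical preference data: user -> videos they liked.
prefs = {
    "u1": ["alien", "blade_runner", "brazil"],
    "u2": ["alien", "blade_runner"],
    "u3": ["blade_runner", "brazil", "amelie"],
    "u4": ["amelie", "delicatessen"],
}
docs = similar_item_docs(prefs)
recs = recommend(docs, ["alien"])
print(recs)
```

Because the history is only a query, a brand-new pick can be used for recs immediately without rebuilding the similarity matrix, which is what makes problem #1 tractable.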

When I set out to do #1 and #3 the need for CF data AND metadata was the first problem. So
I mined the web for video reviews and video metadata. Logging the actions of users who visit
the site will then provide data for #1 and #2.

The demo site is and instructions are at the end of this for
anyone who would like to test it out. As a crude user test there is a procedure we ask you
to follow to help gather quality of recommendations data. It’s running out of my closet
over Comcast so if it’s down I may have tripped over a cord; sorry, try again later.

There are a bunch of different methods for making recs illustrated on the site. One method
that illustrates blending metadata uses preference data from you, and metadata to bias and
filter recs. Imagine that you have trained the system with your preferences by making some
video picks. Now imagine you’d like to get recommendations for Comedies from Netflix based
on your previous video preferences. This is done with a single Solr query on indexed video
fields that hold genre, similar videos (from the similarity matrix), and sources. The query
finds similar videos to the ones you have liked, with the genre “Comedy” boosted by some
amount, but only those that have at least one source = “Netflix”. 
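
As a rough illustration of what such a single query could look like, the sketch below builds standard Solr request parameters (q, fq, rows, fl) in Python. The field names similar_videos, genres and sources, the boost value, the row count, and the video IDs are assumptions for the example, not the site's actual schema or settings. It also assumes the IDs are plain tokens that need no query escaping.

```python
from urllib.parse import urlencode


def build_rec_query(history_ids, genre, source, genre_boost=2.0, rows=24):
    """Build Solr parameters for 'similar to my picks, prefer a genre, one source only'.

    Field names are hypothetical; adjust them to the real schema.
    """
    liked = " ".join(history_ids)
    return {
        # Required clause: match videos whose similar-videos field contains any liked ID.
        # Optional clause: videos in the given genre get a score boost.
        "q": f"+similar_videos:({liked}) genres:{genre}^{genre_boost}",
        # Filter query: restricts results without affecting the score.
        "fq": f'sources:"{source}"',
        "rows": rows,
        "fl": "id,score",
    }


params = build_rec_query(["v101", "v205"], "Comedy", "Netflix")
query_string = urlencode(params)  # append to a /select URL of your Solr core
print(query_string)
```

Putting the source restriction in fq rather than q keeps it a pure filter, so only the similarity match and the genre boost influence the ordering.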

I’ll be doing some blog posts covering the specifics of how each rec type is done, the site
and DB architecture, and Solr setup.

The project uses the Solr recommender prep code here:

BTW I plan to publish obfuscated usage data in the github repo.

begin form letter =======================================

Please use a recently updated browser (latest Firefox, Chrome, Safari, and nothing older
than IE10). The site doesn’t yet check browser compatibility but relies rather heavily on
HTML5 and CSS3.

1) go to to create an account
2) go to to ‘train’ the recommender: hit thumbs up
on videos you like. There are 20 pages of training videos; you can leave at any time, but if
you can go through them all it would be appreciated.
3) go to to immediately get personalized recs
from your training data. If you completed the trainer, check the top line of recs and count how
many are videos you liked or would like to see. Scroll right or left to see a total of 24
in four batches of 6. If you could report to me the total you thought were good recs, it would
be greatly appreciated.
4) browse videos by various criteria here: These are not
recommendations, they are simply a catalog.
5) control how you browse videos by clicking the gears icon. You can set all videos to be
from one or more sources here. If you choose Netflix alone (don’t forget to uncheck ‘all’)
then recs and browsed videos will all be available on Netflix.
