mahout-user mailing list archives

From Pat Ferrel <>
Subject Re: recommendation based on user preference
Date Thu, 10 Jul 2014 20:40:07 GMT
Doing things this way, you are using the neighborhoodID as a proxy for the userID/rowID in the recommender.
I don’t see the benefit of the in-memory version here, since all output can easily be pre-calculated.
Then it will only be a lookup at runtime. You can use “rowsimilarity” on a single machine
without setting up a cluster; just use the local filesystem. This is the way I’d do it.
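The precompute-then-look-up approach can be illustrated without Mahout. This is a minimal sketch, assuming cosine similarity over raw counts (one possible measure, not necessarily what rowsimilarity uses by default). The Downtown and Midtown counts come from the table in the quoted messages; "Uptown" is a hypothetical neighborhood with invented counts, and `precompute_similar` / `similar_neighborhoods` are hypothetical helpers, not Mahout API.

```python
import math

# Hypothetical example data: neighborhood -> {amenity: count}.
# Downtown/Midtown counts are from the thread; "Uptown" is invented.
table = {
    "Downtown": {"Gym": 15, "Cafe": 50},
    "Midtown": {"Gym": 30, "Cafe": 100, "Bookstore": 10},
    "Uptown": {"Gym": 5, "Bookstore": 20},
}

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def precompute_similar(rows, k):
    """Offline step: for every row, keep the k most similar other rows."""
    result = {}
    for name, vec in rows.items():
        scores = [(other, cosine(vec, other_vec))
                  for other, other_vec in rows.items() if other != name]
        scores.sort(key=lambda pair: pair[1], reverse=True)
        result[name] = scores[:k]
    return result

# Computed once, offline (the role the rowsimilarity output plays).
similar = precompute_similar(table, k=2)

def similar_neighborhoods(current):
    """Runtime step: serving a user is only a dictionary lookup."""
    return [name for name, _ in similar.get(current, [])]
```

For example, `similar_neighborhoods("Downtown")` returns `["Midtown", "Uptown"]`; no similarity is computed at request time.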

You definitely don’t want to “only load the columns (amenities) [that] correlate to a selected
user” with an in-memory recommender. Loading that data will trigger a retraining of the
recommender before you can ask it for similar neighborhoods, and that will take more time than
you want. This could happen every time a new user visits your app.

However, if you train on all amenities for all neighborhoods, then the in-memory recommender
should work and would train only once. Your data would look like (neighborhoodID, cafesID,
numberOfCafes), and so on for every non-zero cell in the table. And remember that ALL IDs must
be Mahout IDs; you can’t use your own IDs. Mahout IDs correspond to matrix coordinates:
they are ordinal ints. Think of them as the row and column numbers of the table.
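A minimal sketch of turning the neighborhood/amenity table into (neighborhoodID, amenityID, count) triples with 0-based ordinal IDs. Assigning IDs in sorted-name order is an arbitrary choice made here, and `to_ordinal_triples` is a hypothetical helper. Keeping the returned dictionaries lets you translate the integer IDs back to names afterwards.

```python
def to_ordinal_triples(rows):
    """Map string IDs to 0-based ordinal ints and emit only non-zero cells.

    rows: dict of neighborhood name -> {amenity name: count}.
    Returns (triples, row_ids, col_ids); triples holds
    (neighborhoodID, amenityID, count) tuples using the integer IDs.
    A column that is zero everywhere never receives an ID.
    """
    row_ids, col_ids = {}, {}
    triples = []
    for name in sorted(rows):
        r = row_ids.setdefault(name, len(row_ids))
        for amenity in sorted(rows[name]):
            count = rows[name][amenity]
            if count == 0:
                continue  # only non-zero cells are written out
            c = col_ids.setdefault(amenity, len(col_ids))
            triples.append((r, c, count))
    return triples, row_ids, col_ids

# Counts from the table in this thread.
example = {
    "Downtown": {"Gym": 15, "Cafe": 50, "Bookstore": 0},
    "Midtown": {"Gym": 30, "Cafe": 100, "Bookstore": 10},
}
triples, row_ids, col_ids = to_ordinal_triples(example)

# Comma-separated lines, one per non-zero cell.
lines = [f"{r},{c},{count}" for r, c, count in triples]
```

Here Downtown becomes row 0 and Midtown row 1, and Downtown's zero Bookstore cell is simply omitted.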
On Jul 10, 2014, at 10:45 AM, Edith Au <> wrote:

Thank you so much for the suggestions. It took me some time to figure
things out, but I believe I have a pretty good grip on what needs to be
done now. My dataset is small enough to fit on a single machine, so I am
going to use an in-memory implementation rather than Hadoop. As suggested
by both Pat and Manuel, I have a table (in the file system) with neighborhoods
as rows and amenities as columns. At runtime, I will only load the columns
(amenities) that correlate to a selected user and do a UserSimilarity operation
between each neighborhood and the one the user resides in. After that, I
can use NearestNUserNeighborhood to pick the results.
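The plan of comparing the user's neighborhood once against every other neighborhood (1 x N), restricted to the amenity columns that matter to that user, can be sketched in plain Python. This does not use Mahout's `UserSimilarity` or `NearestNUserNeighborhood`; `cosine_on` and `nearest_n` are hypothetical helpers, cosine is an assumed measure, and "Uptown" is an invented neighborhood.

```python
import heapq
import math

# Hypothetical data: Downtown/Midtown counts are from the thread,
# "Uptown" is invented for illustration.
table = {
    "Downtown": {"Gym": 15, "Cafe": 50},
    "Midtown": {"Gym": 30, "Cafe": 100, "Bookstore": 10},
    "Uptown": {"Gym": 5, "Bookstore": 20},
}

def cosine_on(a, b, columns):
    """Cosine similarity computed only over the selected amenity columns."""
    dot = sum(a.get(c, 0) * b.get(c, 0) for c in columns)
    na = math.sqrt(sum(a.get(c, 0) ** 2 for c in columns))
    nb = math.sqrt(sum(b.get(c, 0) ** 2 for c in columns))
    return dot / (na * nb) if na and nb else 0.0

def nearest_n(rows, home, columns, n):
    """Compare `home` once against every other row and keep the top n."""
    home_vec = rows[home]
    scored = ((cosine_on(home_vec, vec, columns), name)
              for name, vec in rows.items() if name != home)
    return [name for _, name in heapq.nlargest(n, scored)]
```

For a user in Downtown who cares about gyms and bookstores, `nearest_n(table, "Downtown", ["Gym", "Bookstore"], 2)` returns `["Midtown", "Uptown"]`. Each call does N - 1 comparisons, but it runs on every request rather than once.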

I gather UserSimilarity is the in-memory equivalent of RowSimilarity
(Hadoop)? It would be great if someone could confirm it!

Thanks again Pat and Manuel!

On Wed, Jul 2, 2014 at 4:06 PM, Pat Ferrel <> wrote:

> If you are looking to recommend a similar neighborhood based on the
> characteristics of some other neighborhood (the user’s current one), then you
> wouldn’t use collaborative filtering. This is a metadata recommender based
> on the similarity of neighborhoods, not on a collection of user preferences.
> The easiest and fastest option would be a search engine, but I’ll leave
> that aside for now since it doesn’t account for feature weights as well.
> Create a table like this:
> Neighborhood  Gym  Cafe  Bookstore
> Downtown      15   50    0
> Midtown       30   100   10
> …
> You will need to convert the row IDs (and column IDs) into sequential ints,
> which Mahout uses for IDs. Then write them into a SequenceFile, creating a
> Distributed Row Matrix (DRM) of key-value pairs: the keys are the integer
> neighborhood IDs, and each value is a Vector (a sort of list) of integer
> column IDs with their counts.
> Then run rowsimilarity on the DRM. That is a CLI job, but there is also a
> driver you can call from your code.
> You will have some data-prep issues, since larger neighborhoods
> will have higher counts. An easy fix is to normalize the
> counts by something like population or physical size, so you get cafes per
> resident, per square mile, or some other ratio.
> The result of the rowsimilarity job will be another DRM with key =
> neighborhood ID and value = Vector of similar neighborhoods (by integer ID),
> each with a similarity strength. Sort the vector by strength and you’ll have
> an ordered list of similar neighborhoods for each neighborhood.
> On Jun 30, 2014, at 12:48 PM, Edith Au <> wrote:
> Hi,
> I am a newbie and am looking for some guidance to implement my
> recommender.  Any help would be greatly appreciated.  I have a small
> data set of location information with the following fields:
> neighborhood, amenities, and counts.  For example:
> Downtown        Gym        15
> Downtown        Cafe       50
> …
> Midtown         Gym        30
> Midtown         Cafe       100
> Midtown         Bookstore  10
> ...
> Financial Dist
> …
> so on and so forth. I want to recommend a neighborhood for a user to
> reside in based on the amenities (and some other metrics) in his/her
> current neighborhood. My understanding is that model-based
> recommendation would be a good fit for the job. If I am on the right
> track, is there an experimental/beta recommender I can try?
> If there is no such recommender yet, can I still use Mahout for my
> project? For example, can I implement my own Similarity which only
> computes the similarity between one user's preferences and a set of
> neighborhoods? If I understand Mahout correctly, User/Item Similarity
> would do N x (N-1) pairs of comparisons as opposed to 1 x N comparisons.
> In my example, User/Item Similarity would compare Downtown, Midtown,
> and Fin Dist against each other, which would be a waste of computation
> resources since those comparisons are not needed.
> Thanks in advance for your help.
> Edith
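The DRM layout described in the thread (integer row ID mapped to a sparse vector of integer column ID and count), together with the normalization and sort-by-strength steps, can be mimicked with plain Python dicts. This is only a sketch of the data shapes, not the SequenceFile format or Mahout's Vector class; the populations are invented, and `normalize_per_capita` / `ordered_similar` are hypothetical helpers.

```python
# DRM-like layout: integer row ID -> sparse vector {integer column ID: count}.
# Row 0 = Downtown, row 1 = Midtown; columns 0 = Gym, 1 = Cafe, 2 = Bookstore.
# Counts are from the table in this thread; zero cells are simply absent.
drm = {
    0: {0: 15, 1: 50},
    1: {0: 30, 1: 100, 2: 10},
}

# Hypothetical populations, invented for illustration only.
population = {0: 5000, 1: 10000}

def normalize_per_capita(rows, pop):
    """Turn raw counts into per-resident rates (e.g. cafes per resident)."""
    return {rid: {col: count / pop[rid] for col, count in vec.items()}
            for rid, vec in rows.items()}

def ordered_similar(similarity_row):
    """Sort one result row {neighborhood ID: strength}, strongest first."""
    return sorted(similarity_row.items(), key=lambda item: item[1], reverse=True)

rates = normalize_per_capita(drm, population)
```

With these invented populations, Downtown and Midtown end up with identical gym and cafe rates (0.003 and 0.01 per resident). One caveat worth checking: dividing a whole row by a single number leaves cosine similarity unchanged, so this kind of normalization matters mainly for measures that are sensitive to magnitude.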

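The comparison counts in the N x (N-1) versus 1 x N question can be worked out directly. N x (N-1) counts ordered pairs; a symmetric similarity measure only needs each unordered pair once:

```latex
\text{all pairs (symmetric measure):}\quad \binom{N}{2} = \frac{N(N-1)}{2}
\qquad
\text{one reference row:}\quad N - 1
\qquad
\frac{N(N-1)/2}{N-1} = \frac{N}{2}
```

For N = 100 neighborhoods that is 4950 comparisons versus 99. A single 1 x N query is far cheaper, but after about N/2 such queries the on-demand approach has done as many comparisons as one all-pairs precompute, whose results then serve every later lookup without further comparisons.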