hadoop-mapreduce-dev mailing list archives

From Amit Sangroya <sangroyaa...@gmail.com>
Subject RecommenderJob Mahout Creating a data model
Date Wed, 14 Sep 2011 11:28:58 GMT
Hi all,

I am trying to run the example from

with the following command:

bin/mahout -Dmapred.input.dir=input -Dmapred.output.dir=output --itemsFile itemfile --tempDir tempDir

The algorithm estimates the preference of a user for an item which he/she
has not yet seen. Once an algorithm can predict preferences, it can also be
used to do Top-N recommendation, where the task is to find the N items a
given user might like best. It is mentioned that, given a DataModel, the
recommender can produce recommendations.
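
As I understand it, the Top-N step itself is just a ranking over the estimated preferences. Here is a minimal sketch in plain Java of what I mean. It does not use Mahout's API; the class name, method name, and scores are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopNSketch {

    // Returns the IDs of the n items with the highest estimated preference,
    // best first. The map is assumed to hold estimates only for items the
    // user has not yet seen.
    public static List<Long> topN(Map<Long, Double> estimatedPrefs, int n) {
        List<Map.Entry<Long, Double>> entries = new ArrayList<>(estimatedPrefs.entrySet());
        // Sort by estimated preference, highest first.
        entries.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Double> e : entries) {
            if (result.size() == n) {
                break;
            }
            result.add(e.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical item IDs and estimated preferences for one user.
        Map<Long, Double> prefs = new LinkedHashMap<>();
        prefs.put(101L, 3.5);
        prefs.put(102L, 4.8);
        prefs.put(103L, 2.0);
        prefs.put(104L, 4.1);
        System.out.println(topN(prefs, 2)); // prints [102, 104]
    }
}
```

The expensive part is producing the estimates, not this ranking, which is why I am asking about reusing the model below.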

The algorithm takes approx. 5 minutes to generate the top 5 recommendations
for one user on a 10-node Hadoop cluster. The input has been cut down to only
200 users from the "1 Million MovieLens Dataset" from Grouplens.org.

I have a few questions:

1) Is it possible to separate the data model building step from the step
that generates recommendations?

2) Once the model has been generated from the training data, can we reuse it
to generate recommendations for a range of users?

3) To be specific, if I want to provide an on-line service that generates
recommendations for users, can I avoid paying the cost of the MapReduce
interactions on every request?
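
To make question 3 concrete, this is roughly the setup I have in mind: run the batch job once, load its output into memory, and answer each request by lookup. The sketch below is plain Java, not Mahout code. The class and method names are hypothetical, and the line layout `userID<TAB>[itemID:score,...]` is my assumption about the job output, which I have not verified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PrecomputedRecommendations {

    private final Map<Long, List<Long>> byUser = new HashMap<>();

    // Loads lines assumed to look like: 42<TAB>[101:4.8,104:4.1]
    public PrecomputedRecommendations(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                continue; // skip lines that do not match the assumed layout
            }
            String body = parts[1].trim();
            if (!body.startsWith("[") || !body.endsWith("]")) {
                continue; // same: not the assumed [item:score,...] form
            }
            body = body.substring(1, body.length() - 1); // drop the surrounding [ ]
            List<Long> items = new ArrayList<>();
            if (!body.isEmpty()) {
                for (String pair : body.split(",")) {
                    // Keep only the item ID; the score is ignored in this sketch.
                    items.add(Long.parseLong(pair.split(":")[0].trim()));
                }
            }
            byUser.put(Long.parseLong(parts[0].trim()), items);
        }
    }

    // Answers a request from memory; no MapReduce job is started here.
    public List<Long> recommend(long userId) {
        return byUser.getOrDefault(userId, Collections.emptyList());
    }

    public static void main(String[] args) {
        // Hypothetical batch output for two users.
        PrecomputedRecommendations recs = new PrecomputedRecommendations(
                Arrays.asList("42\t[101:4.8,104:4.1]", "7\t[103:3.9]"));
        System.out.println(recs.recommend(42L)); // prints [101, 104]
    }
}
```

What I do not know is whether Mahout supports something like this directly, or whether the job has to be rerun whenever a recommendation is requested.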

I am not a data mining expert. Please help me understand this better.

Thanks and Regards,
