mahout-user mailing list archives

From Sebastian Schelter <...@apache.org>
Subject Re: The performance of mahout's recommender.
Date Sun, 30 Sep 2012 12:40:16 GMT
Hello Hu,

The performance here depends very much on how the ratings are distributed
across items. Furthermore, you have an extremely high number of items,
which makes it hard to use an item-based approach.

With a block size of 128MB, 7.8GB corresponds to about 63 blocks, so are
you sure you really leverage all machines? How long did each of the
MapReduce steps of the job take, and when did you kill it?
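
To make the block arithmetic concrete, here is a rough sketch using only the sizes mentioned in this thread. It assumes the usual behaviour of roughly one map task per HDFS block for plain text input, which may not hold if your split settings differ.

```shell
# Rough count of HDFS blocks for the input (sizes taken from this thread).
input_mb=$(( 78 * 1024 / 10 ))   # 7.8 GB expressed in MB (integer arithmetic)
block_mb=128                     # HDFS block size in MB
blocks=$(( (input_mb + block_mb - 1) / block_mb ))   # ceiling division
echo "input blocks: $blocks, servers: 250"
```

If that assumption holds, the first pass over the raw input can keep only a fraction of the 250 servers busy.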

The parameter maxPrefsPerUserInItemSimilarity (default value: 1000)
determines how many observations to take into account per user. Setting
it to a smaller value drastically improves performance. This should be
the first thing to play with.
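
As a sketch of what that could look like with your command: the snippet below only prints the command line (a dry run). The value 100 is purely illustrative, and the long-option spelling is assumed to match the parameter name above, so please check it against the usage output of your Mahout version.

```shell
# Dry run: build and print the command with a lower per-user cap.
MAX_PREFS=100   # illustrative value only (assumption), the default is 1000

CMD="./mahout org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
-i input/input.txt -o output -s SIMILARITY_LOGLIKELIHOOD \
--usersFile input/users.txt --numRecommendations 10 --tempDir temp \
--maxPrefsPerUserInItemSimilarity $MAX_PREFS"

echo "$CMD"
```

Paste the printed line into the shell to launch the job, and try a few values while watching the runtime of the similarity computation step.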

Best,
Sebastian



On 30.09.2012 14:07, 胡仲义 wrote:
> Hi, I am a mahout user and I am confused by the performance of mahout's
> recommender.
>
> I have a preference data set from an e-commerce platform. Each line of
> the data file represents a single preference in the form
> userID,itemID,rating value. The input is a 7.8GB text file and contains
> 370,250,381 lines of user-item-preference associations, from 132,598,906
> users to 35,920,654 distinct items. I use mahout to recommend 10 items for
> each user with org.apache.mahout.cf.taste.hadoop.item.RecommenderJob on
> a hadoop cluster of 250 Linux servers. The command is as follows:
>
> $ ./mahout org.apache.mahout.cf.taste.hadoop.item.RecommenderJob -i
> input/input.txt -o output -s SIMILARITY_LOGLIKELIHOOD --usersFile
> input/users.txt --numRecommendations 10 --tempDir temp
>
> However, the performance let me down: it took 23 hours to get the result.
> I want to know whether this is normal, or whether there are ways to
> improve the performance.
>
> Thanks.
>
> --Hu Zhy
> 

