mahout-user mailing list archives

From Warunika Ranaweera <warunik...@gmail.com>
Subject Performance issues in Mahout recommendations
Date Fri, 06 Jun 2014 09:10:00 GMT
Hi,

I am using Mahout's recommenditembased algorithm on a data set with nearly
10,000 (implicit) user ratings. This is the command I used:

    mahout recommenditembased --input ratings.csv --output recommendation \
        --usersFile users.dat --tempDir temp \
        --similarityClassname SIMILARITY_LOGLIKELIHOOD --numRecommendations 3
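
For reference, the item-based, log-likelihood approach that this command runs can be sketched in memory, without any MapReduce jobs. The Python below is only a minimal illustration under my own assumptions, not Mahout's implementation: input lines are assumed to be userID,itemID[,value]; items that two users never share are given no similarity; and an unseen item is scored by summing its similarity to the items the user already has (a simple rule for implicit data, not necessarily the exact one RecommenderJob uses). All function names are hypothetical.

```python
import csv
import math
from collections import defaultdict


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy term used in Dunning's log-likelihood ratio."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR (G^2 statistic) of a 2x2 co-occurrence contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def llr_similarity(users_a, users_b, num_users):
    """Map the LLR of two items' user sets into the range [0, 1)."""
    both = len(users_a & users_b)
    if both == 0:
        # Assumption of this sketch: items never seen together get no
        # similarity (the LLR alone would also reward anti-correlation).
        return 0.0
    only_a = len(users_a) - both
    only_b = len(users_b) - both
    neither = num_users - both - only_a - only_b
    llr = log_likelihood_ratio(both, only_a, only_b, neither)
    return 1.0 - 1.0 / (1.0 + llr)


def load_ratings(path):
    """Read (user, item) pairs from lines of userID,itemID[,value]."""
    with open(path, newline="") as f:
        return [(row[0], row[1]) for row in csv.reader(f) if len(row) >= 2]


def recommend(ratings, user, n=3):
    """Top-n unseen items for `user`, scored by summed LLR similarity."""
    users_by_item = defaultdict(set)
    items_by_user = defaultdict(set)
    for u, i in ratings:
        users_by_item[i].add(u)
        items_by_user[u].add(i)
    num_users = len(items_by_user)
    seen = items_by_user.get(user, set())

    scores = {}
    for item, item_users in users_by_item.items():
        if item in seen:
            continue
        score = sum(llr_similarity(item_users, users_by_item[j], num_users)
                    for j in seen)
        if score > 0:
            scores[item] = score
    # Highest score first; ties broken by item id for deterministic output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]
```

For example, recommend(load_ratings("ratings.csv"), "42") would return up to three item IDs for a hypothetical user "42".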

Although the output is generated successfully, the process takes nearly 7
minutes to produce recommendations for a single user. The Hadoop cluster
has 8 nodes, and the machine on which Mahout is invoked is an AWS EC2
c3.2xlarge instance. When I tracked the MapReduce jobs, I noticed that no
more than one machine is utilized at a time, and that the recommenditembased
command runs 9 MapReduce jobs in total, each taking approximately 45
seconds.
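
As a rough check, assuming the nine jobs run one after another, the
per-job time alone accounts for almost the entire runtime:

```latex
T_{\text{total}} \approx 9 \times 45\,\text{s} = 405\,\text{s} \approx 6.75\,\text{min}
```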

Since this is too slow for real-time recommendations, it would be really
helpful to know whether I'm missing any additional commands or
configuration settings that would enable faster performance.

Thanks,
Warunika
