mahout-user mailing list archives

From Najum Ali <naju...@googlemail.com>
Subject Performance Issue using item-based approach!
Date Thu, 17 Apr 2014 09:17:16 GMT
Hi guys, 

I have created a precomputed item-item similarity collection for a GenericItemBasedRecommender.
Using the 1M MovieLens data, my item-based recommender is only 40-50% faster than without
precomputation (around 589.5ms instead of 1222.9ms).
The user-based recommender, by contrast, is really fast, at around 24.2ms. How can this happen?


Here are more details about my implementation:

CSV file: 1M preferences, 6040 users, 3706 items

I'm showing my implementation as screenshots because of the better syntax highlighting.
My recommender runs inside a web server (Jetty) using Spring 4 and Java 8. I receive recommendations
from a web service as JSON.

For the DataModel, I'm using FileDataModel.



The code below creates a precomputed ItemSimilarity when I start the web server and the
property isItemPreComputationEnabled is set to true:
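
To illustrate what precomputation means here, the following is a minimal plain-Java sketch and not the code from the screenshot. It does not use Mahout at all: the class name PrecomputedItemSimilarity, the dense ratings matrix, and the choice of cosine similarity are all assumptions made for illustration. The idea is only that every item pair is scored once up front, so a later lookup is a plain array read.

```java
public class PrecomputedItemSimilarity {

    // similarity[i][j] holds the cosine similarity between item i and item j.
    private final double[][] similarity;

    /**
     * Scores every item pair once. ratings[u][i] is the rating of user u
     * for item i, with 0.0 meaning "not rated" (an assumption of this sketch).
     */
    public PrecomputedItemSimilarity(double[][] ratings) {
        int numItems = ratings.length == 0 ? 0 : ratings[0].length;
        similarity = new double[numItems][numItems];
        for (int i = 0; i < numItems; i++) {
            similarity[i][i] = 1.0; // an item is treated as identical to itself
            for (int j = i + 1; j < numItems; j++) {
                double s = cosine(ratings, i, j);
                similarity[i][j] = s; // store both directions so lookups
                similarity[j][i] = s; // never need to reorder the pair
            }
        }
    }

    // Cosine similarity between the rating columns of items i and j.
    private static double cosine(double[][] ratings, int i, int j) {
        double dot = 0.0;
        double normI = 0.0;
        double normJ = 0.0;
        for (double[] userRow : ratings) {
            dot += userRow[i] * userRow[j];
            normI += userRow[i] * userRow[i];
            normJ += userRow[j] * userRow[j];
        }
        if (normI == 0.0 || normJ == 0.0) {
            return 0.0; // an item nobody rated is similar to nothing
        }
        return dot / (Math.sqrt(normI) * Math.sqrt(normJ));
    }

    /** Constant-time lookup of a precomputed value. */
    public double itemSimilarity(int i, int j) {
        return similarity[i][j];
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 5, 0},
            {3, 3, 0},
            {0, 0, 4},
        };
        PrecomputedItemSimilarity sim = new PrecomputedItemSimilarity(ratings);
        System.out.println("sim(0,1) = " + sim.itemSimilarity(0, 1));
        System.out.println("sim(0,2) = " + sim.itemSimilarity(0, 2));
    }
}
```

Precomputation of this kind only removes the cost of scoring pairs at request time. Any work the recommender does per request beyond similarity lookups is unaffected by it.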



For time measurement I'm using AOP. I'm measuring the whole time from entering my controller
to sending the response. The measurement is based on System.nanoTime() and taking the
difference. The user-based recommender is timed in the same way.
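
A minimal sketch of this kind of measurement, assuming plain Java rather than a Spring AOP aspect. The class RequestTimer and its time helper are hypothetical names for illustration and are not part of my application:

```java
import java.util.function.Supplier;

public class RequestTimer {

    /** Result of a timed call: the returned value plus the elapsed time. */
    public static final class Timed<T> {
        public final T value;
        public final long elapsedNanos;

        public Timed(T value, long elapsedNanos) {
            this.value = value;
            this.elapsedNanos = elapsedNanos;
        }

        public double elapsedMillis() {
            return elapsedNanos / 1_000_000.0;
        }
    }

    /** Runs the call once and records the System.nanoTime() difference around it. */
    public static <T> Timed<T> time(Supplier<T> call) {
        long start = System.nanoTime();
        T value = call.get();
        long elapsed = System.nanoTime() - start;
        return new Timed<>(value, elapsed);
    }

    public static void main(String[] args) {
        // Stand-in workload; in the setup described above this would be
        // the controller call that produces the recommendations.
        Timed<Long> result = time(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            return sum;
        });
        System.out.printf("sum=%d took %.3f ms%n", result.value, result.elapsedMillis());
    }
}
```

A single measurement like this includes JIT warm-up and anything else that happens inside the timed call, so it is probably worth repeating the call several times and comparing the later runs.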

I have tried caching the recommender and the similarity, with no big difference. I also tried
using CandidateItemsStrategy and MostSimilarItemsCandidateItemsStrategy, but that gave no
performance boost either.

	public RecommenderBuilder createRecommenderBuilder(ItemSimilarity similarity) throws TasteException {
		final int numberOfUsers = dataModel.getNumUsers();
		final int numberOfItems = dataModel.getNumItems();
		CandidateItemsStrategy candidateItemsStrategy = new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
		MostSimilarItemsCandidateItemsStrategy mostSimilarStrategy = new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
		return model -> new GenericItemBasedRecommender(model, similarity, candidateItemsStrategy, mostSimilarStrategy);
	}

I don't know why item-based is taking so much longer than user-based. User-based is fast
as hell. I even tried MovieLens data sets with 100k and 10 million preferences. Every time,
user-based is much faster, whichever similarity I use.

I hope someone can help me understand this. Maybe I'm doing something wrong.

Thanks!! :))



