mahout-user mailing list archives

From Najum Ali <naju...@googlemail.com>
Subject Re: Performance Issue using item-based approach!
Date Thu, 17 Apr 2014 10:18:03 GMT
Ok, here you go:

I have created a simple class with main-method (no server and other stuff):

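import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.Collection;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.GenericItemSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.precompute.FileSimilarItemsWriter;
import org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.ItemBasedRecommender;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.cf.taste.similarity.precompute.BatchItemSimilarities;

public class RecommenderTest {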
	public static void main(String[] args) throws IOException, TasteException {
		DataModel dataModel = new FileDataModel(new File("/Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv"));
		ItemSimilarity similarity = new LogLikelihoodSimilarity(dataModel);
		ItemBasedRecommender recommender = new GenericItemBasedRecommender(dataModel, similarity);

		String pathToPreComputedFile = preComputeSimilarities(recommender, dataModel.getNumItems());

		InputStream inputStream = new FileInputStream(new File(pathToPreComputedFile));
		BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(inputStream));
		Collection<GenericItemSimilarity.ItemItemSimilarity> correlations = bufferedReader.lines().map(mapToItemItemSimilarity).collect(Collectors.toList());
		ItemSimilarity precomputedSimilarity = new GenericItemSimilarity(correlations);
		ItemBasedRecommender recommenderWithPrecomputation =
				new GenericItemBasedRecommender(dataModel, precomputedSimilarity);

		recommend(recommender);
		recommend(recommenderWithPrecomputation);
	}

	private static String preComputeSimilarities(ItemBasedRecommender recommender, int simItemsPerItem)
			throws TasteException {
		String pathToAbsolutePath = "";
		try {
			File resultFile = new File(System.getProperty("java.io.tmpdir"), "similarities.csv");
			if (resultFile.exists()) {
				resultFile.delete();
			}
			BatchItemSimilarities batchJob = new MultithreadedBatchItemSimilarities(recommender, simItemsPerItem);
			// Arguments: degree of parallelism, maximum duration in hours, output writer
			int numSimilarities = batchJob.computeItemSimilarities(
					Runtime.getRuntime().availableProcessors(), 1, new FileSimilarItemsWriter(resultFile));
			pathToAbsolutePath = resultFile.getAbsolutePath();
			System.out.println("Computed " + numSimilarities + " similarities and saved them to "
					+ pathToAbsolutePath);
		} catch (IOException e) {
			// Include the cause so a failed write is not silently reduced to an empty path
			System.out.println("Error while writing precomputed similarities to file: " + e.getMessage());
		}
		return pathToAbsolutePath;
	}

	private static void recommend(ItemBasedRecommender recommender) throws TasteException {
		long start = System.nanoTime();
		List<RecommendedItem> recommendations = recommender.recommend(1, 10);
		long end = System.nanoTime();
		System.out.println("Created recommendations in " + getCalculationTimeInMilliseconds(start, end)
				+ " ms. Recommendations:" + recommendations);
	}

	private static double getCalculationTimeInMilliseconds(long start, long end) {
		double calculationTime = (end - start);
		return (calculationTime / 1_000_000);
	}


	// Parses one "itemA,itemB,similarity" line of the precomputed file
	private static Function<String, GenericItemSimilarity.ItemItemSimilarity> mapToItemItemSimilarity = (line) -> {
		String[] row = line.split(",");
		return new GenericItemSimilarity.ItemItemSimilarity(
				Long.parseLong(row[0]), Long.parseLong(row[1]), Double.parseDouble(row[2]));
	};
}
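
In the code above, simItemsPerItem is set to dataModel.getNumItems(), and the output log reports 9174333 stored similarities for 3706 items, roughly 2,475 per item on average. Whether this contributes to the small speed-up is an assumption, not something confirmed in this thread, but a smaller simItemsPerItem would shrink the precomputed set. The sketch below illustrates the idea in plain Java on rows that have already been written; TopKSimilarities, Row, and keepTopK are hypothetical names, and grouping by the first CSV column is an assumption about the file layout.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Mahout.
final class TopKSimilarities {

    /** One "itemA,itemB,similarity" row, mirroring the three columns parsed in the email's code. */
    static final class Row {
        final long itemA;
        final long itemB;
        final double value;

        Row(long itemA, long itemB, double value) {
            this.itemA = itemA;
            this.itemB = itemB;
            this.value = value;
        }
    }

    // Orders rows from weakest to strongest similarity, so the heap head is the weakest row.
    private static final Comparator<Row> WEAKEST_FIRST = Comparator.comparingDouble((Row r) -> r.value);

    static Row parse(String line) {
        String[] cols = line.split(",");
        return new Row(Long.parseLong(cols[0].trim()),
                       Long.parseLong(cols[1].trim()),
                       Double.parseDouble(cols[2].trim()));
    }

    /** Keeps, for each itemA (assumed grouping key), only the k rows with the highest similarity. */
    static List<Row> keepTopK(List<Row> rows, int k) {
        Map<Long, PriorityQueue<Row>> best = new HashMap<>();
        for (Row row : rows) {
            PriorityQueue<Row> heap =
                    best.computeIfAbsent(row.itemA, id -> new PriorityQueue<>(WEAKEST_FIRST));
            heap.add(row);
            if (heap.size() > k) {
                heap.poll(); // drop the currently weakest row for this item
            }
        }
        List<Row> kept = new ArrayList<>();
        for (PriorityQueue<Row> heap : best.values()) {
            kept.addAll(heap);
        }
        return kept;
    }
}
```

The kept rows could then be mapped to GenericItemSimilarity.ItemItemSimilarity objects the same way the lambda in the code above does. Passing a smaller simItemsPerItem to the batch job directly would avoid writing the extra rows in the first place.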

And that's the output log:

3 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Creating FileDataModel for file /Users/najum/Documents/recommender-console/src/main/webapp/resources/preference_csv/1mil.csv
63 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Reading file info...
1207 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Processed 1000000 lines
1208 [main] INFO org.apache.mahout.cf.taste.impl.model.file.FileDataModel - Read lines: 1000209
1475 [main] INFO org.apache.mahout.cf.taste.impl.model.GenericDataModel - Processed 6040 users
1599 [main] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - Queued 3706 items in 38 batches
10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches
10928 [pool-1-thread-8] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 7 processed 5 batches. done.
10978 [pool-1-thread-5] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 4 processed 4 batches. done.
11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches
11589 [pool-1-thread-4] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 3 processed 5 batches. done.
11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches
11592 [pool-1-thread-6] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 5 processed 5 batches. done.
11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches
11707 [pool-1-thread-7] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 6 processed 5 batches. done.
11730 [pool-1-thread-3] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 2 processed 4 batches. done.
11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches
11849 [pool-1-thread-1] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 0 processed 5 batches. done.
11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches
11854 [pool-1-thread-2] INFO org.apache.mahout.cf.taste.impl.similarity.precompute.MultithreadedBatchItemSimilarities - worker 1 processed 5 batches. done.
Computed 9174333 similarities and saved them to /var/folders/9g/4h38v1tj3ps9j21skc72b56r0000gn/T/similarities.csv
Created recommendations in 1683.613 ms. Recommendations:[RecommendedItem[item:3890, value:4.6771617],
RecommendedItem[item:3530, value:4.662509], RecommendedItem[item:127, value:4.660716],
RecommendedItem[item:3323, value:4.660716], RecommendedItem[item:3382, value:4.660716],
RecommendedItem[item:3123, value:4.603366], RecommendedItem[item:3233, value:4.5707765],
RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989, value:4.5263577],
RecommendedItem[item:2343, value:4.524066]]
Created recommendations in 985.679 ms. Recommendations:[RecommendedItem[item:3530, value:5.0],
RecommendedItem[item:3382, value:5.0], RecommendedItem[item:3890, value:4.6771617],
RecommendedItem[item:127, value:4.660716], RecommendedItem[item:3323, value:4.660716],
RecommendedItem[item:3123, value:4.603366], RecommendedItem[item:3233, value:4.5707765],
RecommendedItem[item:1434, value:4.553473], RecommendedItem[item:989, value:4.5263577],
RecommendedItem[item:2343, value:4.524066]]
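
Each of the two timings above comes from a single call, so they may include one-off costs such as JIT compilation and lazy initialization. A minimal sketch of a steadier measurement, assuming a hypothetical helper class TimingSketch that is not part of Mahout: it runs the task a few times untimed and then reports the median of several timed runs.

```java
import java.util.Arrays;
import java.util.function.Supplier;

// Hypothetical helper, not part of Mahout.
final class TimingSketch {

    /** Returns the median of the given values without modifying the input array. */
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        double[] sorted = Arrays.copyOf(values, values.length);
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Runs the task warmupRuns times untimed, then returns the median of measuredRuns timings in ms. */
    static <T> double medianMillis(Supplier<T> task, int warmupRuns, int measuredRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.get(); // warm-up: result and duration are ignored
        }
        double[] millis = new double[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            task.get();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return median(millis);
    }
}
```

To time the recommender with it, recommender.recommend(1, 10) would have to be wrapped in a Supplier that catches the checked TasteException, since Supplier.get() cannot throw it.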

Again, almost the same results. What I also don't understand is why I am getting different
RecommendedItems.
That really frustrates me…
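
Comparing the two outputs above by hand, both lists contain the same ten item IDs; only the estimated values of items 3530 and 3382 differ (5.0 with the precomputed similarities), which changes their rank. A small sketch that makes this kind of comparison mechanical, assuming a hypothetical helper class RecommendationDiff and the values copied from the log above:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Mahout.
final class RecommendationDiff {

    /** Builds an item -> value map from two parallel arrays. */
    static Map<Long, Double> toMap(long[] items, double[] values) {
        Map<Long, Double> map = new LinkedHashMap<>();
        for (int i = 0; i < items.length; i++) {
            map.put(items[i], values[i]);
        }
        return map;
    }

    /** Values logged for the recommender that computes similarities on the fly. */
    static Map<Long, Double> onTheFly() {
        return toMap(
                new long[] {3890, 3530, 127, 3323, 3382, 3123, 3233, 1434, 989, 2343},
                new double[] {4.6771617, 4.662509, 4.660716, 4.660716, 4.660716,
                              4.603366, 4.5707765, 4.553473, 4.5263577, 4.524066});
    }

    /** Values logged for the recommender that uses the precomputed similarities. */
    static Map<Long, Double> precomputed() {
        return toMap(
                new long[] {3530, 3382, 3890, 127, 3323, 3123, 3233, 1434, 989, 2343},
                new double[] {5.0, 5.0, 4.6771617, 4.660716, 4.660716,
                              4.603366, 4.5707765, 4.553473, 4.5263577, 4.524066});
    }

    /** Returns items that are missing from one list (NaN) or whose values differ by more than tolerance. */
    static SortedMap<Long, double[]> diff(Map<Long, Double> first, Map<Long, Double> second,
                                          double tolerance) {
        SortedMap<Long, double[]> result = new TreeMap<>();
        Set<Long> allItems = new TreeSet<>(first.keySet());
        allItems.addAll(second.keySet());
        for (Long item : allItems) {
            double a = first.getOrDefault(item, Double.NaN);
            double b = second.getOrDefault(item, Double.NaN);
            boolean missing = Double.isNaN(a) || Double.isNaN(b);
            if (missing || Math.abs(a - b) > tolerance) {
                result.put(item, new double[] {a, b});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints one line per item whose estimate differs between the two runs
        diff(onTheFly(), precomputed(), 1e-6).forEach((item, values) ->
                System.out.println("item " + item + ": " + values[0] + " vs " + values[1]));
    }
}
```

This only shows where the two outputs differ, not why; the cause of the two 5.0 estimates is not established in this message.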

You can find the Java file in the attachment. 

Greetings from Germany,
Najum

On 17.04.2014 at 11:44, Sebastian Schelter <ssc@apache.org> wrote:

> Yes, just to make sure the problem is in the Mahout code and not in the surrounding environment.
> 
> On 04/17/2014 11:43 AM, Najum Ali wrote:
>> @Sebastian
>> What do you mean by a standalone recommender? A simple offline Java main program?
>> 
>> On 17.04.2014 at 11:41, Sebastian Schelter <ssc@apache.org> wrote:
>> 
>>> Could you take the output of the precomputation, feed it into a standalone recommender
>>> and test it there?
>>> 
>>> 
>>> On 04/17/2014 11:37 AM, Najum Ali wrote:
>>>> @sebastian
>>>> 
>>>>> Are you sure that the precomputation is done only once and not in every request?
>>>> Yes, a @Bean-annotated object in Spring is a singleton instance by default.
>>>> I also just verified it with a System.out.println().
>>>> Here is my log:
>>>> 
>>>> System.out.println("----> precomputation done!") is called before returning the
>>>> GenericItemSimilarity.
>>>> 
>>>> The first two recommendations are item-based, using Pearson similarity.
>>>> The third and fourth log entries are also item-based, using precomputed similarity.
>>>> The last log entry is the user-based recommender, using Pearson.
>>>> 
>>>> Look at the huge time difference!
>>>> 
>>>> On 17.04.2014 at 11:23, Sebastian Schelter <ssc@apache.org> wrote:
>>>> 
>>>>> Najum,
>>>>> 
>>>>> this is really strange; feeding an item-based recommender with precomputed
>>>>> similarities should give you superfast recommendations.
>>>>> 
>>>>> Are you sure that the precomputation is done only once and not in every request?
>>>>> 
>>>>> --sebastian
>>>>> 
>>>>> On 04/17/2014 11:17 AM, Najum Ali wrote:
>>>>>> Hi guys,
>>>>>> 
>>>>>> I have created a precomputed item-item similarity collection for a
>>>>>> GenericItemBasedRecommender.
>>>>>> Using the 1M MovieLens data, my item-based recommender is only 40-50% faster
>>>>>> than without precomputation (about 589.5 ms instead of 1222.9 ms).
>>>>>> The user-based recommender, however, is really fast: about 24.2 ms. How can
>>>>>> this happen?
>>>>>> 
>>>>>> Here are more details on my implementation:
>>>>>> 
>>>>>> CSV file: 1M preferences, 6040 users, 3706 items
>>>>>> 
>>>>>> For my implementation I'm using screenshots, because they keep the syntax
>>>>>> highlighting.
>>>>>> My recommender runs inside a web server (Jetty) using Spring 4 and Java 8. I
>>>>>> receive recommendations through a web service (JSON).
>>>>>> 
>>>>>> For the DataModel, I'm using FileDataModel.
>>>>>> 
>>>>>> 
>>>>>> The code below creates a precomputed ItemSimilarity when I start the
>>>>>> web server and the property isItemPreComputationEnabled is set to true:
>>>>>> 
>>>>>> 
>>>>>> For time measurement I'm using AOP. I measure the whole time from entering my
>>>>>> controller to sending the response, based on System.nanoTime() and taking the
>>>>>> difference. The user-based recommender is measured the same way.
>>>>>> 
>>>>>> I have tried caching the recommender and the similarity, with no big
>>>>>> difference. I also tried using CandidateItemsStrategy and
>>>>>> MostSimilarItemsCandidateItemsStrategy, but there was no performance boost either.
>>>>>> 
>>>>>> public RecommenderBuilder createRecommenderBuilder(ItemSimilarity similarity)
>>>>>>     throws TasteException {
>>>>>>   final int numberOfUsers = dataModel.getNumUsers();
>>>>>>   final int numberOfItems = dataModel.getNumItems();
>>>>>>   CandidateItemsStrategy candidateItemsStrategy =
>>>>>>       new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
>>>>>>   MostSimilarItemsCandidateItemsStrategy mostSimilarStrategy =
>>>>>>       new SamplingCandidateItemsStrategy(numberOfUsers, numberOfItems);
>>>>>>   return model -> new GenericItemBasedRecommender(model, similarity,
>>>>>>       candidateItemsStrategy, mostSimilarStrategy);
>>>>>> }
>>>>>> 
>>>>>> I don't know why item-based is taking so much longer than user-based.
>>>>>> User-based is extremely fast. I even tried datasets with 100k and 10 million
>>>>>> preferences (MovieLens). Every time, user-based is much faster for any
>>>>>> similarity.
>>>>>> 
>>>>>> I hope someone can help me understand this. Maybe I'm doing something wrong.
>>>>>> 
>>>>>> Thanks!! :))
>> 
>> 
> 

