mahout-user mailing list archives

From Andrew Musselman <andrew.mussel...@gmail.com>
Subject Re: recommenditembased returns 0 records from last map-reduce job
Date Sun, 20 Jul 2014 19:12:42 GMT
I'm confused about how you're constructing the user file, and why there are negated item ids
here.

Can you post some more details please, including Mahout version and some sample data sets?

> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <serega.sheypak@gmail.com> wrote:
> 
> Hi, I'm trying to create item similarity.
> I gather items which users visit during shopping and then create a file:
> user_id, item_id, weight (where weight can be: [1.0, 1.6, 1.9], depends on
> user action type and data source)
> UNION
> -item_id, item_id, 1.0 (from items dictionary)
> 
> and I do provide a userFile, where user_id = -item_id
> 
> The idea is to get item similarity. If any user visits an item named "A", I want
> to show him items "B", "c", "xxx" using preferences of other users.
> 
> The problem is that the last (???) mapreduce job returns 0 rows:
> 
> Here are my settings:
> 
> 
> sudo -u oozie mahout recommenditembased \
>                    --input visited_items_with_inverted_items \
>                    --output result \
>                    --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>                    --usersFile inverted_items \
>                    --numRecommendations 500 \
>                    --booleanData false \
>                    --maxPrefsPerUser 100 \
>                    --maxSimilaritiesPerItem 500 \
>                    --minPrefsPerUser 0 \
>                    --maxPrefsPerUserInItemSimilarity 30 \
>                    --threshold 0.91 \
>                    --tempDir temp
> 
> Here are some counters. I don't understand what they mean:
> 
> 14/07/20 22:43:08 INFO mapred.JobClient:
>  org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
> 
> 14/07/20 22:43:08 INFO mapred.JobClient:     USERS=7528530
> 
> 14/07/20 22:43:43 INFO mapred.JobClient:
>  org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
> 
> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_NEGLECTED=1,798,738
> 
> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_USED=12,429,693
> 
> 
> 14/07/20 22:44:24 INFO mapred.JobClient:
>  org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> 
> 14/07/20 22:44:24 INFO mapred.JobClient:     ROWS=3312879
> 
> 14/07/20 22:45:18 INFO mapred.JobClient:
>  org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> 
> 14/07/20 22:45:18 INFO mapred.JobClient:     COOCCURRENCES=35882374
> 
> 14/07/20 22:45:18 INFO mapred.JobClient:     PRUNED_COOCCURRENCES=0
> 
> 14/07/20 22:46:00 INFO mapred.JobClient:     Map input records=3312879
> 
> 14/07/20 22:46:00 INFO mapred.JobClient:     Map output records=17570268
> 
> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce input records=5221907
> 
> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce output records=3312879
> 
> 
> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> 
> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> 
> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> 
> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> 
> 14/07/20 22:47:06 INFO mapred.JobClient:     Map input records=7528530
> 
> 14/07/20 22:47:06 INFO mapred.JobClient:     Map output records=3313251
> 
> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce input records=3313251
> 
> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce output records=3313251
> 
> 14/07/20 22:47:40 INFO mapred.JobClient:     Map input records=6626130
> 
> 14/07/20 22:47:40 INFO mapred.JobClient:     Map output records=6626130
> 
> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce input records=6626130
> 
> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce output records=3312879
> 
> 
> 14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879
> 
> 14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251
> 
> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251
> 
> --------
> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0
> --------
> 
> why 0???
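
For anyone trying to put together a small sample data set in the layout the quoted message describes, the following is a minimal sketch of how the input file and the users file might be assembled. It is an illustration only, not the poster's actual pipeline: the function names (`build_input_rows`, `build_users_file`, `to_csv`) and the sample values are hypothetical, and it assumes item ids are positive integers so that a negated item id cannot collide with a real user id.

```python
import csv
import io


def build_input_rows(visits, item_ids):
    """Return (user_id, item_id, weight) rows for the preference file.

    visits   -- iterable of (user_id, item_id, weight) from real user actions
    item_ids -- iterable of item ids from the items dictionary
    """
    rows = [(user, item, float(weight)) for user, item, weight in visits]
    # One pseudo-user per item: the user id is the negated item id, weight 1.0.
    rows.extend((-item, item, 1.0) for item in sorted(set(item_ids)))
    return rows


def build_users_file(item_ids):
    """Return the pseudo-user ids (negated item ids) to request results for."""
    return [-item for item in sorted(set(item_ids))]


def to_csv(rows):
    """Serialize rows as comma-separated lines, one preference per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical sample data: two users, two items.
    sample_visits = [(1, 10, 1.6), (2, 10, 1.0), (2, 11, 1.9)]
    sample_items = [10, 11]
    print(to_csv(build_input_rows(sample_visits, sample_items)), end="")
    print(build_users_file(sample_items))
```

A tiny file built this way can be checked by eye, which makes it easier to see whether every id listed in the users file also appears as a user id in the input.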
