mahout-user mailing list archives

From Serega Sheypak <serega.shey...@gmail.com>
Subject Re: recommenditembased returns 0 records from last map-reduce job
Date Sun, 20 Jul 2014 19:19:34 GMT
The version is CDH-4.7.0-1.cdh4.7.0.p0.40.
users_file:
--inverted_item_id
-1
-2
-3
-4

users_items_prefs:
--inverted_item_id item_id pref_value
-1 1 1.0
-2 2 1.0
-3 3 1.0
-4 4 1.0
--user_id item_id pref_value
11   1 1.6
11   2 1.6
123 3 2.0
123 4 2.0
333 1 2.0
333 2 1.6
--etc.

If I set --booleanData true, then Mahout does return results.
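
For reference, a minimal Python sketch of how the two files above could be generated. The function name build_inputs and the comma delimiter are assumptions made for illustration (the samples above are shown space-separated), and it assumes real user ids are positive, so the negated item ids cannot collide with them.

```python
def build_inputs(prefs, item_ids, delimiter=","):
    """Build the lines of the preferences file and of the users file.

    prefs     -- iterable of (user_id, item_id, weight) from real user activity
    item_ids  -- iterable of item ids from the items dictionary
    delimiter -- field separator (assumption: comma)
    """
    # Real user preferences go in unchanged.
    pref_lines = [delimiter.join(str(field) for field in row) for row in prefs]

    user_lines = []
    for item_id in sorted(set(item_ids)):
        # One pseudo-user per item: its id is the negated item id, and it
        # "prefers" exactly that item with weight 1.0.
        pseudo_user = -item_id
        pref_lines.append(delimiter.join([str(pseudo_user), str(item_id), "1.0"]))
        # The users file lists only the pseudo-users to recommend for.
        user_lines.append(str(pseudo_user))

    return pref_lines, user_lines


if __name__ == "__main__":
    sample_prefs = [(11, 1, 1.6), (11, 2, 1.6), (123, 3, 2.0)]
    prefs_out, users_out = build_inputs(sample_prefs, [1, 2, 3])
    print("\n".join(prefs_out))
    print("\n".join(users_out))
```

The first list corresponds to users_items_prefs and the second to users_file.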




2014-07-20 23:12 GMT+04:00 Andrew Musselman <andrew.musselman@gmail.com>:

> I'm confused about how you're constructing the user file, and why there
> are negated item ids here.
>
> Can you post some more details please, including Mahout version and some
> sample data sets?
>
> > On Jul 20, 2014, at 11:57 AM, Serega Sheypak <serega.sheypak@gmail.com> wrote:
> >
> > Hi, I'm trying to compute item similarity.
> > I gather the items that users visit while shopping and then create a file:
> > user_id, item_id, weight (where weight is one of [1.0, 1.6, 1.9], depending
> > on user action type and data source)
> > UNION
> > -item_id, item_id, 1.0 (from the items dictionary)
> >
> > and I provide a usersFile where user_id = -item_id.
> >
> > The idea is to get item similarity. If any user visits an item named "A", I
> > want to show him items "B", "c", "xxx" using the preferences of other users.
> >
> > The problem is that the last (???) MapReduce job returns 0 rows.
> >
> > Here are my settings:
> >
> >
> > sudo -u oozie mahout recommenditembased \
> >                    --input visited_items_with_inverted_items \
> >                    --output result \
> >                    --similarityClassname SIMILARITY_LOGLIKELIHOOD \
> >                    --usersFile inverted_items \
> >                    --numRecommendations 500 \
> >                    --booleanData false \
> >                    --maxPrefsPerUser 100 \
> >                    --maxSimilaritiesPerItem 500 \
> >                    --minPrefsPerUser 0 \
> >                    --maxPrefsPerUserInItemSimilarity 30 \
> >                    --threshold 0.91 \
> >                    --tempDir temp
> >
> > Here are some counters. I don't understand what they mean.
> >
> > 14/07/20 22:43:08 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
> > 14/07/20 22:43:08 INFO mapred.JobClient:     USERS=7528530
> > 14/07/20 22:43:43 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
> > 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_NEGLECTED=1,798,738
> > 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_USED=12,429,693
> >
> > 14/07/20 22:44:24 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> > 14/07/20 22:44:24 INFO mapred.JobClient:     ROWS=3312879
> > 14/07/20 22:45:18 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> > 14/07/20 22:45:18 INFO mapred.JobClient:     COOCCURRENCES=35882374
> > 14/07/20 22:45:18 INFO mapred.JobClient:     PRUNED_COOCCURRENCES=0
> > 14/07/20 22:46:00 INFO mapred.JobClient:     Map input records=3312879
> > 14/07/20 22:46:00 INFO mapred.JobClient:     Map output records=17570268
> > 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce input records=5221907
> > 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce output records=3312879
> >
> > 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> > 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> > 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> > 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> > 14/07/20 22:47:06 INFO mapred.JobClient:     Map input records=7528530
> > 14/07/20 22:47:06 INFO mapred.JobClient:     Map output records=3313251
> > 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce input records=3313251
> > 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce output records=3313251
> > 14/07/20 22:47:40 INFO mapred.JobClient:     Map input records=6626130
> > 14/07/20 22:47:40 INFO mapred.JobClient:     Map output records=6626130
> > 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce input records=6626130
> > 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce output records=3312879
> >
> > 14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879
> > 14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251
> > 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251
> > --------
> > 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0
> > --------
> >
> > why 0???
>
