mahout-user mailing list archives

From Serega Sheypak <serega.shey...@gmail.com>
Subject Re: recommenditembased returns 0 records from last map-reduce job
Date Mon, 21 Jul 2014 07:09:42 GMT
I've inspected the code: our approach won't work with booleanData=false.
We are calculating item similarity the wrong way... (((
Thank you
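Below is a simplified Python model of why the fake-user trick can return 0 rows with booleanData=false. Each fake user in this thread (a negated item id) has exactly one preference, so every candidate item's estimate rests on a single similar item. The model rests on an assumption drawn from our reading of Mahout's AggregateAndRecommendReducer, which should be checked against the source shipped with CDH 4.7. In non-boolean mode, a prediction is taken to be the similarity-weighted average of the user's preferences, and any estimate backed by fewer than two similar items is dropped. In boolean mode, similarities are simply summed. The function `predict` and the sample data are hypothetical.

```python
from collections import defaultdict


def predict(user_prefs, similarities, boolean_data=False):
    """Estimate scores for items the user has not rated yet.

    user_prefs:   {item_id: pref_value} for a single user
    similarities: {(item_a, item_b): similarity}, each symmetric pair listed once
    """
    numerators = defaultdict(float)
    denominators = defaultdict(float)
    support = defaultdict(int)  # number of the user's items backing each estimate

    for rated_item, pref in user_prefs.items():
        for (a, b), sim in similarities.items():
            if sim == 0 or rated_item not in (a, b):
                continue
            other = b if a == rated_item else a
            if other in user_prefs:
                continue  # this sketch never re-recommends rated items
            # Boolean mode ignores preference values, i.e. treats each one as 1.0.
            numerators[other] += sim * (1.0 if boolean_data else pref)
            denominators[other] += abs(sim)
            support[other] += 1

    if boolean_data:
        # Assumption: boolean mode ranks by the plain sum of similarities.
        return dict(numerators)

    # Assumption: non-boolean mode returns a weighted average and drops
    # estimates that are based on fewer than two similar items.
    return {item: numerators[item] / denominators[item]
            for item in numerators if support[item] > 1}


if __name__ == "__main__":
    fake_user = {1: 1.0}  # fake user -1 "prefers" only item 1
    sims = {(1, 2): 0.9, (1, 3): 0.5}
    print(predict(fake_user, sims))                     # {}
    print(predict(fake_user, sims, boolean_data=True))  # {2: 0.9, 3: 0.5}
```

Under these assumptions, the single-preference fake user gets no non-boolean predictions at all. In boolean mode, its scores are exactly the similarities to its one item, which would explain why the trick only appears to work with booleanData=true.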
1. We provide a "fake" user_id and pass --usersFile in order to get
recommendations for the "fake" user_id, where user_id is a negated item_id. It
worked when we provided user_id->item_id pairs without preference values.
2. Our goal is to get item similarities. We tried
org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob, but it
returns worse results than RecommenderJob with our "fake" user_id
(inverted item_id).
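The fake-user input from point 1 can be sketched in Python as follows. The sketch assumes positive item ids and non-negative real user ids, so that -item_id never collides with a real user. `add_fake_users` and `to_lines` are hypothetical helpers. The comma-separated userID,itemID,preference line format is the one RecommenderJob documents for its input.

```python
def add_fake_users(prefs, item_ids, fake_pref=1.0):
    """Build the input rows and usersFile ids for the "fake user per item" trick.

    prefs:    iterable of (user_id, item_id, pref) triples from real users
    item_ids: every item id from the items dictionary

    Assumes item ids are positive and real user ids are non-negative, so a
    fake user id (-item_id) can never collide with a real one.
    """
    fake_rows = [(-item, item, fake_pref) for item in item_ids]
    fake_users = [-item for item in item_ids]
    return fake_rows + list(prefs), fake_users


def to_lines(rows):
    """Format rows as comma-separated userID,itemID,preference lines."""
    return [f"{user},{item},{pref}" for user, item, pref in rows]


if __name__ == "__main__":
    real = [(11, 1, 1.6), (11, 2, 1.6), (123, 3, 2.0), (123, 4, 2.0)]
    rows, users = add_fake_users(real, [1, 2, 3, 4])
    print("\n".join(to_lines(rows)))   # contents for the --input file
    print("\n".join(map(str, users)))  # contents for the --usersFile
```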

1. I'll try the option you provided.
2. I will remove the input rows with fake user_ids and the usersFile with these fake ids.

3. Looking at
https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
I don't understand how to pass the --outputPathForSimilarityMatrix option to
RecommenderJob.
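Here is a hedged sketch of passing the option. Like --output, it takes a path as its value. The flag name comes from Peng's reply and may only exist in Mahout releases newer than the one bundled with CDH 4.7, so check `mahout recommenditembased --help` on the cluster first. The helper `recommender_command` and all paths are hypothetical. The helper only builds and prints the command line; it does not run it.

```python
import shlex


def recommender_command(input_path, output_path, users_file, similarity_path,
                        temp_dir="temp", boolean_data=True):
    """Build argv for `mahout recommenditembased`, also asking it to keep the
    item-item similarity matrix. Flag names are taken from this thread."""
    return [
        "mahout", "recommenditembased",
        "--input", input_path,
        "--output", output_path,
        "--similarityClassname", "SIMILARITY_LOGLIKELIHOOD",
        "--usersFile", users_file,
        "--booleanData", "true" if boolean_data else "false",
        # The option takes a path, just like --output.
        "--outputPathForSimilarityMatrix", similarity_path,
        "--tempDir", temp_dir,
    ]


if __name__ == "__main__":
    argv = recommender_command("visited_items", "result", "users",
                               "similarity_matrix")
    print(shlex.join(argv))  # paste into a shell or an Oozie action
```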

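Peng's first suggestion (use 0, 1, 2, ... instead of negative ids) can be sketched as a dense remapping step before the data reaches Mahout. `build_index` and `remap` are hypothetical helpers. Ids get consecutive indices in first-seen order, and the returned lookup tables are what you would invert (for example `{v: k for k, v in items.items()}`) to translate recommended item indices back to the original ids.

```python
def build_index(ids):
    """Assign 0, 1, 2, ... to ids in first-seen order (negatives allowed)."""
    index = {}
    for raw in ids:
        if raw not in index:
            index[raw] = len(index)
    return index


def remap(rows):
    """Replace user and item ids in (user, item, pref) rows with dense indices.

    Returns the remapped rows plus both lookup tables, which are needed to
    translate recommendations back to the original ids.
    """
    users = build_index(user for user, _, _ in rows)
    items = build_index(item for _, item, _ in rows)
    remapped = [(users[u], items[i], p) for u, i, p in rows]
    return remapped, users, items


if __name__ == "__main__":
    rows = [(-1, 1, 1.0), (-2, 2, 1.0), (11, 1, 1.6), (11, 2, 1.6)]
    remapped, users, items = remap(rows)
    print(remapped)  # [(0, 0, 1.0), (1, 1, 1.0), (2, 0, 1.6), (2, 1, 1.6)]
```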

2014-07-21 4:58 GMT+04:00 Peng Zhang <pzhang.xjtu@gmail.com>:

> Serega,
>
> I have two comments:
> 1. Don’t use negative user ids. Since Mahout uses user ids as well as item
> ids as row/column indices, you’d better use 0, 1, 2, etc. as ids.
> 2. If you want to get the item similarity information, you can use
> --outputPathForSimilarityMatrix in the command.
>
> Regards,
> Peng Zhang
> M: +86 186-1658-7856
> pzhang.xjtu@gmail.com
>
> On Jul 21, 2014, at 4:00 AM, Serega Sheypak <serega.sheypak@gmail.com> wrote:
>
> > All bad things happen here:
> >
> > Name: RecommenderJob-PartialMultiplyMapper-Reducer
> > User: oozie
> > Process User: oozie
> > Group: oozie
> > Mapper Class: PartialMultiplyMapper
> > Reducer Class: AggregateAndRecommendReducer
> > Job Input Directory: hdfs://nameservice1/itemrec/temp/partialMultiply
> > Job Output Directory: hdfs://nameservice1/itemrec/output/
> >
> > 14/07/20 23:57:47 INFO mapred.JobClient:     Map input records=3312879
> > 14/07/20 23:57:47 INFO mapred.JobClient:     Map output records=3313251
> > 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
> > 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0
> >
> > Why does Mahout return 0 rows? It works when booleanData=true
> > (are the preferences ignored in that case?)
> >
> > 2014-07-20 23:19 GMT+04:00 Serega Sheypak <serega.sheypak@gmail.com>:
> >
> >> The version is CDH-4.7.0-1.cdh4.7.0.p0.40
> >> users_file:
> >> --inverted_item_id
> >> -1
> >> -2
> >> -3
> >> -4
> >>
> >> users_items_prefs
> >> --inverted item_id
> >> -1 1 1.0
> >> -2 2 1.0
> >> -3 3 1.0
> >> -4 4 1.0
> >> --user_id item_id pref_value
> >> 11   1 1.6
> >> 11   2 1.6
> >> 123 3 2.0
> >> 123 4 2.0
> >> 333 1 2.0
> >> 333 2 1.6
> >> -- etc.
> >>
> >> If I set --booleanData true,
> >> then Mahout returns results.
> >>
> >> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <andrew.musselman@gmail.com>:
> >>
> >>> I'm confused about how you're constructing the user file, and why there
> >>> are negated item ids here.
> >>>
> >>> Can you post some more details please, including the Mahout version and
> >>> some sample data sets?
> >>>
> >>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <serega.sheypak@gmail.com> wrote:
> >>>>
> >>>> Hi, I'm trying to compute item similarity.
> >>>> I gather the items that users visit while shopping and then create a file:
> >>>> user_id, item_id, weight (where weight is one of [1.0, 1.6, 1.9], depending
> >>>> on the user action type and data source)
> >>>> UNION
> >>>> -item_id, item_id, 1.0 (from the items dictionary)
> >>>>
> >>>> and I provide a usersFile, where user_id = -item_id.
> >>>>
> >>>> The idea is to get item similarity: if a user visits item "A", I want
> >>>> to show them items "B", "c", "xxx" based on other users' preferences.
> >>>>
> >>>> The problem is that the last (???) MapReduce job returns 0 rows:
> >>>>
> >>>> Here are my settings:
> >>>>
> >>>>
> >>>> sudo -u oozie mahout recommenditembased \
> >>>>                   --input visited_items_with_inverted_items \
> >>>>                   --output result \
> >>>>                   --similarityClassname SIMILARITY_LOGLIKELIHOOD \
> >>>>                   --usersFile inverted_items \
> >>>>                   --numRecommendations 500 \
> >>>>                   --booleanData false \
> >>>>                   --maxPrefsPerUser 100 \
> >>>>                   --maxSimilaritiesPerItem 500 \
> >>>>                   --minPrefsPerUser 0 \
> >>>>                   --maxPrefsPerUserInItemSimilarity 30 \
> >>>>                   --threshold 0.91 \
> >>>>                   --tempDir temp
> >>>>
> >>>> Some counters (I don't understand what they mean):
> >>>>
> >>>> 14/07/20 22:43:08 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
> >>>> 14/07/20 22:43:08 INFO mapred.JobClient:     USERS=7528530
> >>>>
> >>>> 14/07/20 22:43:43 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
> >>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_NEGLECTED=1,798,738
> >>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_USED=12,429,693
> >>>>
> >>>> 14/07/20 22:44:24 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> >>>> 14/07/20 22:44:24 INFO mapred.JobClient:     ROWS=3312879
> >>>>
> >>>> 14/07/20 22:45:18 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> >>>> 14/07/20 22:45:18 INFO mapred.JobClient:     COOCCURRENCES=35882374
> >>>> 14/07/20 22:45:18 INFO mapred.JobClient:     PRUNED_COOCCURRENCES=0
> >>>>
> >>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map input records=3312879
> >>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map output records=17570268
> >>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce input records=5221907
> >>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>
> >>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> >>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> >>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> >>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>
> >>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map input records=7528530
> >>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map output records=3313251
> >>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce input records=3313251
> >>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce output records=3313251
> >>>>
> >>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map input records=6626130
> >>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map output records=6626130
> >>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce input records=6626130
> >>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>
> >>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879
> >>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251
> >>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251
> >>>> --------
> >>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0
> >>>> --------
> >>>>
> >>>> Why 0???
> >>>
> >>
> >>
>
>
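This note covers the USER_RATINGS_USED and USER_RATINGS_NEGLECTED counters in the quoted log. As we understand it (an assumption; verify against ToItemVectorsMapper in your Mahout version), each user's preferences are randomly down-sampled to at most --maxPrefsPerUserInItemSimilarity items (30 in the quoted command) before similarities are computed. The kept count is added to USER_RATINGS_USED and the dropped count to USER_RATINGS_NEGLECTED. The following minimal Python model of that bookkeeping uses a hypothetical `sample_user_prefs` helper.

```python
import random


def sample_user_prefs(prefs_per_user, cap, seed=0):
    """Down-sample each user's rated items to at most `cap` items.

    Returns (sampled, used, neglected); `used` and `neglected` play the role
    of the USER_RATINGS_USED and USER_RATINGS_NEGLECTED counters.
    """
    rng = random.Random(seed)
    sampled, used, neglected = {}, 0, 0
    for user, items in prefs_per_user.items():
        # Users at or under the cap keep everything; others get a random subset.
        keep = list(items) if len(items) <= cap else rng.sample(list(items), cap)
        sampled[user] = keep
        used += len(keep)
        neglected += len(items) - len(keep)
    return sampled, used, neglected


if __name__ == "__main__":
    prefs = {-1: [1], 11: list(range(50)), 333: [1, 2]}
    _, used, neglected = sample_user_prefs(prefs, cap=30)
    print(used, neglected)  # 33 20
```

In this model, a fake user with a single preference is never sampled, so its one rating is always used. Heavy users lose everything beyond the cap.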
