mahout-user mailing list archives

From Serega Sheypak <serega.shey...@gmail.com>
Subject Re: recommenditembased returns 0 records from last map-reduce job
Date Mon, 21 Jul 2014 18:05:56 GMT
Hi, I tried that and got this error:
Unexpected --outputPathForSimilarityMatrix while processing Job-Specific

sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/output
sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/temp
sudo -u oozie mahout recommenditembased \
                    --input \
                    hdfs://nameservice1/user/hive/warehouse/staging_weighted_visits_and_rec_clicks \
                    --output \
                    hdfs://nameservice1/recommenditembased/output \
                    --similarityClassname \
                    SIMILARITY_LOGLIKELIHOOD \
                   --numRecommendations \
                    500 \
                    --booleanData \
                    false \
                    --maxPrefsPerUser \
                    1000 \
                    --maxSimilaritiesPerItem \
                    1000 \
                    --minPrefsPerUser \
                    5 \
                    --maxPrefsPerUserInItemSimilarity \
                    30 \
                    --threshold \
                   1.1 \
                    --tempDir \
                    hdfs://nameservice1/recommenditembased/temp \
                    --outputPathForSimilarityMatrix \
                    hdfs://nameservice1/recommenditembased/sim_matrix


I'm on Cloudera CDH 4.7; it looks like this option is not supported in that version.


2014-07-21 11:18 GMT+04:00 Peng Zhang <pzhang.xjtu@gmail.com>:

> Serega,
>
> See the last line on how to pass outputPathForSimilarityMatrix options to
> the recommenditembased command:
>
> sudo -u oozie mahout recommenditembased \
>                    --input visited_items_with_inverted_items \
>                    --output result \
>                    --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>                    --usersFile inverted_items \
>                    --numRecommendations 500 \
>                    --booleanData false \
>                    --maxPrefsPerUser 100 \
>                    --maxSimilaritiesPerItem 500 \
>                    --minPrefsPerUser 0 \
>                    --maxPrefsPerUserInItemSimilarity 30 \
>                    --threshold 0.91 \
>                    --tempDir temp \
>                    --outputPathForSimilarityMatrix similarityMatri
>
>
> Peng Zhang
> pzhang.xjtu@gmail.com
>
>
>
>
>
> On Jul 21, 2014, at 3:09 PM, Serega Sheypak <serega.sheypak@gmail.com>
> wrote:
>
> > I've inspected the code; our approach wouldn't work with booleanData=false.
> > We calculate item similarity the wrong way...(((
> > Thank you
> > 1. We provide a "fake" user_id and pass --usersFile in order to get
> > recommendations for the "fake" user_id, where user_id is a negative item_id.
> > It worked when we provided user_id->item_id pairs without preferences.
> > 2. Our target is to get item similarities. We tried
> > org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob, but it
> > returns worse results than RecommenderJob with our "fake" user_id
> > (inverted item_id).
> >
> > 1. I'll try the option you provided.
> > 2. I will remove the input rows with fake user_ids and the usersFile with these fake ids.
> >
> > 3. https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
> > I don't understand how to pass the --outputPathForSimilarityMatrix option to
> > RecommenderJob.
> >
> >
> > 2014-07-21 4:58 GMT+04:00 Peng Zhang <pzhang.xjtu@gmail.com>:
> >
> >> Serega,
> >>
> >> I have two comments:
> >> 1. Don’t use negative user ids. Since Mahout uses user ids as well as item
> >> ids as row/column indexes, you’d better use 0, 1, 2, etc. as ids.
> >> 2. If you want to get the item similarity information, you can use
> >> --outputPathForSimilarityMatrix in the command.
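
A minimal sketch of the id advice above, assuming the input is a list of (user_id, item_id, weight) triples like the sample rows posted elsewhere in this thread. The names `remap_ids` and `to_csv` are hypothetical helpers, not part of Mahout. The sketch only renumbers ids to 0, 1, 2, ... and writes comma-separated user,item,preference lines. Whether renumbering resolves the empty output is not established in the thread.

```python
import csv
import io

# Sample (user_id, item_id, weight) rows copied from the thread.
SAMPLE = [
    (11, 1, 1.6),
    (11, 2, 1.6),
    (123, 3, 2.0),
    (123, 4, 2.0),
    (333, 1, 2.0),
    (333, 2, 1.6),
]


def remap_ids(triples):
    """Map raw user and item ids to dense non-negative integers.

    Users and items get separate index spaces, each starting at 0 and
    assigned in first-seen order. Returns the remapped triples plus the
    two raw-id -> index dictionaries so results can be mapped back.
    """
    user_index, item_index = {}, {}
    remapped = []
    for raw_user, raw_item, weight in triples:
        u = user_index.setdefault(raw_user, len(user_index))
        i = item_index.setdefault(raw_item, len(item_index))
        remapped.append((u, i, weight))
    return remapped, user_index, item_index


def to_csv(triples):
    """Render triples as comma-separated lines, one preference per line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(triples)
    return buf.getvalue()


if __name__ == "__main__":
    remapped, users, items = remap_ids(SAMPLE)
    print(to_csv(remapped), end="")
```

Keeping the two dictionaries makes it possible to translate the job's output back to the original ids afterwards.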
> >>
> >> Regards,
> >> Peng Zhang
> >> M: +86 186-1658-7856
> >> pzhang.xjtu@gmail.com
> >>
> >>
> >>
> >>
> >>
> >> On Jul 21, 2014, at 4:00 AM, Serega Sheypak <serega.sheypak@gmail.com>
> >> wrote:
> >>
> >>> All bad things happen here:
> >>>
> >>> Name: RecommenderJob-PartialMultiplyMapper-Reducer
> >>> User: oozie
> >>> Process User: oozie
> >>> Group: oozie
> >>> Mapper Class: PartialMultiplyMapper
> >>> Reducer Class: AggregateAndRecommendReducer
> >>> Job Input Directory: hdfs://nameservice1/itemrec/temp/partialMultiply
> >>> Job Output Directory: hdfs://nameservice1/itemrec/output/
> >>>
> >>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map input records=3312879
> >>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map output records=3313251
> >>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
> >>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0
> >>>
> >>> Why does Mahout return 0 rows? It works when booleanData=true (preferences
> >>> are ignored...?)
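
To spot which step drops all records, the JobClient counter lines can be scanned mechanically. This is a small sketch, assuming each counter sits on one unwrapped log line in the format shown above; `parse_counters` and `zero_counters` are hypothetical names chosen for illustration. It reports which counters are zero and does not explain why.

```python
import re

# Matches lines such as:
#   14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0
COUNTER = re.compile(
    r"mapred\.JobClient:\s+((?:Map|Reduce) (?:input|output) records)=(\d+)"
)

# The four counter lines of the last job, copied from the thread.
LOG = """\
14/07/20 23:57:47 INFO mapred.JobClient:     Map input records=3312879
14/07/20 23:57:47 INFO mapred.JobClient:     Map output records=3313251
14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0
"""


def parse_counters(log_text):
    """Return (counter_name, value) pairs in the order they appear."""
    return [(m.group(1), int(m.group(2))) for m in COUNTER.finditer(log_text)]


def zero_counters(log_text):
    """Return the names of record counters whose value is 0."""
    return [name for name, value in parse_counters(log_text) if value == 0]


if __name__ == "__main__":
    for name, value in parse_counters(LOG):
        print(f"{name}: {value}")
    print("zero:", zero_counters(LOG))
```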
> >>>
> >>>
> >>>
> >>> 2014-07-20 23:19 GMT+04:00 Serega Sheypak <serega.sheypak@gmail.com>:
> >>>
> >>>> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
> >>>> users_file:
> >>>> --inverted_item_id
> >>>> -1
> >>>> -2
> >>>> -3
> >>>> -4
> >>>>
> >>>> users_items_prefs
> >>>> --inverted item_id
> >>>> -1 1 1.0
> >>>> -2 2 1.0
> >>>> -3 3 1.0
> >>>> -4 4 1.0
> >>>> --user_id item_id pref_value
> >>>> 11   1 1.6
> >>>> 11   2 1.6
> >>>> 123 3 2.0
> >>>> 123 4 2.0
> >>>> 333 1 2.0
> >>>> 333 2 1.6
> >>>> -- etc.
> >>>>
> >>>> if I set --booleanData true
> >>>> then mahout returns the result.
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <andrew.musselman@gmail.com>:
> >>>>
> >>>>> I'm confused about how you're constructing the user file, and why there
> >>>>> are negated item ids here.
> >>>>>
> >>>>> Can you post some more details please, including Mahout version and some
> >>>>> sample data sets?
> >>>>>
> >>>>>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <serega.sheypak@gmail.com> wrote:
> >>>>>>
> >>>>>> Hi, I'm trying to create item similarity.
> >>>>>> I gather items which users visit during shopping and then create a file:
> >>>>>> user_id, item_id, weight (where weight can be: [1.0, 1.6, 1.9], depending on
> >>>>>> user action type and data source)
> >>>>>> UNION
> >>>>>> -item_id, item_id, 1.0 (from items dictionary)
> >>>>>>
> >>>>>> and I provide a usersFile, where user_id = -item_id
> >>>>>>
> >>>>>> The idea is to get item similarity. If any user visits an item named "A", I want
> >>>>>> to show him items "B", "c", "xxx" using the preferences of other users.
> >>>>>>
> >>>>>> The problem is that the last (???) mapreduce job returns 0 rows:
> >>>>>>
> >>>>>> Here are my settings:
> >>>>>>
> >>>>>>
> >>>>>> sudo -u oozie mahout recommenditembased \
> >>>>>>                  --input visited_items_with_inverted_items \
> >>>>>>                  --output result \
> >>>>>>                  --similarityClassname SIMILARITY_LOGLIKELIHOOD \
> >>>>>>                  --usersFile inverted_items \
> >>>>>>                  --numRecommendations 500 \
> >>>>>>                  --booleanData false \
> >>>>>>                  --maxPrefsPerUser 100 \
> >>>>>>                  --maxSimilaritiesPerItem 500 \
> >>>>>>                  --minPrefsPerUser 0 \
> >>>>>>                  --maxPrefsPerUserInItemSimilarity 30 \
> >>>>>>                  --threshold 0.91 \
> >>>>>>                  --tempDir temp
> >>>>>>
> >>>>>> Some counters... I don't understand what they mean.
> >>>>>>
> >>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
> >>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:     USERS=7528530
> >>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
> >>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_NEGLECTED=1,798,738
> >>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_USED=12,429,693
> >>>>>>
> >>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> >>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:     ROWS=3312879
> >>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> >>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     COOCCURRENCES=35882374
> >>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     PRUNED_COOCCURRENCES=0
> >>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map input records=3312879
> >>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map output records=17570268
> >>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce input records=5221907
> >>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>>>
> >>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> >>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> >>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map input records=7528530
> >>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map output records=3313251
> >>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce input records=3313251
> >>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce output records=3313251
> >>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map input records=6626130
> >>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map output records=6626130
> >>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce input records=6626130
> >>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>>>
> >>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879
> >>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251
> >>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251
> >>>>>> --------
> >>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0
> >>>>>> --------
> >>>>>>
> >>>>>> why 0???
> >>>>>
> >>>>
> >>>>
> >>
> >>
>
>
