mahout-user mailing list archives

From Peng Zhang <pzhang.x...@gmail.com>
Subject Re: recommenditembased returns 0 records from last map-reduce job
Date Mon, 21 Jul 2014 07:18:32 GMT
Serega,

See the last line for how to pass the --outputPathForSimilarityMatrix option to the
recommenditembased command:

sudo -u oozie mahout recommenditembased \
                   --input visited_items_with_inverted_items \
                   --output result \
                   --similarityClassname SIMILARITY_LOGLIKELIHOOD \
                   --usersFile inverted_items \
                   --numRecommendations 500 \
                   --booleanData false \
                   --maxPrefsPerUser 100 \
                   --maxSimilaritiesPerItem 500 \
                   --minPrefsPerUser 0 \
                   --maxPrefsPerUserInItemSimilarity 30 \
                   --threshold 0.91 \
                   --tempDir temp \
                   --outputPathForSimilarityMatrix similarityMatri
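
Once the similarity matrix has been written, something like the following could turn it into a "visitors of item A also see B, C" lookup. This is only a sketch: it assumes the output is plain text with one `itemA<TAB>itemB<TAB>similarity` triple per line and that each pair is listed only once, which you should check against your own output. The function name `top_similar_items` and the sample values are made up for illustration.

```python
import heapq
from collections import defaultdict


def top_similar_items(lines, n=3):
    """Return {item_id: [(other_item_id, similarity), ...]} keeping the n best per item."""
    neighbours = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        item_a, item_b, sim = int(parts[0]), int(parts[1]), float(parts[2])
        # Assumption: each pair appears once, so record it in both directions.
        neighbours[item_a].append((item_b, sim))
        neighbours[item_b].append((item_a, sim))
    return {
        item: heapq.nlargest(n, others, key=lambda pair: pair[1])
        for item, others in neighbours.items()
    }


# Hypothetical sample in the assumed format (not real job output).
sample = [
    "1\t2\t0.97",
    "1\t3\t0.93",
    "2\t3\t0.95",
]
result = top_similar_items(sample, n=1)
```

With real data you would pass the lines of the part files copied out of the similarity matrix directory instead of `sample`.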


Peng Zhang
pzhang.xjtu@gmail.com





On Jul 21, 2014, at 3:09 PM, Serega Sheypak <serega.sheypak@gmail.com> wrote:

> I've inspected the code; our approach wouldn't work with booleanData=false.
> We calculate item similarity in the wrong way...(((
> Thank you
> 1. We provide a "fake" user_id and pass --usersFile in order to get
> recommendations for the "fake" user_id, where user_id is a negative item_id. It
> worked when we provided user_id->item_id pairs without preferences.
> 2. Our target is to get item similarities. We tried
> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob but it
> returns worse results compared to RecommenderJob with our "fake" user_id
> (inverted item_id)
> 
> 1. I'll try the option you provided.
> 2. I will remove input with fake user_id and usersFile with these fake ids
> 
> 3.
> https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
> I don't understand how to pass the --outputPathForSimilarityMatrix option to
> RecommenderJob
> 
> 
> 2014-07-21 4:58 GMT+04:00 Peng Zhang <pzhang.xjtu@gmail.com>:
> 
>> Serega,
>> 
>> I have two comments:
>> 1. Don’t use negative user ids. Since Mahout uses user id as well as item
>> id as the row/column index, you’d better use 0, 1, 2, etc as ids
>> 2. If you want to get the item similarity information, you can use
>> --outputPathForSimilarityMatrix in the command
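
To follow the first suggestion above, the ids could be remapped to 0, 1, 2, ... before the file is handed to Mahout. The sketch below is one possible way to do that, not something Mahout provides; `build_id_index` and `remap_preferences` are hypothetical helper names, and the rows are taken from the sample data posted earlier in the thread.

```python
def build_id_index(raw_ids):
    """Assign 0, 1, 2, ... to raw ids in order of first appearance."""
    index = {}
    for raw in raw_ids:
        if raw not in index:
            index[raw] = len(index)
    return index


def remap_preferences(rows):
    """Rewrite (user_id, item_id, pref) rows with dense, non-negative ids.

    `rows` must be a list, because it is iterated more than once.
    Returns the rewritten rows plus both lookup tables, so results
    can be translated back to the original ids afterwards.
    """
    users = build_id_index(user for user, _, _ in rows)
    items = build_id_index(item for _, item, _ in rows)
    remapped = [(users[user], items[item], pref) for user, item, pref in rows]
    return remapped, users, items


# A few rows from the sample posted in this thread: user_id, item_id, pref_value.
rows = [(11, 1, 1.6), (11, 2, 1.6), (123, 3, 2.0), (333, 1, 2.0)]
remapped, users, items = remap_preferences(rows)
```

Keep the `users` and `items` tables around; they are needed to map the recommendations or similarities back to the real ids.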
>> 
>> Regards,
>> Peng Zhang
>> M: +86 186-1658-7856
>> pzhang.xjtu@gmail.com
>> 
>> 
>> 
>> 
>> 
>> On Jul 21, 2014, at 4:00 AM, Serega Sheypak <serega.sheypak@gmail.com>
>> wrote:
>> 
>>> All bad things happen here:
>>> 
>>> 
>>> 
>>> Name:                 RecommenderJob-PartialMultiplyMapper-Reducer
>>> User:                 oozie
>>> Process User:         oozie
>>> Group:                oozie
>>> Mapper Class:         PartialMultiplyMapper
>>> Reducer Class:        AggregateAndRecommendReducer
>>> Job Input Directory:  hdfs://nameservice1/itemrec/temp/partialMultiply
>>> Job Output Directory: hdfs://nameservice1/itemrec/output/
>>> 
>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map input records=3312879
>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map output records=3313251
>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0
>>> 
>>> Why does Mahout return 0 rows? It works when booleanData=true (preferences
>>> are ignored...?)
>>> 
>>> 
>>> 
>>> 2014-07-20 23:19 GMT+04:00 Serega Sheypak <serega.sheypak@gmail.com>:
>>> 
>>>> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
>>>> users_file:
>>>> --inverted_item_id
>>>> -1
>>>> -2
>>>> -3
>>>> -4
>>>> 
>>>> users_items_prefs
>>>> --inverted item_id
>>>> -1 1 1.0
>>>> -2 2 1.0
>>>> -3 3 1.0
>>>> -4 4 1.0
>>>> --user_id item_id pref_value
>>>> 11   1 1.6
>>>> 11   2 1.6
>>>> 123 3 2.0
>>>> 123 4 2.0
>>>> 333 1 2.0
>>>> 333 2 1.6
>>>> --e.t.c.
>>>> 
>>>> if I set --booleanData true
>>>> then mahout returns the result.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <andrew.musselman@gmail.com>:
>>>> 
>>>>> I'm confused about how you're constructing the user file, and why there
>>>>> are negated item ids here.
>>>>> 
>>>>> Can you post some more details please, including Mahout version and some
>>>>> sample data sets?
>>>>> 
>>>>>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <serega.sheypak@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>> Hi, I'm trying to create item similarity.
>>>>>> I gather items which users visit during shopping and then create a file:
>>>>>> user_id, item_id, weight (where weight can be: [1.0, 1.6, 1.9], depending
>>>>>> on user action type and data source)
>>>>>> UNION
>>>>>> -item_id, item_id, 1.0 (from items dictionary)
>>>>>> 
>>>>>> and I do provide a userFile, where user_id = -item_id
>>>>>> 
>>>>>> The idea is to get item similarity. If any user visits an item named "A",
>>>>>> I want to show him items "B", "c", "xxx" using preferences of other users.
>>>>>> 
>>>>>> The problem is that the last (???) mapreduce job returns 0 rows:
>>>>>> 
>>>>>> Here are my settings:
>>>>>> 
>>>>>> 
>>>>>> sudo -u oozie mahout recommenditembased \
>>>>>>                  --input visited_items_with_inverted_items \
>>>>>>                  --output result \
>>>>>>                  --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>>>>>>                  --usersFile inverted_items \
>>>>>>                  --numRecommendations 500 \
>>>>>>                  --booleanData false \
>>>>>>                  --maxPrefsPerUser 100 \
>>>>>>                  --maxSimilaritiesPerItem 500 \
>>>>>>                  --minPrefsPerUser 0 \
>>>>>>                  --maxPrefsPerUserInItemSimilarity 30 \
>>>>>>                  --threshold 0.91 \
>>>>>>                  --tempDir temp
>>>>>> 
>>>>>> Some counters... I don't understand what they mean:
>>>>>> 
>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:     USERS=7528530
>>>>>> 
>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_NEGLECTED=1,798,738
>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_USED=12,429,693
>>>>>> 
>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:     ROWS=3312879
>>>>>> 
>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     COOCCURRENCES=35882374
>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     PRUNED_COOCCURRENCES=0
>>>>>> 
>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map input records=3312879
>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map output records=17570268
>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce input records=5221907
>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce output records=3312879
>>>>>> 
>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
>>>>>> 
>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map input records=7528530
>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map output records=3313251
>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce input records=3313251
>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce output records=3313251
>>>>>> 
>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map input records=6626130
>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map output records=6626130
>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce input records=6626130
>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce output records=3312879
>>>>>> 
>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879
>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251
>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251
>>>>>> --------
>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0
>>>>>> --------
>>>>>> 
>>>>>> why 0???
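
With this many jobs in one run, it can help to pull the record counters out of the console output mechanically and flag the step whose reducer wrote nothing. The following is a rough sketch, assuming each counter sits on a single `INFO mapred.JobClient:` line (mail clients may wrap them, so unwrap first); the regular expression and the helper names `parse_counters` and `zero_output_positions` are illustrative only.

```python
import re

# Matches e.g. "... INFO mapred.JobClient:     Reduce output records=0"
COUNTER = re.compile(
    r"INFO mapred\.JobClient:\s+(?P<name>[A-Za-z_ ]+?)=(?P<value>[\d,]+)\s*$"
)


def parse_counters(log_lines):
    """Return (counter_name, value) pairs in the order they appear in the log."""
    counters = []
    for line in log_lines:
        match = COUNTER.search(line)
        if match:
            value = int(match.group("value").replace(",", ""))
            counters.append((match.group("name"), value))
    return counters


def zero_output_positions(counters):
    """Indexes of 'Reduce output records' counters whose value is 0."""
    return [
        i
        for i, (name, value) in enumerate(counters)
        if name == "Reduce output records" and value == 0
    ]


# The last job from the log in this thread.
log = [
    "14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879",
    "14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251",
    "14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251",
    "14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0",
]
counters = parse_counters(log)
```

Here the reducer received 3313251 records and emitted none, so the loss happens inside that final reduce step rather than earlier in the pipeline.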
>>>>> 
>>>> 
>>>> 
>> 
>> 

