mahout-user mailing list archives

From Serega Sheypak <serega.shey...@gmail.com>
Subject Re: recommenditembased returns 0 records from last map-reduce job
Date Sun, 20 Jul 2014 20:00:34 GMT
All bad things happen here:



Name:                  RecommenderJob-PartialMultiplyMapper-Reducer
User:                  oozie
Process User:          oozie
Group:                 oozie
Mapper Class:          PartialMultiplyMapper
Reducer Class:         AggregateAndRecommendReducer
Job Input Directory:   hdfs://nameservice1/itemrec/temp/partialMultiply
Job Output Directory:  hdfs://nameservice1/itemrec/output/

14/07/20 23:57:47 INFO mapred.JobClient:     Map input records=3312879
14/07/20 23:57:47 INFO mapred.JobClient:     Map output records=3313251
14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0

Why does Mahout return 0 rows? It works when booleanData=true (are preferences
ignored in that case?)
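
To find the stage that drops everything, the JobClient counters can be scanned mechanically. The following is only a sketch in Python: the function name `find_empty_reduces` and the regular expression are illustrative assumptions, not part of Mahout or Hadoop, and it assumes each counter sits on a single unwrapped log line.

```python
import re

# Matches lines such as:
# 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
COUNTER = re.compile(
    r"INFO mapred\.JobClient:\s+(Map|Reduce) (input|output) records=([\d,]+)"
)


def find_empty_reduces(log_text):
    """Return (reduce_input, reduce_output) pairs where output is 0 but input is not."""
    hits = []
    last_reduce_input = None
    for line in log_text.splitlines():
        match = COUNTER.search(line)
        if not match:
            continue
        phase, direction = match.group(1), match.group(2)
        value = int(match.group(3).replace(",", ""))
        if phase != "Reduce":
            continue
        if direction == "input":
            # Remember the input count until the matching output line appears.
            last_reduce_input = value
        else:
            if value == 0 and last_reduce_input:
                hits.append((last_reduce_input, value))
            last_reduce_input = None
    return hits


sample_log = """\
14/07/20 23:57:47 INFO mapred.JobClient:     Map input records=3312879
14/07/20 23:57:47 INFO mapred.JobClient:     Map output records=3313251
14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0
"""

for reduce_in, reduce_out in find_empty_reduces(sample_log):
    print(f"reducer consumed {reduce_in} records but emitted {reduce_out}")
```

On the counters above this flags the AggregateAndRecommendReducer stage: the reducer receives records but emits none, so the loss happens inside the reducer and not in the input to it.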



2014-07-20 23:19 GMT+04:00 Serega Sheypak <serega.sheypak@gmail.com>:

> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
> users_file:
> --inverted_item_id
> -1
> -2
> -3
> -4
>
> users_items_prefs
> --inverted_item_id item_id pref_value
> -1 1 1.0
> -2 2 1.0
> -3 3 1.0
> -4 4 1.0
> --user_id item_id pref_value
> 11   1 1.6
> 11   2 1.6
> 123 3 2.0
> 123 4 2.0
> 333 1 2.0
> 333 2 1.6
> -- etc.
>
> If I set --booleanData true,
> then Mahout returns results.
>
>
>
>
> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <andrew.musselman@gmail.com>:
>
>> I'm confused about how you're constructing the user file, and why there
>> are negated item ids here.
>>
>> Can you post some more details please, including Mahout version and some
>> sample data sets?
>>
>> > On Jul 20, 2014, at 11:57 AM, Serega Sheypak <serega.sheypak@gmail.com> wrote:
>> >
>> > Hi, I'm trying to compute item similarity.
>> > I gather the items that users visit while shopping and then create a file:
>> > user_id, item_id, weight (where weight is one of [1.0, 1.6, 1.9], depending
>> > on user action type and data source)
>> > UNION
>> > -item_id, item_id, 1.0 (from the items dictionary)
>> >
>> > and I provide a usersFile, where user_id = -item_id
>> >
>> > The idea is to get item similarity. If any user visits an item named "A", I
>> > want to show them items "B", "c", "xxx" using the preferences of other users.
>> >
>> > The problem is that the last (???) MapReduce job returns 0 rows:
>> >
>> > Here are my settings:
>> >
>> >
>> > sudo -u oozie mahout recommenditembased \
>> >                    --input visited_items_with_inverted_items \
>> >                    --output result \
>> >                    --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>> >                    --usersFile inverted_items \
>> >                    --numRecommendations 500 \
>> >                    --booleanData false \
>> >                    --maxPrefsPerUser 100 \
>> >                    --maxSimilaritiesPerItem 500 \
>> >                    --minPrefsPerUser 0 \
>> >                    --maxPrefsPerUserInItemSimilarity 30 \
>> >                    --threshold 0.91 \
>> >                    --tempDir temp
>> >
>> > Some counters... I don't understand what they mean.
>> >
>> > 14/07/20 22:43:08 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
>> > 14/07/20 22:43:08 INFO mapred.JobClient:     USERS=7528530
>> > 14/07/20 22:43:43 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
>> > 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_NEGLECTED=1,798,738
>> > 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_USED=12,429,693
>> >
>> > 14/07/20 22:44:24 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>> > 14/07/20 22:44:24 INFO mapred.JobClient:     ROWS=3312879
>> > 14/07/20 22:45:18 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>> > 14/07/20 22:45:18 INFO mapred.JobClient:     COOCCURRENCES=35882374
>> > 14/07/20 22:45:18 INFO mapred.JobClient:     PRUNED_COOCCURRENCES=0
>> > 14/07/20 22:46:00 INFO mapred.JobClient:     Map input records=3312879
>> > 14/07/20 22:46:00 INFO mapred.JobClient:     Map output records=17570268
>> > 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce input records=5221907
>> > 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce output records=3312879
>> >
>> > 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
>> > 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
>> > 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
>> > 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
>> > 14/07/20 22:47:06 INFO mapred.JobClient:     Map input records=7528530
>> > 14/07/20 22:47:06 INFO mapred.JobClient:     Map output records=3313251
>> > 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce input records=3313251
>> > 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce output records=3313251
>> > 14/07/20 22:47:40 INFO mapred.JobClient:     Map input records=6626130
>> > 14/07/20 22:47:40 INFO mapred.JobClient:     Map output records=6626130
>> > 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce input records=6626130
>> > 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce output records=3312879
>> >
>> > 14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879
>> > 14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251
>> > 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251
>> >
>> > --------
>> > 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0
>> > --------
>> >
>> > why 0???
>>
>
>
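
The input construction described in the quoted original message (real user preferences UNION one pseudo-user per item, with user_id = -item_id) can be sketched as follows. This is an illustration under assumptions, not the poster's actual pipeline: the helper name `build_inputs` is hypothetical, and comma-separated output is assumed from the "user_id, item_id, weight" description.

```python
def build_inputs(visits, item_ids):
    """Build the preference lines and the usersFile lines.

    visits:   iterable of (user_id, item_id, weight) tuples from real users
    item_ids: iterable of item ids from the items dictionary
    """
    items = sorted(item_ids)
    # Real user preferences, one "user_id,item_id,weight" line each.
    prefs = [f"{user},{item},{weight}" for user, item, weight in visits]
    # One pseudo-user per item: id is the negated item id, preference is 1.0.
    prefs += [f"{-item},{item},1.0" for item in items]
    # The usersFile lists only the pseudo-users, one id per line.
    users = [str(-item) for item in items]
    return prefs, users


# Sample rows taken from the quoted message.
visits = [
    (11, 1, 1.6), (11, 2, 1.6),
    (123, 3, 2.0), (123, 4, 2.0),
    (333, 1, 2.0), (333, 2, 1.6),
]
prefs, users = build_inputs(visits, {1, 2, 3, 4})
print("\n".join(prefs))
print("\n".join(users))
```

Note that with this layout every pseudo-user in the usersFile has exactly one preference, namely the item it stands for.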
