mahout-user mailing list archives

From Pat Ferrel <pat.fer...@gmail.com>
Subject Re: recommenditembased returns 0 records from last map-reduce job
Date Sun, 27 Jul 2014 01:18:27 GMT
Both of those jobs require you to create Mahout IDs for users and items. For most Hadoop-based
Mahout jobs, whether they take text input or sequence files, the IDs must follow the rules
described below. There are a few exceptions, but none that you are using. The wiki was rewritten
for 0.9, so the ID requirements may not be documented well. You can file a Jira so that someone
documents this.

BTW spark-itemsimilarity will take any IDs and can read any text-delimited file format;
unfortunately it’s not quite ready yet.
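
A minimal sketch of the ID translation described in the quoted message below, using the two example rows from it. The names IdDictionary and to_mahout_input are hypothetical, a plain dict plus a list stands in for the Guava HashBiMap mentioned further down the thread, and none of this is Mahout API; it only shows one way the pre-processing step could look:

```python
import csv
import io


class IdDictionary:
    """Hypothetical bi-directional mapping: application ID <-> Mahout ID (0..n-1)."""

    def __init__(self):
        self._to_mahout = {}  # application ID -> Mahout ID
        self._to_app = []     # Mahout ID (list index) -> application ID

    def mahout_id(self, app_id):
        # Assign the next sequential ID the first time an application ID is seen.
        if app_id not in self._to_mahout:
            self._to_mahout[app_id] = len(self._to_app)
            self._to_app.append(app_id)
        return self._to_mahout[app_id]

    def app_id(self, mahout_id):
        # Translate a Mahout ID from the job output back to the application ID.
        return self._to_app[mahout_id]


def to_mahout_input(rows, users, items):
    """Turn (user, item, pref) rows into 'userID,itemID,pref' text lines."""
    return [
        f"{users.mahout_id(user)},{items.mahout_id(item)},{pref}"
        for user, item, pref in rows
    ]


# The two example rows from the thread.
raw = "-92, abc, 1.0\n75000x, jkl, 2.0\n"
rows = [[field.strip() for field in row] for row in csv.reader(io.StringIO(raw))]

# Separate dictionaries, because user IDs and item IDs are numbered independently.
users, items = IdDictionary(), IdDictionary()
mahout_lines = to_mahout_input(rows, users, items)

print(mahout_lines)     # ['0,0,1.0', '1,1,2.0']
print(users.app_id(0))  # -92
print(items.app_id(1))  # jkl
```

The dictionaries would have to be kept (or persisted) until the job finishes, since the same mapping is needed to translate the Mahout IDs in the output back to application IDs.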
 
On Jul 26, 2014, at 3:14 AM, Serega Sheypak <serega.sheypak@gmail.com> wrote:

Hm... rather confusing... You are talking about input for:
org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
or
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

My target is to get item-item similarity. ItemSimilarityJob right now
returns only a few similarities.

I'm reading this:
https://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
and that:
https://mahout.apache.org/users/recommender/userbased-5-minutes.html

I don't see anything there about "Your IDs must be in the range from 0 to
the number of rows" for both items and users. Where does this requirement
come from?


2014-07-25 23:57 GMT+04:00 Pat Ferrel <pat.ferrel@gmail.com>:

> I think I did explain below. Your IDs must be in the range from 0 to the
> number of rows - 1 and the same for item IDs. This is done by taking your
> application specific IDs and mapping them to sequential non-negative
> Integers. You need to maintain a mapping to/from Mahout IDs somewhere in
> your own code.
> 
> For example imagine input of the form
> -92, abc, 1.0
> 75000x, jkl, 2.0
> 
> Your first user ID is -92, so give it Mahout ID = 0. Your next user ID is
> 75000x, so give it Mahout ID = 1.
> Your first item ID is abc, so give it Mahout ID = 0. Your next item ID is
> jkl, so give it Mahout ID = 1.
> Keep doing this each time you see a new unique ID in your input. A Map
> will do this for you.
> 
> And so on. Then the input to Mahout would be:
> 0,0,1.0
> 1,1,2.0
> 
> The output will have Mahout IDs too so you need to map recommendations for
> Mahout User ID 0 back to your User ID of -92, and the same for all item IDs.
> 
> 
> On Jul 25, 2014, at 11:55 AM, Serega Sheypak <serega.sheypak@gmail.com>
> wrote:
> 
> I'm preparing data using Apache Hive: user_id:long, item_id:long,
> preference[1.0, 2.0]
> I don't understand "For most Mahout jobs you have to prepare your data to
> have Mahout IDs". What are "Mahout IDs"? I try to follow the Mahout site
> docs, but I didn't find anything there related to Mahout IDs.
> Please explain.
> 
> 
> 2014-07-25 22:39 GMT+04:00 Pat Ferrel <pat.ferrel@gmail.com>:
> 
>> Sorry I haven’t read this thread carefully but it looks like you may be
>> using the wrong IDs.
>> 
>> For most Mahout jobs you have to prepare your data to have Mahout IDs. You
>> do this by looking at each datum, and as you see a new unique
>> application-specific user or item ID you give it a Mahout ID starting from 0.
>> So Mahout IDs can be thought of as row and column numbers in a matrix. The
>> Mahout IDs for rows will be 0 through (number of rows - 1), and the same
>> for columns.
>> 
>> This always requires that you translate into Mahout IDs, then after the job
>> is run translate back into your application IDs. You need a bi-directional
>> dictionary of some type. I use a HashBiMap from Guava.
>> 
>> Also I’d avoid the threshold for now. If you get that wrong it will mess
>> things up badly and is very hard to tune. It’s there for completeness but I
>> never use it.
>> 
>> 
>> On Jul 25, 2014, at 12:55 AM, Serega Sheypak <serega.sheypak@gmail.com>
>> wrote:
>> 
>> Hi, nothing helps...
>> I do use Mahout 0.9 compiled for CDH 4.7
>> I do provide only positive values
>> I do use ItemSimilarityJob and do get 2000 similarities for 1400 unique
>> items
>> Input data is:
>> 16*10^6 preferences
>> 4*10^6 users
>> 0.6*10^ items
>> I do use Pearson correlation and preference values are: 1.0 and 2.0
>> 
>> 
>> 2014-07-22 9:32 GMT+04:00 Serega Sheypak <serega.sheypak@gmail.com>:
>> 
>>> Ok, I have recompiled Mahout 0.9 for CDH 4.7. I'll try this evening.
>>> Right now I don't see how it can help me. As far as I know the stuff I try
>>> to use is pretty old and stable.
>>> Looks like I apply it in the wrong way.
>>> 
>>> There is an option for recommenditembased named "--threshold". I do
>>> provide data for recommenditembased with preference values in range
>>> [1.1..2.0].
>>> I set --threshold to 1.2
>>> Is --threshold absolute, so it can be from [1.1..2+], or is it relative,
>>> so it can be [0.0..0.99999]?
>>> 
>>> 
>>> 2014-07-22 3:54 GMT+04:00 Ted Dunning <ted.dunning@gmail.com>:
>>> 
>>>> That version is no longer supported.  You should upgrade to 0.9
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Mon, Jul 21, 2014 at 11:41 AM, Serega Sheypak <
>>>> serega.sheypak@gmail.com>
>>>> wrote:
>>>> 
>>>>> 0.7-cdh4.7.0
>>>>> Anyway, recommenditembased does produce these directories and files:
>>>>> 
>>>>> /recommenditembased/temp/maxValues.bin
>>>>> /recommenditembased/temp/norms.bin
>>>>> /recommenditembased/temp/numNonZeroEntries.bin
>>>>> /recommenditembased/temp/pairwiseSimilarity
>>>>> /recommenditembased/temp/partialMultiply
>>>>> /recommenditembased/temp/prePartialMultiply1
>>>>> /recommenditembased/temp/prePartialMultiply2
>>>>> /recommenditembased/temp/preparePreferenceMatrix
>>>>> /recommenditembased/temp/similarityMatrix
>>>>> /recommenditembased/temp/weights
>>>>> 
>>>>> I suppose that "/recommenditembased/temp/similarityMatrix" is the thing I
>>>>> need. Right now I try to read it using
>>>>> 
>>>>> matrix = LOAD '/recommenditembased/temp/similarityMatrix' USING
>>>>> com.twitter.elephantbird.pig.load.SequenceFileLoader(
>>>>>  '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
>>>>>  '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter'
>>>>> )  as (intId: int, vector:tuple(cardinality:int,
>>>>> entries:bag{t:tuple(some_id:long, some_value:double)}));
>>>>> 
>>>>> 
>>>>> Looks like the vector is empty... Or I do something wrong.
>>>>> 
>>>>> 
>>>>> 
>>>>> 2014-07-21 22:09 GMT+04:00 Ted Dunning <ted.dunning@gmail.com>:
>>>>> 
>>>>>> Which version of Mahout?
>>>>>> 
>>>>>> 
>>>>>> On Mon, Jul 21, 2014 at 11:05 AM, Serega Sheypak <serega.sheypak@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hi, I've tried: Unexpected --outputPathForSimilarityMatrix while processing
>>>>>>> Job-Specific
>>>>>>> 
>>>>>>> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/output
>>>>>>> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/temp
>>>>>>> sudo -u oozie mahout recommenditembased \
>>>>>>>                  --input \
>>>>>>>                  hdfs://nameservice1/user/hive/warehouse/staging_weighted_visits_and_rec_clicks \
>>>>>>>                  --output \
>>>>>>>                  hdfs://nameservice1/recommenditembased/output \
>>>>>>>                  --similarityClassname \
>>>>>>>                  SIMILARITY_LOGLIKELIHOOD \
>>>>>>>                  --numRecommendations \
>>>>>>>                  500 \
>>>>>>>                  --booleanData \
>>>>>>>                  false \
>>>>>>>                  --maxPrefsPerUser \
>>>>>>>                  1000 \
>>>>>>>                  --maxSimilaritiesPerItem \
>>>>>>>                  1000 \
>>>>>>>                  --minPrefsPerUser \
>>>>>>>                  5 \
>>>>>>>                  --maxPrefsPerUserInItemSimilarity \
>>>>>>>                  30 \
>>>>>>>                  --threshold \
>>>>>>>                  1.1 \
>>>>>>>                  --tempDir \
>>>>>>>                  hdfs://nameservice1/recommenditembased/temp \
>>>>>>>                  --outputPathForSimilarityMatrix \
>>>>>>>                  hdfs://nameservice1/recommenditembased/sim_matrix
>>>>>>> 
>>>>>>> 
>>>>>>> I'm on Cloudera cdh 4.7, looks like this feature is not supported.
>>>>>>> 
>>>>>>> 
>>>>>>> 2014-07-21 11:18 GMT+04:00 Peng Zhang <pzhang.xjtu@gmail.com>:
>>>>>>> 
>>>>>>>> Serega,
>>>>>>>> 
>>>>>>>> See the last line on how to pass outputPathForSimilarityMatrix options to
>>>>>>>> the recommenditembased command:
>>>>>>>> 
>>>>>>>> sudo -u oozie mahout recommenditembased \
>>>>>>>>                 --input visited_items_with_inverted_items \
>>>>>>>>                 --output result \
>>>>>>>>                 --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>>>>>>>>                 --usersFile inverted_items \
>>>>>>>>                 --numRecommendations 500 \
>>>>>>>>                 --booleanData false \
>>>>>>>>                 --maxPrefsPerUser 100 \
>>>>>>>>                 --maxSimilaritiesPerItem 500 \
>>>>>>>>                 --minPrefsPerUser 0\
>>>>>>>>                 --maxPrefsPerUserInItemSimilarity 30 \
>>>>>>>>                 --threshold 0.91 \
>>>>>>>>                 --tempDir  temp \
>>>>>>>>                 --outputPathForSimilarityMatrix similarityMatri \
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Peng Zhang
>>>>>>>> pzhang.xjtu@gmail.com
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Jul 21, 2014, at 3:09 PM, Serega Sheypak <
>>>>> serega.sheypak@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> I've inspected the code, our approach wouldn't work with booleanData=false.
>>>>>>>>> We do calculate item similarity in the wrong way...(((
>>>>>>>>> Thank you
>>>>>>>>> 1. We provide "fake" user_id and provide --usersFile in order to get
>>>>>>>>> recommendations for "fake" user_id, where user_id is a negative item_id. It
>>>>>>>>> worked when we did provide user_id->item_id pairs without preference.
>>>>>>>>> 2. Our target is to get item similarities. We tried
>>>>>>>>> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob but it
>>>>>>>>> returns bad result comparing to RecommenderJob with our "fake" user_id
>>>>>>>>> (inverted item_id)
>>>>>>>>> 
>>>>>>>>> 1. I'll try the option you provided.
>>>>>>>>> 2. I will remove input with fake user_id and usersFile with these fake ids
>>>>>>>>> 
>>>>>>>>> 3. https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
>>>>>>>>> I don't understand how to pass the --outputPathForSimilarityMatrix option to
>>>>>>>>> RecommenderJob
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2014-07-21 4:58 GMT+04:00 Peng Zhang <pzhang.xjtu@gmail.com>:
>>>>>>>>> 
>>>>>>>>>> Serega,
>>>>>>>>>> 
>>>>>>>>>> I have two comments:
>>>>>>>>>> 1. Don’t use negative user ids. Since Mahout uses user id as well as item
>>>>>>>>>> id as the row/column index, you’d better use 0, 1, 2, etc. as ids
>>>>>>>>>> 2. If you want to get the item similarity information, you can use
>>>>>>>>>> --outputPathForSimilarityMatrix in the command
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> Peng Zhang
>>>>>>>>>> M: +86 186-1658-7856
>>>>>>>>>> pzhang.xjtu@gmail.com
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Jul 21, 2014, at 4:00 AM, Serega Sheypak <
>>>>>> serega.sheypak@gmail.com
>>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> All bad things happen here:
>>>>>>>>>>> 
>>>>>>>>>>> Name: RecommenderJob-PartialMultiplyMapper-Reducer
>>>>>>>>>>> User: oozie
>>>>>>>>>>> Process User: oozie
>>>>>>>>>>> Group: oozie
>>>>>>>>>>> Mapper Class: PartialMultiplyMapper
>>>>>>>>>>> Reducer Class: AggregateAndRecommendReducer
>>>>>>>>>>> Job Input Directory: hdfs://nameservice1/itemrec/temp/partialMultiply
>>>>>>>>>>> Job Output Directory: hdfs://nameservice1/itemrec/output/
>>>>>>>>>>> 
>>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map input records=3312879
>>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map output records=3313251
>>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
>>>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0
>>>>>>>>>>> 
>>>>>>>>>>> Why does Mahout return 0 rows? It works when booleanData=true (preferences
>>>>>>>>>>> are ignored...?)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 2014-07-20 23:19 GMT+04:00 Serega Sheypak <
>>>>>> serega.sheypak@gmail.com
>>>>>>>> :
>>>>>>>>>>> 
>>>>>>>>>>>> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
>>>>>>>>>>>> users_file:
>>>>>>>>>>>> --inverted_item_id
>>>>>>>>>>>> -1
>>>>>>>>>>>> -2
>>>>>>>>>>>> -3
>>>>>>>>>>>> -4
>>>>>>>>>>>> 
>>>>>>>>>>>> users_items_prefs
>>>>>>>>>>>> --inverted item_id
>>>>>>>>>>>> -1 1 1.0
>>>>>>>>>>>> -2 2 1.0
>>>>>>>>>>>> -3 3 1.0
>>>>>>>>>>>> -4 4 1.0
>>>>>>>>>>>> --user_id item_id pref_value
>>>>>>>>>>>> 11   1 1.6
>>>>>>>>>>>> 11   2 1.6
>>>>>>>>>>>> 123 3 2.0
>>>>>>>>>>>> 123 4 2.0
>>>>>>>>>>>> 333 1 2.0
>>>>>>>>>>>> 333 2 1.6
>>>>>>>>>>>> --e.t.c.
>>>>>>>>>>>> 
>>>>>>>>>>>> if I set --booleanData true
>>>>>>>>>>>> then mahout returns the result.
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <andrew.musselman@gmail.com>:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm confused about how you're constructing the user file, and why there
>>>>>>>>>>>>> are negated item ids here.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Can you post some more details please, including Mahout version and some
>>>>>>>>>>>>> sample data sets?
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <serega.sheypak@gmail.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi, I'm trying to create item similarity.
>>>>>>>>>>>>>> I gather items which users visit during shopping and then create a file:
>>>>>>>>>>>>>> user_id, item_id, weight (where weight can be: [1.0, 1.6, 1.9], depends on
>>>>>>>>>>>>>> user action type and data source)
>>>>>>>>>>>>>> UNION
>>>>>>>>>>>>>> -item_id, item_id, 1.0 (from items dictionary)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> and I do provide a userFile, where user_id = -item_id
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The idea is to get item similarity. If any user visits item named "A", I want
>>>>>>>>>>>>>> to show him items "B", "c", "xxx" using preferences of other users.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The problem is that the last (???) mapreduce job returns 0 rows:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Here are my settings:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> sudo -u oozie mahout recommenditembased \
>>>>>>>>>>>>>>               --input visited_items_with_inverted_items \
>>>>>>>>>>>>>>               --output result \
>>>>>>>>>>>>>>               --similarityClassname SIMILARITY_LOGLIKELIHOOD \
>>>>>>>>>>>>>>               --usersFile inverted_items \
>>>>>>>>>>>>>>               --numRecommendations 500 \
>>>>>>>>>>>>>>               --booleanData false \
>>>>>>>>>>>>>>               --maxPrefsPerUser 100 \
>>>>>>>>>>>>>>               --maxSimilaritiesPerItem 500 \
>>>>>>>>>>>>>>               --minPrefsPerUser 0\
>>>>>>>>>>>>>>               --maxPrefsPerUserInItemSimilarity 30 \
>>>>>>>>>>>>>>               --threshold 0.91 \
>>>>>>>>>>>>>>               --tempDir  temp \
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Some counters... I don't get what they mean....
>>>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
>>>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:     USERS=7528530
>>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient: org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
>>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_NEGLECTED=1,798,738
>>>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_USED=12,429,693
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>>>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:     ROWS=3312879
>>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient: org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
>>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     COOCCURRENCES=35882374
>>>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     PRUNED_COOCCURRENCES=0
>>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map input records=3312879
>>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map output records=17570268
>>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce input records=5221907
>>>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce output records=3312879
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
>>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
>>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
>>>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
>>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map input records=7528530
>>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map output records=3313251
>>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce input records=3313251
>>>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce output records=3313251
>>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map input records=6626130
>>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map output records=6626130
>>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce input records=6626130
>>>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce output records=3312879
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879
>>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251
>>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251
>>>>>>>>>>>>>> --------
>>>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0
>>>>>>>>>>>>>> --------
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> why 0???

