mahout-user mailing list archives

From Serega Sheypak <serega.shey...@gmail.com>
Subject Re: recommenditembased returns 0 records from last map-reduce job
Date Sat, 26 Jul 2014 10:14:23 GMT
Hm, this is rather confusing. Are you talking about the input for:
org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob
or
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob

My goal is to get item-item similarities. Right now ItemSimilarityJob
returns only a few similarities.

I'm reading this:
https://mahout.apache.org/users/recommender/intro-itembased-hadoop.html
and that:
https://mahout.apache.org/users/recommender/userbased-5-minutes.html

I don't see anything there about "Your IDs must be in the range from 0 to
the number of rows" for both items and users. Where does this requirement
come from?


2014-07-25 23:57 GMT+04:00 Pat Ferrel <pat.ferrel@gmail.com>:

> I think I did explain below. Your IDs must be in the range from 0 to the
> number of rows - 1 and the same for item IDs. This is done by taking your
> application specific IDs and mapping them to sequential non-negative
> Integers. You need to maintain a mapping to/from Mahout IDs somewhere in
> your own code.
>
> For example imagine input of the form
> -92, abc, 1.0
> 75000x, jkl, 2.0
>
> Your first user ID is -92, give it Mahout ID = 0. For your next user ID
> 75000x give it Mahout ID = 1
> Your first item ID is abc, give it Mahout ID = 0. For your next item ID
> jkl give it Mahout ID = 1
> keep doing this the first time you see a unique id from your input. A Map
> will do this for you.
>
> And so on. Then the input to Mahout would be:
> 0,0,1.0
> 1,1,2.0
>
> The output will have Mahout IDs too so you need to map recommendations for
> Mahout User ID 0 back to your User ID of -92, and the same for all item IDs.
>
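The mapping described above can be sketched in a few lines. This is a minimal illustration in Python, not Mahout code: the `IdDictionary` class, its method names, and `to_mahout_input` are hypothetical helpers, and the class plays the role of the bidirectional dictionary (for example Guava's HashBiMap in Java) mentioned elsewhere in this thread. It assumes comma-separated `user, item, preference` input lines like the two sample rows above.

```python
import csv


class IdDictionary:
    """Bidirectional map between application IDs and sequential Mahout IDs.

    Hypothetical helper for illustration; not part of Mahout.
    """

    def __init__(self):
        self._to_mahout = {}  # application ID -> Mahout ID
        self._to_app = []     # Mahout ID (list index) -> application ID

    def mahout_id(self, app_id):
        """Return the Mahout ID for app_id, assigning the next integer on first sight."""
        if app_id not in self._to_mahout:
            self._to_mahout[app_id] = len(self._to_app)
            self._to_app.append(app_id)
        return self._to_mahout[app_id]

    def app_id(self, mahout_id):
        """Translate a Mahout ID from the job output back to the application ID."""
        return self._to_app[mahout_id]


def to_mahout_input(lines, users, items):
    """Rewrite 'user, item, pref' lines so both IDs are 0-based integers."""
    out = []
    for user, item, pref in csv.reader(lines, skipinitialspace=True):
        out.append(f"{users.mahout_id(user)},{items.mahout_id(item)},{pref}")
    return out


# Separate dictionaries, because user IDs and item IDs are numbered independently.
users, items = IdDictionary(), IdDictionary()
raw = ["-92, abc, 1.0", "75000x, jkl, 2.0"]
mahout_lines = to_mahout_input(raw, users, items)
print(mahout_lines)
```

After the job finishes, `users.app_id(0)` gives back `-92`, and the same lookup on `items` restores the original item IDs. Both dictionaries have to be persisted somewhere if the translation happens in a later process.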
>
> On Jul 25, 2014, at 11:55 AM, Serega Sheypak <serega.sheypak@gmail.com>
> wrote:
>
> I'm preparing data using Apache Hive: user_id:long, item_id:long,
> preference[1.0, 2.0]
> I don't understand "For most Mahout jobs you have to prepare your data to
> have Mahout IDs". What are "Mahout IDs"? I tried to follow the Mahout site
> docs, but I didn't find anything there related to Mahout IDs.
> Please explain.
>
>
> 2014-07-25 22:39 GMT+04:00 Pat Ferrel <pat.ferrel@gmail.com>:
>
> > Sorry I haven’t read this thread carefully but it looks like you may be
> > using the wrong IDs.
> >
> > For most Mahout jobs you have to prepare your data to have Mahout IDs. You
> > do this by looking at each datum and as you see a new unique application
> > specific user or item ID you give it a Mahout ID starting from 0. So Mahout
> > IDs can be thought of as row and column numbers in a matrix. The Mahout IDs
> > for rows will be 0 thru # of rows-1, same for columns.
> >
> > This always requires that you translate into Mahout IDs then after the job
> > is run translate back into your application IDs. You need a bi-directional
> > dictionary of some type. I use a HashBiMap from Guava.
> >
> > Also I’d avoid the threshold for now. If you get that wrong it will mess
> > things up badly and is very hard to tune. It’s there for completeness but I
> > never use it.
> >
> >
> > On Jul 25, 2014, at 12:55 AM, Serega Sheypak <serega.sheypak@gmail.com>
> > wrote:
> >
> > Hi, nothing helps...
> > I do use mahout 0.9 compiled for CDH 4.7
> > I do provide only positive values
> > I do use itemsimilarityJob and do get 2000 similarities for 1400 unique
> > items
> > Input data is:
> > 16*10^6 preferences
> > 4*10^6 users
> > 0.6*10^ items
> > I do use Pearson correlation and preference values are: 1.0 and 2.0
> >
> >
> > 2014-07-22 9:32 GMT+04:00 Serega Sheypak <serega.sheypak@gmail.com>:
> >
> >> Ok, I have recompiled mahout 0.9 for CDH 4.7. I'll try this evening.
> >> Right now I don't see how it can help me. As far as I know the stuff I try
> >> to use is pretty old and stable.
> >> Looks like I apply it in the wrong way.
> >>
> >> There is an option for recommenditembased named "--threshold". I do
> >> provide data for recommenditembased with preference values in range
> >> [1.1..2.0].
> >> I set --threshold to 1.2
> >> Is --threshold absolute, so it can be in [1.1 .. 2+], or is it relative,
> >> so it can be in [0.0 .. 0.99999]?
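One way to reason about this question, offered as a hedged sketch and not as the Mahout implementation: assume `--threshold` is compared against the computed item-item similarity score and not against the preference values in the input, and assume the similarity measure is bounded above by 1.0 (as Pearson correlation is). The function name `apply_threshold` and the sample scores below are made up for illustration.

```python
def apply_threshold(similarities, threshold):
    """Keep only item pairs whose similarity score is at least `threshold`.

    Illustrative only: assumes the threshold is applied to similarity
    scores, not to the preference values in the input.
    """
    return {pair: s for pair, s in similarities.items() if s >= threshold}


# Made-up similarity scores for three item pairs, all within [-1.0, 1.0].
scores = {("A", "B"): 0.95, ("A", "C"): 0.40, ("B", "C"): -0.10}

kept_091 = apply_threshold(scores, 0.91)  # only the strongest pair survives
kept_110 = apply_threshold(scores, 1.1)   # no score can reach 1.1
print(kept_091, kept_110)
```

Under these assumptions, a threshold above the measure's maximum removes every pair, so a value chosen from the preference range (such as 1.1 or 1.2) would leave nothing for the later jobs to work with.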
> >>
> >>
> >> 2014-07-22 3:54 GMT+04:00 Ted Dunning <ted.dunning@gmail.com>:
> >>
> >> That version is no longer supported.  You should upgrade to 0.9
> >>>
> >>>
> >>>
> >>>
> >>> On Mon, Jul 21, 2014 at 11:41 AM, Serega Sheypak <
> >>> serega.sheypak@gmail.com>
> >>> wrote:
> >>>
> >>>> 0.7-cdh4.7.0
> >>>> Anyway, recommenditembased does produce these catalogs:
> >>>>
> >>>> /recommenditembased/temp/maxValues.bin
> >>>> /recommenditembased/temp/norms.bin
> >>>> /recommenditembased/temp/numNonZeroEntries.bin
> >>>> /recommenditembased/temp/pairwiseSimilarity
> >>>> /recommenditembased/temp/partialMultiply
> >>>> /recommenditembased/temp/prePartialMultiply1
> >>>> /recommenditembased/temp/prePartialMultiply2
> >>>> /recommenditembased/temp/preparePreferenceMatrix
> >>>> /recommenditembased/temp/similarityMatrix
> >>>> /recommenditembased/temp/weights
> >>>>
> >>>> I suppose that "/recommenditembased/temp/similarityMatrix" is the thing I
> >>>> need. Right now I try to read it using
> >>>>
> >>>> matrix = LOAD '/recommenditembased/temp/similarityMatrix' USING
> >>>> com.twitter.elephantbird.pig.load.SequenceFileLoader(
> >>>>   '-c com.twitter.elephantbird.pig.util.IntWritableConverter',
> >>>>   '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter'
> >>>> )  as (intId: int, vector:tuple(cardinality:int,
> >>>> entries:bag{t:tuple(some_id:long, some_value:double)}));
> >>>>
> >>>>
> >>>> Looks like the vector is empty... Or i do something wrong.
> >>>>
> >>>>
> >>>>
> >>>> 2014-07-21 22:09 GMT+04:00 Ted Dunning <ted.dunning@gmail.com>:
> >>>>
> >>>>> Which version of Mahout?
> >>>>>
> >>>>>
> >>>>> On Mon, Jul 21, 2014 at 11:05 AM, Serega Sheypak <serega.sheypak@gmail.com>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi, I've tried: Unexpected --outputPathForSimilarityMatrix while
> >>>>>> processing Job-Specific
> >>>>>>
> >>>>>> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/output
> >>>>>> sudo -u hdfs hadoop fs -rm -r hdfs://nameservice1/recommenditembased/temp
> >>>>>> sudo -u oozie mahout recommenditembased \
> >>>>>>                   --input \
> >>>>>>                   hdfs://nameservice1/user/hive/warehouse/staging_weighted_visits_and_rec_clicks \
> >>>>>>                   --output \
> >>>>>>                   hdfs://nameservice1/recommenditembased/output \
> >>>>>>                   --similarityClassname \
> >>>>>>                   SIMILARITY_LOGLIKELIHOOD \
> >>>>>>                   --numRecommendations \
> >>>>>>                   500 \
> >>>>>>                   --booleanData \
> >>>>>>                   false \
> >>>>>>                   --maxPrefsPerUser \
> >>>>>>                   1000 \
> >>>>>>                   --maxSimilaritiesPerItem \
> >>>>>>                   1000 \
> >>>>>>                   --minPrefsPerUser \
> >>>>>>                   5 \
> >>>>>>                   --maxPrefsPerUserInItemSimilarity \
> >>>>>>                   30 \
> >>>>>>                   --threshold \
> >>>>>>                   1.1 \
> >>>>>>                   --tempDir \
> >>>>>>                   hdfs://nameservice1/recommenditembased/temp \
> >>>>>>                   --outputPathForSimilarityMatrix \
> >>>>>>                   hdfs://nameservice1/recommenditembased/sim_matrix
> >>>>>>
> >>>>>>
> >>>>>> I'm on Cloudera cdh 4.7, looks like this feature is not supported.
> >>>>>>
> >>>>>>
> >>>>>> 2014-07-21 11:18 GMT+04:00 Peng Zhang <pzhang.xjtu@gmail.com>:
> >>>>>>
> >>>>>>> Serega,
> >>>>>>>
> >>>>>>> See the last line on how to pass outputPathForSimilarityMatrix options
> >>>>>>> to the recommenditembased command:
> >>>>>>>
> >>>>>>> sudo -u oozie mahout recommenditembased \
> >>>>>>>                  --input visited_items_with_inverted_items \
> >>>>>>>                  --output result \
> >>>>>>>                  --similarityClassname SIMILARITY_LOGLIKELIHOOD \
> >>>>>>>                  --usersFile inverted_items \
> >>>>>>>                  --numRecommendations 500 \
> >>>>>>>                  --booleanData false \
> >>>>>>>                  --maxPrefsPerUser 100 \
> >>>>>>>                  --maxSimilaritiesPerItem 500 \
> >>>>>>>                  --minPrefsPerUser 0 \
> >>>>>>>                  --maxPrefsPerUserInItemSimilarity 30 \
> >>>>>>>                  --threshold 0.91 \
> >>>>>>>                  --tempDir  temp \
> >>>>>>>                  --outputPathForSimilarityMatrix similarityMatri \
> >>>>>>>
> >>>>>>>
> >>>>>>> Peng Zhang
> >>>>>>> pzhang.xjtu@gmail.com
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On Jul 21, 2014, at 3:09 PM, Serega Sheypak <
> >>>> serega.sheypak@gmail.com>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> I've inspected the code; our approach wouldn't work with
> >>>>>>>> booleanData=false.
> >>>>>>>> We calculate item similarity in the wrong way...(((
> >>>>>>>> Thank you
> >>>>>>>> 1. We provide a "fake" user_id and provide --usersFile in order to get
> >>>>>>>> recommendations for the "fake" user_id, where user_id is a negative
> >>>>>>>> item_id. It worked when we provided user_id->item_id pairs without
> >>>>>>>> preference.
> >>>>>>>> 2. Our target is to get item similarities. We tried
> >>>>>>>> org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob but
> >>>>>>>> it returns bad results compared to RecommenderJob with our "fake"
> >>>>>>>> user_id (inverted item_id)
> >>>>>>>>
> >>>>>>>> 1. I'll try the option you provided.
> >>>>>>>> 2. I will remove the input with fake user_id and the usersFile with
> >>>>>>>> these fake ids
> >>>>>>>>
> >>>>>>>> 3.
> >>>>>>>> https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item/RecommenderJob.java
> >>>>>>>> I don't understand how to pass the --outputPathForSimilarityMatrix
> >>>>>>>> option to RecommenderJob
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 2014-07-21 4:58 GMT+04:00 Peng Zhang <pzhang.xjtu@gmail.com>:
> >>>>>>>>
> >>>>>>>>> Serega,
> >>>>>>>>>
> >>>>>>>>> I have two comments:
> >>>>>>>>> 1. Don’t use negative user ids. Since Mahout uses user id as well as
> >>>>>>>>> item id as the row/column index, you’d better use 0, 1, 2, etc as ids
> >>>>>>>>> 2. If you want to get the item similarity information, you can use
> >>>>>>>>> --outputPathForSimilarityMatrix in the command
> >>>>>>>>>
> >>>>>>>>> Regards,
> >>>>>>>>> Peng Zhang
> >>>>>>>>> M: +86 186-1658-7856
> >>>>>>>>> pzhang.xjtu@gmail.com
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Jul 21, 2014, at 4:00 AM, Serega Sheypak <
> >>>>> serega.sheypak@gmail.com
> >>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>>> All bad things happen here:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Name: RecommenderJob-PartialMultiplyMapper-Reducer
> >>>>>>>>>> User: oozie
> >>>>>>>>>> Process User: oozie
> >>>>>>>>>> Group: oozie
> >>>>>>>>>> Mapper Class: PartialMultiplyMapper
> >>>>>>>>>> Reducer Class: AggregateAndRecommendReducer
> >>>>>>>>>> Job Input Directory: hdfs://nameservice1/itemrec/temp/partialMultiply
> >>>>>>>>>> Job Output Directory: hdfs://nameservice1/itemrec/output/
> >>>>>>>>>>
> >>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map input records=3312879
> >>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Map output records=3313251
> >>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce input records=3313251
> >>>>>>>>>> 14/07/20 23:57:47 INFO mapred.JobClient:     Reduce output records=0
> >>>>>>>>>>
> >>>>>>>>>> Why does mahout return 0 rows? It works when booleanData=true
> >>>>>>>>>> (preferences are ignored...?)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> 2014-07-20 23:19 GMT+04:00 Serega Sheypak <
> >>>>> serega.sheypak@gmail.com
> >>>>>>> :
> >>>>>>>>>>
> >>>>>>>>>>> the version is: CDH-4.7.0-1.cdh4.7.0.p0.40
> >>>>>>>>>>> users_file:
> >>>>>>>>>>> --inverted_item_id
> >>>>>>>>>>> -1
> >>>>>>>>>>> -2
> >>>>>>>>>>> -3
> >>>>>>>>>>> -4
> >>>>>>>>>>>
> >>>>>>>>>>> users_items_prefs
> >>>>>>>>>>> --inverted item_id
> >>>>>>>>>>> -1 1 1.0
> >>>>>>>>>>> -2 2 1.0
> >>>>>>>>>>> -3 3 1.0
> >>>>>>>>>>> -4 4 1.0
> >>>>>>>>>>> --user_id item_id pref_value
> >>>>>>>>>>> 11   1 1.6
> >>>>>>>>>>> 11   2 1.6
> >>>>>>>>>>> 123 3 2.0
> >>>>>>>>>>> 123 4 2.0
> >>>>>>>>>>> 333 1 2.0
> >>>>>>>>>>> 333 2 1.6
> >>>>>>>>>>> --e.t.c.
> >>>>>>>>>>>
> >>>>>>>>>>> if I set --booleanData true
> >>>>>>>>>>> then mahout returns the result.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> 2014-07-20 23:12 GMT+04:00 Andrew Musselman <andrew.musselman@gmail.com>:
> >>>>>>>>>>>
> >>>>>>>>>>>> I'm confused about how you're constructing the user file, and why
> >>>>>>>>>>>> there are negated item ids here.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Can you post some more details please, including Mahout version and
> >>>>>>>>>>>> some sample data sets?
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Jul 20, 2014, at 11:57 AM, Serega Sheypak <serega.sheypak@gmail.com>
> >>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Hi, I'm trying to create item similarity.
> >>>>>>>>>>>>> I gather items which users visit during shopping and then create a file:
> >>>>>>>>>>>>> user_id, item_id, weight (where weight can be: [1.0, 1.6, 1.9], depending
> >>>>>>>>>>>>> on user action type and data source)
> >>>>>>>>>>>>> UNION
> >>>>>>>>>>>>> -item_id, item_id, 1.0 (from items dictionary)
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> and I do provide a userFile, where user_id = -item_id
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The idea is to get item similarity. If any user visits an item named "A",
> >>>>>>>>>>>>> I want to show him items "B", "c", "xxx" using preferences of other users.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The problem is that the last (???) mapreduce job returns 0 rows:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Here are my settings:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> sudo -u oozie mahout recommenditembased \
> >>>>>>>>>>>>>                --input visited_items_with_inverted_items \
> >>>>>>>>>>>>>                --output result \
> >>>>>>>>>>>>>                --similarityClassname SIMILARITY_LOGLIKELIHOOD \
> >>>>>>>>>>>>>                --usersFile inverted_items \
> >>>>>>>>>>>>>                --numRecommendations 500 \
> >>>>>>>>>>>>>                --booleanData false \
> >>>>>>>>>>>>>                --maxPrefsPerUser 100 \
> >>>>>>>>>>>>>                --maxSimilaritiesPerItem 500 \
> >>>>>>>>>>>>>                --minPrefsPerUser 0 \
> >>>>>>>>>>>>>                --maxPrefsPerUserInItemSimilarity 30 \
> >>>>>>>>>>>>>                --threshold 0.91 \
> >>>>>>>>>>>>>                --tempDir  temp \
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Some counters... I don't get what they mean....
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.item.ToUserVectorsReducer$Counters
> >>>>>>>>>>>>> 14/07/20 22:43:08 INFO mapred.JobClient:     USERS=7528530
> >>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:   org.apache.mahout.cf.taste.hadoop.preparation.ToItemVectorsMapper$Elements
> >>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_NEGLECTED=1,798,738
> >>>>>>>>>>>>> 14/07/20 22:43:43 INFO mapred.JobClient:     USER_RATINGS_USED=12,429,693
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> >>>>>>>>>>>>> 14/07/20 22:44:24 INFO mapred.JobClient:     ROWS=3312879
> >>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:   org.apache.mahout.math.hadoop.similarity.cooccurrence.RowSimilarityJob$Counters
> >>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     COOCCURRENCES=35882374
> >>>>>>>>>>>>> 14/07/20 22:45:18 INFO mapred.JobClient:     PRUNED_COOCCURRENCES=0
> >>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map input records=3312879
> >>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Map output records=17570268
> >>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce input records=5221907
> >>>>>>>>>>>>> 14/07/20 22:46:00 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> >>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce input records=3312879
> >>>>>>>>>>>>> 14/07/20 22:46:34 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map input records=7528530
> >>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Map output records=3313251
> >>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce input records=3313251
> >>>>>>>>>>>>> 14/07/20 22:47:06 INFO mapred.JobClient:     Reduce output records=3313251
> >>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map input records=6626130
> >>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Map output records=6626130
> >>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce input records=6626130
> >>>>>>>>>>>>> 14/07/20 22:47:40 INFO mapred.JobClient:     Reduce output records=3312879
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map input records=3312879
> >>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Map output records=3313251
> >>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce input records=3313251
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> --------
> >>>>>>>>>>>>> 14/07/20 22:48:26 INFO mapred.JobClient:     Reduce output records=0
> >>>>>>>>>>>>> --------
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> why 0???
