hadoop-mapreduce-dev mailing list archives

From "Atul Aggarwal" <aagga...@ncsu.edu>
Subject Want to compare two consecutive jobs
Date Sat, 23 Apr 2011 00:41:06 GMT


I am a graduate student in Computer Science. Out of curiosity, and as part
of my research, I want to understand the Hadoop framework in detail. In
particular, I would like to know whether I can compare two consecutive jobs
in Hadoop. If that is not directly possible, I would appreciate it if anyone
could tell me how to approach it. To be precise, I want to compare the jobs
in terms of what exactly each job did. The goal is to gather statistics on
how many jobs executed on Hadoop were similar in behavior, for example, how
many times the same sorting function was executed on the same input.


For example, suppose the first job did something like SortList(A) and some
other job did SortList(A) + Group(result(SortList(A))). I am wondering
whether Hadoop stores a mapping somewhere like JobID X -> SortList(A).
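
To make the idea concrete, here is a minimal Python sketch of the kind of
mapping and comparison I have in mind. The job IDs, the operation names, and
the job_log structure are all made up for illustration; I do not know whether
Hadoop records anything like this.

```python
from collections import Counter

# Hypothetical job log: job ID -> (operations applied in order, input name).
# Neither the IDs nor this structure come from Hadoop itself.
job_log = {
    "job_0001": (("SortList",), "A"),
    "job_0002": (("SortList", "Group"), "A"),
    "job_0003": (("SortList",), "A"),
}


def count_signatures(log):
    """Count how many jobs share the same (operations, input) signature."""
    return Counter(log.values())


def shares_prefix(first, second):
    """True if `second` begins with all of `first`'s operations on the same input."""
    ops_a, input_a = first
    ops_b, input_b = second
    return input_a == input_b and ops_b[:len(ops_a)] == ops_a


counts = count_signatures(job_log)
```

With such a log, counts would tell me how often SortList(A) ran on its own,
and shares_prefix would flag a job like SortList(A) + Group(...) as building
on an earlier SortList(A).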


So far, I have approached this problem by trying to find the entry point in
Hadoop and to understand how a job is created, what information is kept with
a job ID, and in what form (as code or as some description), but I have not
been able to figure it out.


Any guidance would be really appreciated.



