I am trying to add my own counter to measure the execution time of MapTask.run and
ReduceTask.run, using a global counter and also LOG.info().
In pseudo-distributed mode, using a mapreduce-core SNAPSHOT jar that I built after
modifying MapTask.java and ReduceTask.java for this purpose, it works exactly as I
expected.
However, on a real cluster the code I injected has no effect and only the
system-default counters are shown.
In addition, output written with System.out.println() does not appear in the stdout
section of the task attempt log.
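To show the kind of change I made, here is a rough sketch (not my exact diff) of the
instrumentation in MapTask.run; ReduceTask.run is instrumented the same way, and the
message strings are only illustrative:

    // Inside org.apache.hadoop.mapred.MapTask (Hadoop 1.x); LOG is the class's
    // existing commons-logging logger, and the original body of run() is unchanged.
    public void run(final JobConf job, final TaskUmbilicalProtocol umbilical)
        throws IOException, ClassNotFoundException, InterruptedException {
      long start = System.currentTimeMillis();

      // ... original run() body ...

      long elapsed = System.currentTimeMillis() - start;
      LOG.info("Map Task time spent: " + elapsed + " ms");    // never reaches syslog on the cluster
      System.out.println("Map Task time spent: " + elapsed);  // never reaches the stdout task log
    }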
Specifically, there are 29 counters at the job level and 15 counters at the task level,
which are:
Job-level counters (29), shown as Map / Reduce / Total:

Job Counters
  SLOTS_MILLIS_MAPS                                                     0 / 0 / 2,596
  Launched reduce tasks                                                 0 / 0 / 1
  Total time spent by all reduces waiting after reserving slots (ms)    0 / 0 / 0
  Rack-local map tasks                                                  0 / 0 / 1
  Total time spent by all maps waiting after reserving slots (ms)       0 / 0 / 0
  Launched map tasks                                                    0 / 0 / 1
  SLOTS_MILLIS_REDUCES                                                  0 / 0 / 8,059
File Output Format Counters
  Bytes Written                                                         0 / 0 / 178,730
File Input Format Counters
  Bytes Read                                                            0 / 0 / 493,744
FileSystemCounters
  FILE_BYTES_READ                                                       0 / 232,194 / 232,194
  HDFS_BYTES_READ                                                       493,847 / 0 / 493,847
  FILE_BYTES_WRITTEN                                                    289,043 / 288,923 / 577,966
  HDFS_BYTES_WRITTEN                                                    0 / 178,730 / 178,730
Map-Reduce Framework
  Reduce input groups                                                   0 / 0 / 13,606
  Map output materialized bytes                                         0 / 0 / 232,194
  Combine output records                                                0 / 0 / 13,606
  Map input records                                                     0 / 0 / 12,484
  Reduce shuffle bytes                                                  0 / 0 / 232,194
  Physical memory (bytes) snapshot                                      0 / 0 / 390,144,000
  Reduce output records                                                 0 / 0 / 13,606
  Spilled Records                                                       0 / 0 / 27,212
  Map output bytes                                                      0 / 0 / 709,697
  Total committed heap usage (bytes)                                    0 / 0 / 490,078,208
  CPU time spent (ms)                                                   0 / 0 / 1,970
  Virtual memory (bytes) snapshot                                       0 / 0 / 3,586,727,936
  SPLIT_RAW_BYTES                                                       103 / 0 / 103
  Map output records                                                    0 / 0 / 63,245
  Combine input records                                                 0 / 0 / 63,245
  Reduce input records                                                  0 / 0 / 13,606
And the task-level counters (15):

FileSystemCounters
  HDFS_BYTES_READ                       493,847
  FILE_BYTES_WRITTEN                    289,043
File Input Format Counters
  Bytes Read                            493,744
Map-Reduce Framework
  Map output materialized bytes         232,194
  Combine output records                13,606
  Map input records                     12,484
  Physical memory (bytes) snapshot      247,943,168
  Spilled Records                       13,606
  Map output bytes                      709,697
  CPU time spent (ms)                   950
  Total committed heap usage (bytes)    245,039,104
  Virtual memory (bytes) snapshot       1,786,130,432
  Combine input records                 63,245
  Map output records                    63,245
  SPLIT_RAW_BYTES                       103
And here are the task logs:
Task Logs: 'attempt_201312131236_0004_m_000000_0'
stdout logs
_____
stderr logs
_____
syslog logs
2013-12-13 12:44:15,629 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-12-13 12:44:15,849 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
2013-12-13 12:44:15,850 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@46388f05
2013-12-13 12:44:15,894 INFO org.apache.hadoop.mapred.MapTask: Processing split: hdfs://10.40.3.56:9000/user/root/input:0+493744
2013-12-13 12:44:15,901 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2013-12-13 12:44:15,910 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2013-12-13 12:44:15,910 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
2013-12-13 12:44:15,917 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library not loaded
2013-12-13 12:44:16,123 INFO org.apache.hadoop.mapred.MapTask: Starting flush of map output
2013-12-13 12:44:16,304 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
2013-12-13 12:44:16,305 INFO org.apache.hadoop.mapred.Task: Task:attempt_201312131236_0004_m_000000_0 is done. And is in the process of commiting
2013-12-13 12:44:16,338 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201312131236_0004_m_000000_0' done.
2013-12-13 12:44:16,352 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-12-13 12:44:16,367 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
2013-12-13 12:44:16,367 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName root for UID 0 from the native implementation
As you can see, none of my output appears there.
I added three Map-Reduce Framework counters and three LOG.info() messages in
MapTask.java and ReduceTask.java, but none of them show up on the cluster.
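For the counters, the increments look roughly like this (again only a sketch, not my
exact code; reporter stands for the TaskReporter the task code already has, and the
timed section and counter names are illustrative):

    // Sketch: custom counters added to the "Map-Reduce Framework" group through
    // the org.apache.hadoop.mapred.Reporter API available inside the task.
    long t0 = System.currentTimeMillis();
    // ... timed section, e.g. the spill or the whole map/reduce body ...
    long elapsed = System.currentTimeMillis() - t0;

    reporter.incrCounter("Map-Reduce Framework", "Map Task time spent", elapsed);
    // Analogous calls increment "Reduce Task time spent" and "Spill time spent".
    LOG.info("Map Task time spent: " + elapsed + " ms");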
In the pseudo-distributed mode that I set up in a VMware virtual machine, however, it
works perfectly:
Counter values (Map / Reduce / Total):

Job Counters
  Launched reduce tasks                                                 0 / 0 / 1
  SLOTS_MILLIS_MAPS                                                     0 / 0 / 32,688
  Total time spent by all reduces waiting after reserving slots (ms)    0 / 0 / 0
  Total time spent by all maps waiting after reserving slots (ms)       0 / 0 / 0
  Launched map tasks                                                    0 / 0 / 1
  Data-local map tasks                                                  0 / 0 / 1
  SLOTS_MILLIS_REDUCES                                                  0 / 0 / 10,927
File Output Format Counters
  Bytes Written                                                         0 / 3,191,633 / 3,191,633
FileSystemCounters
  FILE_BYTES_READ                                                       98,736,003 / 0 / 98,736,003
  HDFS_BYTES_READ                                                       73,523,307 / 0 / 73,523,307
  FILE_BYTES_WRITTEN                                                    120,510,936 / 59,683 / 120,570,619
File Input Format Counters
  Bytes Read                                                            73,523,200 / 0 / 73,523,200
Map-Reduce Framework
  Map output materialized bytes                                         3,896,453 / 0 / 3,896,453
  Map input records                                                     476,298 / 0 / 476,298
  Reduce shuffle bytes                                                  0 / 3,896,453 / 3,896,453
  Spilled Records                                                       7,326,590 / 234,937 / 7,561,527
  Map output bytes                                                      100,648,704 / 0 / 100,648,704
  Total committed heap usage (bytes)                                    158,728,192 / 60,686,336 / 219,414,528
  CPU time spent (ms)                                                   20,950 / 290 / 21,240
  SPLIT_RAW_BYTES                                                       107 / 0 / 107
  Combine input records                                                 11,078,530 / 0 / 11,078,530
  Reduce input records                                                  0 / 234,937 / 234,937
  Reduce input groups                                                   0 / 234,937 / 234,937
  Combine output records                                                4,413,017 / 0 / 4,413,017
  Physical memory (bytes) snapshot                                      196,603,904 / 88,289,280 / 284,893,184
  Reduce output records                                                 0 / 234,937 / 234,937
  Spill time spent                                                      16,005 / 0 / 16,005        <- my counter
  Virtual memory (bytes) snapshot                                       816,660,480 / 822,988,800 / 1,639,649,280
  Map output records                                                    6,900,450 / 0 / 6,900,450
  Map Task time spent                                                   24,775 / 0 / 24,775        <- my counter
  Reduce Task time spent                                                0 / 7,829 / 7,829          <- my counter
I think the logging logic or some security setting must differ between
pseudo-distributed mode and fully distributed (cluster) mode, but I am not sure about
that.
Can anyone tell me what is going on?
Thanks!