hadoop-mapreduce-dev mailing list archives

From "Yoonmin Nam" <rony...@dgist.ac.kr>
Subject Printing custom counters and stdout logs fails in the MapReduce engine
Date Fri, 13 Dec 2013 03:52:29 GMT
I tried to add my own counters to measure the running time of MapTask.run and
ReduceTask.run, using either a global counter or LOG.info().

In pseudo-distributed mode, using a rebuilt mapreduce-core-SNAPSHOT.jar in
which I modified MapTask.java and ReduceTask.java for this purpose, it works
exactly as I expected.
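For context, the measurement itself is just elapsed wall-clock time around the
run body, accumulated into a named counter. A minimal standalone sketch of the
idea (the COUNTERS map and incrCounter here are hypothetical stand-ins, not
Hadoop's Counters API):

```java
import java.util.HashMap;
import java.util.Map;

public class TaskTiming {
    // Hypothetical stand-in for Hadoop's Counters: counter name -> value.
    static final Map<String, Long> COUNTERS = new HashMap<>();

    static void incrCounter(String name, long amount) {
        COUNTERS.merge(name, amount, Long::sum);
    }

    // Simulated task body; in the real patch this would be MapTask.run().
    static void runTask() throws InterruptedException {
        Thread.sleep(50);
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        runTask();
        // Record elapsed time under a counter name like "Map Task time spent".
        incrCounter("Map Task time spent", System.currentTimeMillis() - start);
        System.out.println(COUNTERS);
    }
}
```

In the actual MapTask.java/ReduceTask.java patch the increment would of course
go through Hadoop's real counter machinery, so that the framework aggregates
the value and shows it on the job page.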

 

However, on a real cluster, the code I injected does not run: only the
system-default counters appear, and output from System.out.println() never
shows up in the stdout section of the task attempt logs.

 

Specifically, there are 29 counters at the job level and 15 at the task
level:

 

Job-level counters (29):

Counter                                                                Map        Reduce     Total
Job Counters
  SLOTS_MILLIS_MAPS                                                    0          0          2,596
  Launched reduce tasks                                                0          0          1
  Total time spent by all reduces waiting after reserving slots (ms)   0          0          0
  Rack-local map tasks                                                 0          0          1
  Total time spent by all maps waiting after reserving slots (ms)      0          0          0
  Launched map tasks                                                   0          0          1
  SLOTS_MILLIS_REDUCES                                                 0          0          8,059
File Output Format Counters
  Bytes Written                                                        0          0          178,730
File Input Format Counters
  Bytes Read                                                           0          0          493,744
FileSystemCounters
  FILE_BYTES_READ                                                      0          232,194    232,194
  HDFS_BYTES_READ                                                      493,847    0          493,847
  FILE_BYTES_WRITTEN                                                   289,043    288,923    577,966
  HDFS_BYTES_WRITTEN                                                   0          178,730    178,730
Map-Reduce Framework
  Reduce input groups                                                  0          0          13,606
  Map output materialized bytes                                        0          0          232,194
  Combine output records                                               0          0          13,606
  Map input records                                                    0          0          12,484
  Reduce shuffle bytes                                                 0          0          232,194
  Physical memory (bytes) snapshot                                     0          0          390,144,000
  Reduce output records                                                0          0          13,606
  Spilled Records                                                      0          0          27,212
  Map output bytes                                                     0          0          709,697
  Total committed heap usage (bytes)                                   0          0          490,078,208
  CPU time spent (ms)                                                  0          0          1,970
  Virtual memory (bytes) snapshot                                      0          0          3,586,727,936
  SPLIT_RAW_BYTES                                                      103        0          103
  Map output records                                                   0          0          63,245
  Combine input records                                                0          0          63,245
  Reduce input records                                                 0          0          13,606
 

And the task-level counters (15):

FileSystemCounters
  HDFS_BYTES_READ                      493,847
  FILE_BYTES_WRITTEN                   289,043
File Input Format Counters
  Bytes Read                           493,744
Map-Reduce Framework
  Map output materialized bytes        232,194
  Combine output records               13,606
  Map input records                    12,484
  Physical memory (bytes) snapshot     247,943,168
  Spilled Records                      13,606
  Map output bytes                     709,697
  CPU time spent (ms)                  950
  Total committed heap usage (bytes)   245,039,104
  Virtual memory (bytes) snapshot      1,786,130,432
  Combine input records                63,245
  Map output records                   63,245
  SPLIT_RAW_BYTES                      103

And here are the task logs:

 

Task Logs: 'attempt_201312131236_0004_m_000000_0'

stdout logs
(empty)

stderr logs
(empty)

syslog logs



2013-12-13 12:44:15,629 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded
the native-hadoop library

2013-12-13 12:44:15,849 INFO org.apache.hadoop.util.ProcessTree: setsid
exited with exit code 0

2013-12-13 12:44:15,850 INFO org.apache.hadoop.mapred.Task:  Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@46388f05

2013-12-13 12:44:15,894 INFO org.apache.hadoop.mapred.MapTask: Processing
split: hdfs://10.40.3.56:9000/user/root/input:0+493744

2013-12-13 12:44:15,901 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb =
100

2013-12-13 12:44:15,910 INFO org.apache.hadoop.mapred.MapTask: data buffer =
79691776/99614720

2013-12-13 12:44:15,910 INFO org.apache.hadoop.mapred.MapTask: record buffer
= 262144/327680

2013-12-13 12:44:15,917 WARN
org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library not
loaded

2013-12-13 12:44:16,123 INFO org.apache.hadoop.mapred.MapTask: Starting
flush of map output

2013-12-13 12:44:16,304 INFO org.apache.hadoop.mapred.MapTask: Finished
spill 0

2013-12-13 12:44:16,305 INFO org.apache.hadoop.mapred.Task:
Task:attempt_201312131236_0004_m_000000_0 is done. And is in the process of
commiting

2013-12-13 12:44:16,338 INFO org.apache.hadoop.mapred.Task: Task
'attempt_201312131236_0004_m_000000_0' done.

2013-12-13 12:44:16,352 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1

2013-12-13 12:44:16,367 INFO org.apache.hadoop.io.nativeio.NativeIO:
Initialized cache for UID to User mapping with a cache timeout of 14400
seconds.

2013-12-13 12:44:16,367 INFO org.apache.hadoop.io.nativeio.NativeIO: Got
UserName root for UID 0 from the native implementation

 

As you can see, nothing happened: the stdout and stderr logs are completely
empty.

 

I added three Map-Reduce Framework counters and three LOG.info() messages in
MapTask.java and ReduceTask.java, but none of them show up.

 

But in the pseudo-distributed mode I set up in a VMware virtual machine, it
works perfectly (note the custom counters Spill time spent, Map Task time
spent, and Reduce Task time spent):

 

 

Counter                                Map           Reduce        Total
Job Counters
  Launched reduce tasks                0             0             1
  SLOTS_MILLIS_MAPS                    0             0             32,688
  Total time spent by all reduces waiting after reserving slots (ms)  0  0  0
  Total time spent by all maps waiting after reserving slots (ms)     0  0  0
  Launched map tasks                   0             0             1
  Data-local map tasks                 0             0             1
  SLOTS_MILLIS_REDUCES                 0             0             10,927
File Output Format Counters
  Bytes Written                        0             3,191,633     3,191,633
FileSystemCounters
  FILE_BYTES_READ                      98,736,003    0             98,736,003
  HDFS_BYTES_READ                      73,523,307    0             73,523,307
  FILE_BYTES_WRITTEN                   120,510,936   59,683        120,570,619
File Input Format Counters
  Bytes Read                           73,523,200    0             73,523,200
Map-Reduce Framework
  Map output materialized bytes        3,896,453     0             3,896,453
  Map input records                    476,298       0             476,298
  Reduce shuffle bytes                 0             3,896,453     3,896,453
  Spilled Records                      7,326,590     234,937       7,561,527
  Map output bytes                     100,648,704   0             100,648,704
  Total committed heap usage (bytes)   158,728,192   60,686,336    219,414,528
  CPU time spent (ms)                  20,950        290           21,240
  SPLIT_RAW_BYTES                      107           0             107
  Combine input records                11,078,530    0             11,078,530
  Reduce input records                 0             234,937       234,937
  Reduce input groups                  0             234,937       234,937
  Combine output records               4,413,017     0             4,413,017
  Physical memory (bytes) snapshot     196,603,904   88,289,280    284,893,184
  Reduce output records                0             234,937       234,937
  Spill time spent                     16,005        0             16,005
  Virtual memory (bytes) snapshot      816,660,480   822,988,800   1,639,649,280
  Map output records                   6,900,450     0             6,900,450
  Map Task time spent                  24,775        0             24,775
  Reduce Task time spent               0             7,829         7,829

 

 

I suspect the logging logic differs between pseudo-distributed and cluster
mode, or that some security setting is interfering, but I am not sure.
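One possible cause worth ruling out (an assumption on my part, not something
confirmed by the logs above) is that the task JVMs on the worker nodes are
still loading the stock hadoop-core jar instead of the rebuilt one, so the
injected code never runs there. A quick way to check which jar a class was
actually loaded from, e.g. by pointing it at org.apache.hadoop.mapred.MapTask
on a worker node:

```java
// Prints the jar or directory a class was loaded from; running this against
// org.apache.hadoop.mapred.MapTask on a worker node would show whether the
// modified jar is actually on the task classpath.
public class WhichJar {
    public static String locationOf(Class<?> cls) {
        return cls.getProtectionDomain().getCodeSource().getLocation().toString();
    }

    public static void main(String[] args) {
        // Classes from the bootstrap loader (e.g. java.lang.String) have no
        // CodeSource, so demonstrate with this class itself.
        System.out.println(locationOf(WhichJar.class));
    }
}
```

If the location printed for MapTask is the distribution's original jar, the
rebuilt jar would need to be copied to every node and the task daemons
restarted before the custom counters and log lines could appear.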

 

Can anyone shed some light on this?

 

Thanks!

 

