tez-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tsuyoshi OZAWA <ozawa.tsuyo...@gmail.com>
Subject Re: OrderedWordCount slow down or hung-up for a time
Date Tue, 19 Aug 2014 11:30:37 GMT
Thanks for your reply, Gopal and Bikas.

The first reason of the problem was a wrong JVM option about AM container.
However, there still is hang-up at startup time after configuring it
correctly. This problem is caused even if tez.am.max.app.attempts ==
1. I attached the log on TEZ-1457, so I appreciate you can check it.

Thanks,
- Tsuyoshi

---------- Forwarded message ----------
From: Gopal V <gopalv@apache.org>
Date: Tue, Aug 19, 2014 at 4:45 AM
Subject: Re: OrderedWordCount slow down or hung-up for a time
To: Tsuyoshi OZAWA <ozawa.tsuyoshi@gmail.com>, dev@tez.apache.org


On 8/18/14, 1:26 AM, Tsuyoshi OZAWA wrote:

Hi,

That is not the correct behaviour for the normal runs.

The 10 minute gap sounds a lot like AM recovery is kicking in.

This will happen if the application master is killed half-way through
a job and it recovers work from the HDFS recovery logs.

The most common case for me is when the AM container size and java
opts are not in sync.

Can you check if that's happening in cases with only just a single AM attempt?

Cheers,
Gopal


> I'm trying some jobs in examples of tez. I found that sometimes jobs
> get hung-up on my distributed environment.
> Because of this behavior, the jobs on tez get slow down and sometimes
> get slower than original MapReduce jobs(e.g. if I run
> OrderedWordCount, WordCount + Sort in MapReduce is faster than
> OrderedWordCount on Tez sometimes). Is this correct behavior? Do you
> know how can we solve it or tune Tez program? I attached logs at the
> time as follows:
>
> <--- log1 - job start up time -->
> 14/08/18 16:02:36 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING
> Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
> 14/08/18 16:02:36 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
> Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0
> Killed: 0
> 14/08/18 16:02:36 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
> Tokenizer Progress: 0% TotalTasks: -1 Succeeded: 0 Running: 0 Failed:
> 0 Killed: 0
> 14/08/18 16:02:36 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
> Summation Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 0 Failed: 0
> Killed: 0
>
> -- next print is 10 minutes later --
>
> 14/08/18 16:15:57 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING
> Progress: 1.04% TotalTasks: 96 Succeeded: 1 Running: 52 Failed: 0
> Killed: 0
> 14/08/18 16:15:57 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
> Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0
> Killed: 0
> 14/08/18 16:15:57 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
> Tokenizer Progress: 3.37% TotalTasks: 89 Succeeded: 3 Running: 52
> Failed: 0 Killed: 0
> 14/08/18 16:15:57 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
> Summation Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 0 Failed: 0 Killed: 0
> <--- log1 -->
>
>
> <--- log2 -->
> 14/08/18 16:35:15 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING
> Progress: 92.71% TotalTasks: 96 Succeeded: 89 Running: 6 Failed: 0
> Killed: 0
> 14/08/18 16:35:15 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
> Tokenizer Progress: 100% TotalTasks: 89 Succeeded: 89 Running: 0
> Failed: 0 Killed: 0
> 14/08/18 12:19:25 INFO rpc.DAGClientRPCImpl: VertexStatus:
> VertexStatus: VertexName: Sorter Progress: 0% TotalTasks: 1 Succeeded:
> 0 Running: 0 Failed: 0 Killed: 0
> 14/08/18 16:35:15 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
> Summation Progress: 0% TotalTasks: 6 Succeeded: 0 Running: 6 Failed: 0
> Killed: 0
>
> --- next print is 10 minutes later --
>
> 14/08/18 16:53:28 INFO rpc.DAGClientRPCImpl: DAG: State: RUNNING
> Progress: 93.75% TotalTasks: 96 Succeeded: 90 Running: 5 Failed: 0
> Killed: 0
> 14/08/18 16:53:28 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
> Sorter Progress: 0% TotalTasks: 1 Succeeded: 0 Running: 0 Failed: 0
> Killed: 0
> 14/08/18 16:53:28 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
>   Tokenizer Progress: 100% TotalTasks: 89 Succeeded: 89 Running: 0
> Failed: 0 Killed: 0
> 14/08/18 16:53:28 INFO rpc.DAGClientRPCImpl: VertexStatus: VertexName:
> Summation Progress: 16.67% TotalTasks: 6 Succeeded: 1 Running: 5
> Failed: 0 Killed: 0
> <--- log2 -->
>
> Thanks,
> - Tsuyoshi
>
>



-- 
- Tsuyoshi

Mime
View raw message