kafka-users mailing list archives

From Garrett Barton <garrett.bar...@gmail.com>
Subject Flink on Yarn - Connection unexpectedly closed by remote task manager
Date Fri, 15 Jun 2018 20:47:40 GMT
Hey all,

My jobs that I am trying to write in Flink 1.5 are failing after a few
minutes.  I reckon it's because the idle task managers are shutting down, but
that seems to kill the client and the running job, which was still running on
one of the other task managers.  Either way I get:

org.apache.flink.client.program.ProgramInvocationException:
org.apache.flink.runtime.io.network.netty.exception.RemoteTransportException:
Connection unexpectedly closed by remote task manager 'xxxx'. This might
indicate that the remote task manager was lost...

Right now I have the last part of the flow set to a parallelism of 1 for
debugging, so of the 4 task managers that are spun up, 3 sit idle and hit the
timeout period (currently set to 240000).  I think as soon as the first one
goes, the client throws an exception and the whole job dies as a result.

Is this expected behavior, and if so, short of setting an obscenely long task
manager timeout, is there another way around it?
