Hi Gordon,

We use Flink 1.2.0, bundled with Hadoop 2.6 and Scala 2.11, built on 2017-02-02.

Cheers

Dominique


On 30.05.2017 at 16:31, Tzu-Li (Gordon) Tai wrote:
Hi Dominique,

Could you tell us the version / build commit of Flink that you’re using?

Cheers,
Gordon


On 30 May 2017 at 4:29:08 PM, Dominique Rondé (dominique.ronde@allsecur.de) wrote:

Hi folks,

I just ran into the need to bring Flink onto a YARN cluster that is configured with Kerberos. Following the documentation, I changed flink-conf.yaml as follows:

security.kerberos.login.use-ticket-cache: true
security.kerberos.login.contexts: Client

I know that providing a keytab is the preferred way, but I have to file a special request to receive one. ;-)

After startup, provisioning stops with this error:

2017-05-30 16:16:48,684 INFO  org.apache.flink.yarn.YarnClusterClient                       - Waiting until all TaskManagers have connected
Waiting until all TaskManagers have connected
2017-05-30 16:16:48,685 INFO  org.apache.flink.yarn.YarnClusterClient                       - Starting client actor system.
2017-05-30 16:16:52,099 WARN  org.apache.flink.runtime.net.ConnectionUtils                  - Could not connect to lfrar255.srv.allianz/10.17.24.162:56659. Selecting a local address using heuristics.
2017-05-30 16:16:52,473 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2017-05-30 16:16:52,512 INFO  Remoting                                                      - Starting remoting
2017-05-30 16:16:52,670 INFO  Remoting                                                      - Remoting started; listening on addresses :[akka.tcp://flink@sla09037.srv.allianz:34579]
Exception in thread "main" java.lang.RuntimeException: Unable to get ClusterClient status from Application Client
        at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:248)
        at org.apache.flink.yarn.YarnClusterClient.waitForClusterToBeReady(YarnClusterClient.java:520)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:660)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:476)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:473)
        at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
        at org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:473)
Caused by: org.apache.flink.runtime.leaderretrieval.LeaderRetrievalException: Could not retrieve the leader gateway
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:141)
        at org.apache.flink.client.program.ClusterClient.getJobManagerGateway(ClusterClient.java:691)
        at org.apache.flink.yarn.YarnClusterClient.getClusterStatus(YarnClusterClient.java:242)
        ... 10 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:190)
        at scala.concurrent.Await.result(package.scala)
        at org.apache.flink.runtime.util.LeaderRetrievalUtils.retrieveLeaderGateway(LeaderRetrievalUtils.java:139)
        ... 12 more
2017-05-30 16:17:02,690 INFO  org.apache.flink.yarn.YarnClusterClient                       - Shutting down YarnClusterClient from the client shutdown hook
2017-05-30 16:17:02,691 INFO  org.apache.flink.yarn.YarnClusterClient                       - Disconnecting YarnClusterClient from ApplicationMaster
2017-05-30 16:17:03,693 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Shutting down remote daemon.
2017-05-30 16:17:03,696 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remote daemon shut down; proceeding with flushing remote transports.
2017-05-30 16:17:03,744 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remoting shut down.
 
Does anyone have an idea what is going wrong?

Best wishes

Dominique