Thanks for reporting this. We added the warning after faulty DNS configurations spoiled some of our own experiments. I am not sure what could be done in this case.

Do you know why Java's reverse DNS resolution behaves differently from nslookup in that case?

BTW: There should not be too many reverse name lookups. Each TaskManager does this once, at startup.


On Thu, Jul 9, 2015 at 11:36 AM, Robert Schmidtke <ro.schmidtke@gmail.com> wrote:
Hi everyone,

I'm currently testing data-local computing of Flink on XtreemFS (I'm one of the developers). We have implemented our adapter using the Hadoop FileSystem interface, and all works well. However, upon closer inspection, I found that only remote splits are assigned, which is strange, as XtreemFS stores files split across multiple nodes and reports the hostnames for each split. Specifically, I'm receiving the warning message issued in: https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/instance/InstanceConnectionInfo.java#L103

So each TaskManager cannot resolve its hostname from its IP, and the input split assigner cannot match nodes to splits: the nodes identify themselves by IP (not by hostname), while the splits are identified by hostname, resulting in (mostly) non-local computing. I tracked the issue down, and it turns out that the default name lookup mechanism in Java seems to be faulty on my cluster configuration. When passing "env.java.opts: -Dsun.net.spi.nameservice.provider.1=dns,sun" (a non-default nameservice) in flink-conf.yaml, the IP addresses are resolved to hostnames properly.
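
The lookup described above can be reproduced outside Flink with a small standalone program. This is only a minimal sketch, assuming the TaskManager's check amounts to calling InetAddress.getCanonicalHostName() and comparing the result with the textual IP; the class and method names (ReverseLookupCheck, fellBackToIp, reverseLookup) are made up for illustration and are not part of Flink.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper for illustration only; not part of Flink.
final class ReverseLookupCheck {

    /** Returns true if the "hostname" is just the textual IP, i.e. reverse resolution failed. */
    static boolean fellBackToIp(String canonicalHostName, String hostAddress) {
        return canonicalHostName.equals(hostAddress);
    }

    /** Reverse-resolves the given address with whatever name service the JVM is configured to use. */
    static String reverseLookup(InetAddress address) {
        // getCanonicalHostName() returns the textual IP address if no name can be found.
        return address.getCanonicalHostName();
    }

    public static void main(String[] args) throws UnknownHostException {
        // For a literal IP, getByName only parses the address; it does not query a name service.
        InetAddress address = args.length > 0
                ? InetAddress.getByName(args[0])
                : InetAddress.getLocalHost();

        String ip = address.getHostAddress();
        String name = reverseLookup(address);

        if (fellBackToIp(name, ip)) {
            System.out.println(ip + " -> no hostname resolved (IP returned as name)");
        } else {
            System.out.println(ip + " -> " + name);
        }
    }
}
```

Running it once with the default JVM settings and once with -Dsun.net.spi.nameservice.provider.1=dns,sun on the command line, passing the IP of a node as the argument, should show whether the two name services disagree on that node.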

I know that this is probably not directly related to Flink, but given that you specifically handle the case where hostname resolution is not possible, I was wondering whether you have experienced such cases, and if so, how you overcame the issue. I'm not particularly fond of performing way too many reverse lookups when the normal strategy using files should work as well (note that nslookup <IP-OF-NODE> works as expected, and when strace'ing the command, it does not even connect to the nameserver).

Thanks in advance for your help

My GPG Key ID: 336E2680