From didata <>
Subject Re: Spark processes not doing on killing corresponding YARN application
Date Tue, 09 Sep 2014 20:06:48 GMT
I figured out this issue (in our case)... and I'll vent a little in my reply
here... =:)

Fedora's well-intentioned firewall (firewalld, managed with firewall-cmd)
requires you to open (enable) any ports/services on a host that you need to
connect to (including SSH/22, which is enabled by default, of course). So
when launching client applications that listen on ephemeral ports for
connections back to them (as a Spark app does, for the remote YARN
ResourceManager/NodeManagers to connect back to), you can't know in advance
which port to enable, unless the application allows you to specify it as a
launch property (which you can for Spark apps, via
-Dspark.driver.port="NNNNN").

Again, well intentioned, but always a pain.

So you have to either disable the firewall capability in Fedora, or
open/enable a range of ports and tell your applications to use one of those.

Also note that, as of this writing, firewall-cmd's ability
to port-forward from the HOST to GUESTS in Libvirt/KVM-based
Hadoop/YARN/HDFS test/dev clusters doesn't work (it never has; it's on the
TODO list). It's another capability that you'll need in order to reach
daemon ports running *inside* the KVM cluster (for example, UI ports). The
work-around here (besides, again, disabling the Fedora firewall altogether)
is to use same-subnet BRIDGING (not NAT-ting). Doing that will eliminate the
need for port-forwarding (which, again, doesn't work). I've filed bugs in
the past for this.

So that is why YARN applications weren't terminating correctly
for Spark apps, or for that matter working at all, since Spark uses
ephemeral ports (by necessity).

So whatever port your Spark application uses, remember to issue the command:

    user@driverHost$ sudo firewall-cmd --zone=public --add-port=<SparkAppPort>/tcp

or, better yet, use the port-deterministic strategy mentioned earlier.

(Hopefully the verbosity here will help someone in their future search.
Fedora aside, the original problem here can be network related, as I
discovered.)

sincerely,
didata
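
P.S. For anyone wanting to try the port-deterministic approach, here is a
rough dry-run sketch. It only prints the two commands rather than running
them, so nothing is changed on the host. The port number 40000, the
DRIVER_PORT and ZONE variables, the helper function names, and your-app.jar
are all illustrative assumptions, not values from our setup. It also assumes
the property is passed through spark-submit's --conf option rather than as a
-D system property.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints the commands instead of executing them.
# DRIVER_PORT and ZONE are illustrative defaults (assumptions).
DRIVER_PORT="${DRIVER_PORT:-40000}"
ZONE="${ZONE:-public}"

# Print the command that opens one TCP port in the given firewalld zone.
# Arguments: <port> <zone>
firewall_open_cmd() {
  printf 'sudo firewall-cmd --zone=%s --add-port=%s/tcp\n' "$2" "$1"
}

# Print the command that pins the Spark driver to the same port at launch.
# your-app.jar is a hypothetical placeholder for the real application.
# Arguments: <port>
spark_submit_cmd() {
  printf 'spark-submit --conf spark.driver.port=%s your-app.jar\n' "$1"
}

firewall_open_cmd "$DRIVER_PORT" "$ZONE"
spark_submit_cmd "$DRIVER_PORT"
```

Note that --add-port without --permanent only changes the running firewall
configuration, so the opened port does not survive a firewalld reload or a
reboot.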

