spark-user mailing list archives

From Pat Ferrel <...@occamsmachete.com>
Subject Spark with Kubernetes connecting to pod id, not address
Date Wed, 13 Feb 2019 01:47:34 GMT


From: Pat Ferrel <pat@actionml.com>
Reply: Pat Ferrel <pat@actionml.com>
Date: February 12, 2019 at 5:40:41 PM
To: user@spark.apache.org <user@spark.apache.org>
Subject:  Spark with Kubernetes connecting to pod id, not address  

We have a k8s deployment of several services, including Apache Spark. All services seem to
be operational. Our application connects to the Spark master to submit a job using the cluster's
k8s DNS service, where the master is called `spark-api`, so we use `master=spark://spark-api:7077`
and `spark.submit.deployMode=cluster`. We submit the job through the API, not with the
spark-submit script.
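For reference, the submission settings described above amount to roughly this config fragment (`spark-api` is our k8s service name for the master; the property names are standard Spark configs):

```
# Spark configuration passed programmatically at submit time
spark.master             spark://spark-api:7077
spark.submit.deployMode  cluster
```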

This runs the "driver" and all "executors" on the cluster, and that part seems to work,
but there is a callback from some Spark process to the launching code in our app. For some
reason it is trying to connect to `harness-64d97d6d6-4r4d8`, which is the **pod ID**, not
the k8s cluster IP or DNS name.

How could this **pod ID** be getting into the system? Spark somehow seems to think it is the
address of the service that called it. Needless to say, any connection to the k8s pod ID fails,
and so does the job.
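One guess we have been considering, not yet verified: Spark advertises the driver/launcher address from `spark.driver.host`, which defaults to the local hostname, and a k8s pod's hostname defaults to its pod name. If that is what is happening, pinning the advertised address to a resolvable DNS name might look like the following sketch (`harness-api` is the k8s DNS name of our launching pod's service; whether this is the right knob here is exactly what we are unsure of):

```
# Hypothetical workaround (unverified): advertise the k8s service DNS name
# instead of the pod hostname, while still binding on all interfaces
spark.driver.host         harness-api
spark.driver.bindAddress  0.0.0.0
```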

Any idea how Spark could think the **pod ID** is an IP address or DNS name? 

BTW, if we run a small sample job with `master=local` all is well, but the same job executed
with the config above tries to connect to the spurious pod ID.

BTW2, the pod launching the Spark job has the k8s DNS name "harness-api"; not sure if this
matters.

Thanks in advance
