spark-issues mailing list archives

From "Andy Grove (Jira)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver
Date Wed, 30 Oct 2019 14:32:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove updated SPARK-29640:
-------------------------------
    Description: 
We are running into intermittent DNS issues where the Spark driver fails to resolve "kubernetes.default.svc"
when trying to create executors. We are running Spark 2.4.4 (with the patch for SPARK-28921)
in cluster mode.

This happens approximately 10% of the time.

Here is the stack trace:
{code:java}
Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
	at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
	at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [wf-50000-69674f15d0fc45-1571354060179-driver] in namespace: [tenant-8-workflows] failed.
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
	at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
	at scala.Option.map(Option.scala:146)
	at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
	at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
	... 20 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
	at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
	at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
	at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
	at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
	at java.net.InetAddress.getAllByName(InetAddress.java:1193)
	at java.net.InetAddress.getAllByName(InetAddress.java:1127)
	at okhttp3.Dns$1.lookup(Dns.java:39)
	at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:171)
	at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:137)
	at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:82)
	at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:171)
	at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:121)
	at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:100)
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:119)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:110)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
	at okhttp3.RealCall.execute(RealCall.java:69)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:404)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:365)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:330)
	at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:311)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:810)
	at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:218)
	... 27 more  {code}
This issue seems to be caused by [https://github.com/kubernetes/kubernetes/issues/76790].

One suggested workaround is to specify TCP mode for DNS lookups in the pod spec ([https://github.com/kubernetes/kubernetes/issues/56903#issuecomment-424498508]).
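In pod-spec terms, that workaround would look roughly like the fragment below. This is a sketch, not something tested against this cluster: {{use-vc}} is the glibc resolver option that forces DNS lookups over TCP, and it only works in glibc-based images (musl-based images such as Alpine ignore it).

{code:yaml}
# Illustrative driver pod spec fragment. dnsConfig injects resolver
# options into the pod's /etc/resolv.conf.
spec:
  dnsConfig:
    options:
      - name: use-vc   # glibc: perform DNS lookups over TCP instead of UDP
{code}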

I would like to be able to pass a flag to spark-submit specifying that DNS lookups should use TCP mode.

I am working on a PR for this.
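As a stop-gap on the application side, independent of any Spark change, the transient "Try again" (EAGAIN) failures could also be papered over by retrying the lookup. The sketch below is illustrative only; the class name and retry parameters are hypothetical and not part of Spark's code.

{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Illustrative sketch only: retry a hostname lookup a few times before
 * giving up, since the "Try again" failure seen in the stack trace above
 * is transient. Class name and parameters are hypothetical.
 */
public class ResolveWithRetry {
    public static InetAddress resolve(String host, int attempts, long backoffMs)
            throws UnknownHostException, InterruptedException {
        UnknownHostException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return InetAddress.getByName(host);
            } catch (UnknownHostException e) {
                last = e;                 // remember the failure
                Thread.sleep(backoffMs);  // brief pause before retrying
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // In-cluster code would pass "kubernetes.default.svc" here.
        System.out.println(resolve("localhost", 3, 100).getHostAddress());
    }
}
{code}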

  was:
We are running into intermittent DNS issues where the Spark driver fails to resolve "kubernetes.default.svc",
and this seems to be caused by [https://github.com/kubernetes/kubernetes/issues/76790].

One suggested workaround is to specify TCP mode for DNS lookups in the pod spec ([https://github.com/kubernetes/kubernetes/issues/56903#issuecomment-424498508]).

I would like to be able to pass a flag to spark-submit specifying that DNS lookups should use TCP mode.

I am working on a PR for this.

     Issue Type: Bug  (was: Improvement)
        Summary: [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc"
in Spark driver  (was: [K8S] Make it possible to set DNS option to use TCP instead of UDP)

> [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver
> ------------------------------------------------------------------------------------------
>
>                 Key: SPARK-29640
>                 URL: https://issues.apache.org/jira/browse/SPARK-29640
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 2.4.4
>            Reporter: Andy Grove
>            Priority: Major
>             Fix For: 2.4.5
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
