[ https://issues.apache.org/jira/browse/SPARK-28509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892137#comment-16892137 ]
shane knapp commented on SPARK-28509:
-------------------------------------
as a precautionary step, on all ubuntu workers, i:
1) ran minikube stop && minikube delete
2) ran rm -rf .minikube .kube
3) am rebooting the workers once all jobs are done.
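
The first two cleanup steps above can be sketched as a small script. This is only an illustration: the `run` and `reset_minikube` names and the `DRY_RUN` switch are hypothetical, and the script assumes the `.minikube` and `.kube` directories live in the Jenkins user's home directory. It defaults to printing the commands rather than running them, and it leaves the reboot to be done by hand once all jobs have finished.

```shell
#!/usr/bin/env bash
# Sketch of the per-worker minikube cleanup; not the actual Jenkins tooling.
# DRY_RUN, run and reset_minikube are hypothetical names used for illustration.

run() {
  # In dry-run mode (the default) only print the command; otherwise execute it.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

reset_minikube() {
  # 1) stop and delete the local minikube cluster
  run minikube stop && run minikube delete
  # 2) remove cached minikube and kubectl state
  #    (assumption: both directories sit under $HOME)
  run rm -rf "$HOME/.minikube" "$HOME/.kube"
  # 3) the reboot is intentionally not scripted here; do it once all jobs are done
}

reset_minikube
```

Setting `DRY_RUN=0` before calling `reset_minikube` would execute the commands for real, so try the dry run on a worker first.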
things are now passing... i'm hoping this clears it up. the minikube/k8s versions haven't
changed, but i did find a couple of dead pods that needed cleaning up on amp-jenkins-staging-worker-02.
the dead pods were in a completely different namespace, so that shouldn't impact the tests.
i will keep a close eye on this and see if i can track the failures down to one specific worker...
so far that doesn't seem to be the case tho. :\
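
One way to look for leftover pods like the ones mentioned above is a field-selector query across all namespaces. This is a sketch under assumptions: "dead" is taken to mean pods in the `Failed` phase, and the `run` and `find_dead_pods` names and the `DRY_RUN` switch are hypothetical. By default it only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: list pods in the Failed phase in every namespace.
# Assumption: "dead" pods are those with status.phase=Failed.
# DRY_RUN, run and find_dead_pods are hypothetical names used for illustration.

run() {
  # In dry-run mode (the default) only print the command; otherwise execute it.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

find_dead_pods() {
  # -o wide adds the node column, which helps tie a pod to a specific worker
  run kubectl get pods --all-namespaces --field-selector=status.phase=Failed -o wide
}

find_dead_pods
```

Pods stuck in other states (for example `Unknown`) would need a different selector value.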
> K8S integration tests are failing
> ---------------------------------
>
> Key: SPARK-28509
> URL: https://issues.apache.org/jira/browse/SPARK-28509
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, Tests
> Affects Versions: 3.0.0
> Reporter: Marcelo Vanzin
> Priority: Major
>
> I've been seeing lots of failures in master. e.g. https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/13180/console
> {noformat}
> - Start pod creation from template *** FAILED ***
> io.fabric8.kubernetes.client.KubernetesClientException: 404 page not found
> at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
> at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
> at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198)
> at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
> at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> ...
> - PVs with local storage *** FAILED ***
> io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at:
> https://192.168.39.112:8443/api/v1/persistentvolumes. Message: PersistentVolume "test-local-pv"
> is invalid: [spec.local: Forbidden: Local volumes are disabled by feature-gate, metadata.annotations:
> Required value: Local volume requires node affinity]. Received status: Status(apiVersion=v1,
> code=422, details=StatusDetails(causes=[StatusCause(field=spec.local, message=Forbidden: Local
> volumes are disabled by feature-gate, reason=FieldValueForbidden, additionalProperties={}),
> StatusCause(field=metadata.annotations, message=Required value: Local volume requires node
> affinity, reason=FieldValueRequired, additionalProperties={})], group=null, kind=PersistentVolume,
> name=test-local-pv, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status,
> message=PersistentVolume "test-local-pv" is invalid: [spec.local: Forbidden: Local volumes
> are disabled by feature-gate, metadata.annotations: Required value: Local volume requires
> node affinity], metadata=ListMeta(_continue=null, resourceVersion=null, selfLink=null, additionalProperties={}),
> reason=Invalid, status=Failure, additionalProperties={}).
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:478)
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:417)
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:381)
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344)
> at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:227)
> at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:787)
> at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:357)
> at org.apache.spark.deploy.k8s.integrationtest.PVTestsSuite.setupLocalStorage(PVTestsSuite.scala:87)
> at org.apache.spark.deploy.k8s.integrationtest.PVTestsSuite.$anonfun$$init$$1(PVTestsSuite.scala:137)
> at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> ...
> - Launcher client dependencies *** FAILED ***
> The code passed to eventually never returned normally. Attempted 1 times over 6.673903200033333
> minutes. Last failure message: assertion failed:
> {noformat}