spark-issues mailing list archives

From "shane knapp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-28509) K8S integration tests are failing
Date Wed, 24 Jul 2019 20:04:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-28509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892137#comment-16892137 ]

shane knapp commented on SPARK-28509:
-------------------------------------

as a precautionary step, on all ubuntu workers, i ran:

1) minikube stop && minikube delete
2) rm -rf .minikube .kube

i'm also rebooting the workers once all jobs are done.
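
The cleanup commands above can be wrapped in a small function so the same sequence runs on every worker. This is only a sketch under assumptions: `reset_minikube_state`, `run`, and the `DRY_RUN` variable are hypothetical names, and the state directories are assumed to live in the Jenkins user's home directory (the original commands use relative paths). The reboot is left out because it has to wait until all jobs are done.

```shell
# Sketch of the worker cleanup steps. reset_minikube_state, run and DRY_RUN
# are hypothetical names, not part of the actual Jenkins worker setup.

# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# reset_minikube_state [HOME_DIR]: stop and delete the minikube cluster,
# then remove the cached state directories under HOME_DIR.
# Assumption: .minikube and .kube live in the user's home directory.
reset_minikube_state() {
  home_dir="${1:-$HOME}"
  run minikube stop && run minikube delete || return 1
  run rm -rf "${home_dir}/.minikube" "${home_dir}/.kube"
}
```

Calling `DRY_RUN=1 reset_minikube_state` prints the commands without touching the worker, which is a cheap way to check the paths before running it for real.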

things are now passing, and i'm hoping this clears it up.  the minikube/k8s versions haven't
changed, but i did find a couple of dead pods that needed cleaning up on amp-jenkins-staging-worker-02.
the dead pods were in a completely different namespace, so they shouldn't have impacted the tests.
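
One way to spot leftover pods across all namespaces is to filter the pod listing by its STATUS column. This is a sketch, not the procedure used on the workers: `filter_dead_pods` is a hypothetical helper, and treating every status other than `Running` or `Completed` as "dead" is an assumption.

```shell
# filter_dead_pods: read the output of
#   kubectl get pods --all-namespaces --no-headers
# on stdin (columns: NAMESPACE NAME READY STATUS RESTARTS AGE) and print
# "namespace/name status" for pods that are neither Running nor Completed.
# Hypothetical helper; the definition of "dead" here is an assumption.
filter_dead_pods() {
  awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}
```

On a worker it could be used as `kubectl get pods --all-namespaces --no-headers | filter_dead_pods`, after which each listed pod can be inspected or removed with `kubectl delete pod NAME -n NAMESPACE`.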

i will keep a close eye on this and see if i can track the failures down to one specific worker,
though that doesn't seem to be the case so far.  :\

> K8S integration tests are failing
> ---------------------------------
>
>                 Key: SPARK-28509
>                 URL: https://issues.apache.org/jira/browse/SPARK-28509
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Tests
>    Affects Versions: 3.0.0
>            Reporter: Marcelo Vanzin
>            Priority: Major
>
> I've been seeing lots of failures in master. e.g. https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/13180/console
> {noformat}
> - Start pod creation from template *** FAILED ***
>   io.fabric8.kubernetes.client.KubernetesClientException: 404 page not found
>   at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
>   at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571)
>   at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198)
>   at okhttp3.RealCall$AsyncCall.execute(RealCall.java:206)
>   at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
>   ...
> - PVs with local storage *** FAILED ***
>   io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at:
>   https://192.168.39.112:8443/api/v1/persistentvolumes. Message: PersistentVolume "test-local-pv"
>   is invalid: [spec.local: Forbidden: Local volumes are disabled by feature-gate, metadata.annotations:
>   Required value: Local volume requires node affinity]. Received status: Status(apiVersion=v1,
>   code=422, details=StatusDetails(causes=[StatusCause(field=spec.local, message=Forbidden: Local
>   volumes are disabled by feature-gate, reason=FieldValueForbidden, additionalProperties={}),
>   StatusCause(field=metadata.annotations, message=Required value: Local volume requires node
>   affinity, reason=FieldValueRequired, additionalProperties={})], group=null, kind=PersistentVolume,
>   name=test-local-pv, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status,
>   message=PersistentVolume "test-local-pv" is invalid: [spec.local: Forbidden: Local volumes
>   are disabled by feature-gate, metadata.annotations: Required value: Local volume requires
>   node affinity], metadata=ListMeta(_continue=null, resourceVersion=null, selfLink=null, additionalProperties={}),
>   reason=Invalid, status=Failure, additionalProperties={}).
>   at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:478)
>   at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:417)
>   at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:381)
>   at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:344)
>   at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:227)
>   at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:787)
>   at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:357)
>   at org.apache.spark.deploy.k8s.integrationtest.PVTestsSuite.setupLocalStorage(PVTestsSuite.scala:87)
>   at org.apache.spark.deploy.k8s.integrationtest.PVTestsSuite.$anonfun$$init$$1(PVTestsSuite.scala:137)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   ...
> - Launcher client dependencies *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 1 times over 6.673903200033333
>   minutes. Last failure message: assertion failed: 
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

