spark-user mailing list archives

From "igor.berman" <>
Subject Regression of external shuffle service spark 2.3 vs spark 2.2
Date Mon, 19 Nov 2018 12:15:13 GMT
Any input on the issue below would be welcome.
We are running Spark with the external shuffle service on a Mesos cluster (1.5.1).

After upgrading our production workload to Spark 2.3, we started to see OOM
failures of the external shuffle services (one running on each node).

Has anybody experienced the same problem?
Any pointer to the relevant code would be helpful. (I know that work was done
on the external shuffle service in 2.3, but from reading the PRs I can't
pinpoint which change is causing these OOMs.)

Unfortunately, there is no test case to reproduce this, and even with 2.3 the OOM
failures start only after 2+ days of production load.

