This question concerns the issue "StackOverflowError (VirtualMachineError) or NoClassDefFoundError (LinkageError) should not System.exit() in local mode" and hopes to draw attention to, and discussion of, that issue.

I have a product that is hosted as a microservice, running in a web container (e.g. Jetty) as a long-running service that publishes a REST API. For small computations, to reduce latency, I wish to run Spark in local mode. For larger jobs the service might launch a remote job on a cluster, e.g. Spark-on-YARN. Either way, custom modules may be deployed to the service from time to time, involving third-party libraries etc.

My concern is as outlined in SPARK-15685. If I have a third-party library, and either its direct or transitive dependencies are not satisfied, then when the code is deployed and run I might suffer a NoClassDefFoundError. Or there may be some broken logic leading to a StackOverflowError (VirtualMachineError). Normally, if this occurred in a plain microservice/web application, the thread handling the request would see the unchecked Throwable/Error and fail, but the service would otherwise continue.
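
As a minimal sketch of that baseline behaviour (plain Java, no Spark involved), the following runs "requests" on a thread pool. The class and method names (ErrorContainmentDemo, handle, recurseForever) and the class name com/example/Missing are made up for illustration. The point is only that the Error surfaces to the caller as a failed request while the pool, and hence the service, keeps working.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ErrorContainmentDemo {

    // Hypothetical "broken module": unbounded recursion ends in StackOverflowError.
    static int recurseForever(int depth) {
        return recurseForever(depth + 1) + 1;
    }

    // Runs one request on the pool and reports how it ended.
    static String handle(ExecutorService pool, Runnable request) {
        Future<?> result = pool.submit(request);
        try {
            result.get();
            return "ok";
        } catch (ExecutionException e) {
            // The Error thrown inside the task arrives here as the cause;
            // the worker thread and the JVM are still alive.
            return "failed: " + e.getCause().getClass().getSimpleName();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Broken logic in a deployed module.
            System.out.println(handle(pool, () -> recurseForever(0)));
            // Simulated missing dependency (thrown by hand for the demo).
            System.out.println(handle(pool, () -> {
                throw new NoClassDefFoundError("com/example/Missing");
            }));
            // A later, healthy request is still served.
            System.out.println(handle(pool, () -> { }));
        } finally {
            pool.shutdown();
        }
    }
}
```

Running it should report the two failures followed by a successful request, i.e. the service outlives both Errors.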

With Spark in local mode, these particular Throwable/Error types are categorized and handled in a quite specific way (see Utils.isFatalError and other Scala definitions): when they are thrown, Spark deems that the JVM should be forcibly shut down via System.exit(), thereby killing the microservice.

Is it reasonable to ask that, when the above Errors occur, Spark does not exit the JVM and instead allows some exception or error to be thrown? The System.exit() approach seems aligned with the idea of a command-line batch job and a quick exit of the entire JVM and any running threads, but it is poorly suited to running in local mode inside a microservice.
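
To make the request concrete, here is a rough Java sketch of the kind of policy I have in mind. None of this is Spark's code or API: FatalErrorPolicyDemo, isTreatedAsFatal, runJob, JobFailedException and the embedded flag are all hypothetical, and the classification is a simplified stand-in for the one described above. The exit action is passed in as a Runnable, rather than calling System.exit() directly, so that the sketch can be run safely.

```java
class FatalErrorPolicyDemo {

    // Hypothetical exception type used to report a failed job to the caller.
    static class JobFailedException extends RuntimeException {
        JobFailedException(Throwable cause) {
            super("job failed: " + cause, cause);
        }
    }

    // Simplified stand-in for the classification described above
    // (an assumption for illustration, not Spark's implementation).
    static boolean isTreatedAsFatal(Throwable t) {
        return t instanceof VirtualMachineError || t instanceof LinkageError;
    }

    // embedded = true  -> always fail only this job by throwing.
    // embedded = false -> batch style: run exitAction for "fatal" errors.
    static void runJob(Runnable job, boolean embedded, Runnable exitAction) {
        try {
            job.run();
        } catch (Throwable t) {
            if (isTreatedAsFatal(t) && !embedded) {
                exitAction.run(); // stands in for System.exit(...)
                return;
            }
            throw new JobFailedException(t);
        }
    }

    public static void main(String[] args) {
        Runnable brokenJob = () -> {
            throw new NoClassDefFoundError("com/example/Missing");
        };

        // Embedded (microservice) mode: the caller gets an exception.
        try {
            runJob(brokenJob, true, () -> System.out.println("would call System.exit()"));
        } catch (JobFailedException e) {
            System.out.println("embedded: " + e.getCause().getClass().getSimpleName());
        }

        // Batch mode: the exit action is taken instead.
        runJob(brokenJob, false, () -> System.out.println("batch: would call System.exit()"));
    }
}
```

The sketch treats every VirtualMachineError alike. Some of them, such as OutOfMemoryError, can leave the JVM in a state where continuing is unsafe, so a real policy would probably need to be finer-grained than this.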



