spark-issues mailing list archives

From "Tin Hang To (Jira)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-30310) SparkUncaughtExceptionHandler halts running process unexpectedly
Date Thu, 19 Dec 2019 20:58:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-30310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Tin Hang To updated SPARK-30310:
--------------------------------
    Description: 
During 2.4.x testing, we have seen many occasions where the Worker process dies unexpectedly,
with the Worker log ending with:

 

{{ERROR SparkUncaughtExceptionHandler: scala.MatchError:  <...callstack...>}}

 

We get the same callstack during our 2.3.x testing but the Worker process stays up.

Upon comparing the 2.4.x SparkUncaughtExceptionHandler.scala with the 2.3.x version,
 we found that SPARK-24294 introduced the following change:


{{exception match {}}
{{  case _: OutOfMemoryError =>}}
{{    System.exit(SparkExitCode.OOM)}}
{{  case e: SparkFatalException if e.throwable.isInstanceOf[OutOfMemoryError] =>}}
{{    // SPARK-24294: This is defensive code, in case that SparkFatalException is}}
{{    // misused and uncaught.}}
{{    System.exit(SparkExitCode.OOM)}}
{{  case _ if exitOnUncaughtException =>}}
{{    System.exit(SparkExitCode.UNCAUGHT_EXCEPTION)}}
{{}}}
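For reference, a Scala {{match}} expression is not total: when no case applies (including guarded cases whose guard evaluates to false), the runtime throws {{scala.MatchError}}. A minimal standalone sketch (hypothetical code, not from Spark) of the same shape of match:

```scala
// Standalone illustration (hypothetical, not Spark code): a match with only
// a typed case and a guarded wildcard throws scala.MatchError for any
// value that satisfies neither.
object NonExhaustiveMatch {
  def classify(exception: Throwable, exitOnUncaughtException: Boolean): String =
    exception match {
      case _: OutOfMemoryError => "OOM"
      case _ if exitOnUncaughtException => "UNCAUGHT_EXCEPTION"
      // No unconditional case: any other value makes the match throw
      // scala.MatchError(exception) at runtime.
    }

  def main(args: Array[String]): Unit = {
    // With the guard false, an IllegalStateException matches nothing,
    // so a MatchError (not the original exception) escapes.
    val escaped =
      try { classify(new IllegalStateException, exitOnUncaughtException = false); false }
      catch { case _: MatchError => true }
    assert(escaped)
  }
}
```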

 

This code has the {{_ if exitOnUncaughtException}} case but no unconditional {{_}} case.  As a result,
when {{exitOnUncaughtException}} is false (as it is for the Master and Worker) and the exception matches
none of the cases (e.g., an {{IllegalStateException}}), Scala throws {{MatchError(exception)}} (a MatchError
wrapper around the original exception).  The outer catch block below then treats this as a second
uncaught exception and halts the entire process with {{SparkExitCode.UNCAUGHT_EXCEPTION_TWICE}}:

 

{{catch {}}
{{  case oom: OutOfMemoryError => Runtime.getRuntime.halt(SparkExitCode.OOM)}}
{{  case t: Throwable => Runtime.getRuntime.halt(SparkExitCode.UNCAUGHT_EXCEPTION_TWICE)}}
{{}}}

 

Therefore, even when exitOnUncaughtException is false, the process will halt.
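One possible defensive variant (an assumption on our part, not the actual SPARK-30310 patch) is a trailing catch-all case so the match is total when {{exitOnUncaughtException}} is false. Exit codes are stubbed here so the sketch is self-contained, and the chosen action is returned instead of calling {{System.exit}}:

```scala
// Sketch of a defensive variant (hypothetical; NOT the actual SPARK-30310
// patch). The catch-all case keeps MatchError from escaping when
// exitOnUncaughtException is false, so the process is not halted.
object DefensiveHandler {
  // Stub exit codes for illustration (values assumed, not authoritative).
  object SparkExitCode { val OOM = 52; val UNCAUGHT_EXCEPTION = 50 }

  // Returns Some(exitCode) when the process should exit, None to stay up.
  def handle(exception: Throwable, exitOnUncaughtException: Boolean): Option[Int] =
    exception match {
      case _: OutOfMemoryError =>
        Some(SparkExitCode.OOM)
      case _ if exitOnUncaughtException =>
        Some(SparkExitCode.UNCAUGHT_EXCEPTION)
      case _ =>
        None // catch-all: log and keep the process (e.g., Master/Worker) alive
    }

  def main(args: Array[String]): Unit = {
    // An unmatched exception no longer escapes as MatchError.
    assert(handle(new IllegalStateException, exitOnUncaughtException = false).isEmpty)
    assert(handle(new OutOfMemoryError, exitOnUncaughtException = false).contains(SparkExitCode.OOM))
  }
}
```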



> SparkUncaughtExceptionHandler halts running process unexpectedly
> ----------------------------------------------------------------
>
>                 Key: SPARK-30310
>                 URL: https://issues.apache.org/jira/browse/SPARK-30310
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0, 3.0.0
>            Reporter: Tin Hang To
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


