spark-issues mailing list archives

From "liqingan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-4300) Race condition during SparkWorker shutdown
Date Wed, 01 Aug 2018 10:36:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565104#comment-16565104 ]

liqingan edited comment on SPARK-4300 at 8/1/18 10:35 AM:
----------------------------------------------------------

I feel upset about this issue! (n) We are still hitting it; log excerpts below:
{code}
Uncaught fatal error from thread [sparkWorker-akka.actor.default-dispatcher-56] shutting down ActorSystem [sparkWorker]
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.Arrays.copyOfRange(Arrays.java:2694)
	at java.lang.String.<init>(String.java:203)
	at java.lang.StringBuilder.toString(StringBuilder.java:405)
	at java.io.ObjectInputStream$BlockDataInputStream.readUTFBody(ObjectInputStream.java:3068)
	at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:2864)
	at java.io.ObjectInputStream.readString(ObjectInputStream.java:1638)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1341)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
	at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
	at akka.serialization.JavaSerializer.fromBinary(Serializer.scala:136)
	at akka.serialization.Serialization$$anonfun$deserialize$1.apply(Serialization.scala:104)
	at scala.util.Try$.apply(Try.scala:161)
	at akka.serialization.Serialization.deserialize(Serialization.scala:98)
	at akka.remote.MessageSerializer$.deserialize(MessageSerializer.scala:23)
	at akka.remote.DefaultMessageDispatcher.payload$lzycompute$1(Endpoint.scala:55)
	at akka.remote.DefaultMessageDispatcher.payload$1(Endpoint.scala:55)
	at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:73)
	at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:764)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)

02:12:12.751 AM|ERROR|org.apache.spark.util.logging.FileAppender|Error writing stream to file /hadoop/var/run/spark/work/app-20180727141925-0019/38075/stderr
java.io.IOException: Stream closed
	at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
	at java.io.FilterInputStream.read(FilterInputStream.java:107)
	at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
	at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
	at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
	at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
	at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)

02:12:12.752 AM|ERROR|org.apache.spark.util.logging.FileAppender|Error writing stream to file /hadoop/var/run/spark/work/app-20180727142159-0032/30823/stderr
java.io.IOException: Stream closed
	at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
	at java.io.FilterInputStream.read(FilterInputStream.java:107)
	at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
	at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
	at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
	at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1468)
	at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
{code}


was (Author: liqingan):
I feel upset about this issue!

 

 

> Race condition during SparkWorker shutdown
> ------------------------------------------
>
>                 Key: SPARK-4300
>                 URL: https://issues.apache.org/jira/browse/SPARK-4300
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>    Affects Versions: 1.1.0
>            Reporter: Alex Liu
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 1.2.2, 1.3.1, 1.4.0
>
>         Attachments: B{~TP2PW~}1TYA2AG{CA41H.png
>
>
> When a Shark job finishes, error messages like the following appear in the log:
> {code}
> INFO 22:10:41,635 SparkMaster: akka.tcp://sparkDriver@ip-172-31-11-204.us-west-1.compute.internal:57641 got disassociated, removing it.
>  INFO 22:10:41,640 SparkMaster: Removing app app-20141106221014-0000
>  INFO 22:10:41,687 SparkMaster: Removing application Shark::ip-172-31-11-204.us-west-1.compute.internal
>  INFO 22:10:41,710 SparkWorker: Asked to kill executor app-20141106221014-0000/0
>  INFO 22:10:41,712 SparkWorker: Runner thread for executor app-20141106221014-0000/0 interrupted
>  INFO 22:10:41,714 SparkWorker: Killing process!
> ERROR 22:10:41,738 SparkWorker: Error writing stream to file /var/lib/spark/work/app-20141106221014-0000/0/stdout
> ERROR 22:10:41,739 SparkWorker: java.io.IOException: Stream closed
> ERROR 22:10:41,739 SparkWorker: 	at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
> ERROR 22:10:41,740 SparkWorker: 	at java.io.BufferedInputStream.read1(BufferedInputStream.java:272)
> ERROR 22:10:41,740 SparkWorker: 	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> ERROR 22:10:41,740 SparkWorker: 	at java.io.FilterInputStream.read(FilterInputStream.java:107)
> ERROR 22:10:41,741 SparkWorker: 	at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
> ERROR 22:10:41,741 SparkWorker: 	at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
> ERROR 22:10:41,741 SparkWorker: 	at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
> ERROR 22:10:41,742 SparkWorker: 	at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
> ERROR 22:10:41,742 SparkWorker: 	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
> ERROR 22:10:41,742 SparkWorker: 	at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)
>  INFO 22:10:41,838 SparkMaster: Connected to Cassandra cluster: 4299
>  INFO 22:10:41,839 SparkMaster: Adding host 172.31.11.204 (Analytics)
>  INFO 22:10:41,840 SparkMaster: New Cassandra host /172.31.11.204:9042 added
>  INFO 22:10:41,841 SparkMaster: Adding host 172.31.11.204 (Analytics)
>  INFO 22:10:41,842 SparkMaster: Adding host 172.31.11.204 (Analytics)
>  INFO 22:10:41,852 SparkMaster: akka.tcp://sparkDriver@ip-172-31-11-204.us-west-1.compute.internal:57641 got disassociated, removing it.
>  INFO 22:10:41,853 SparkMaster: akka.tcp://sparkDriver@ip-172-31-11-204.us-west-1.compute.internal:57641 got disassociated, removing it.
>  INFO 22:10:41,853 SparkMaster: akka.tcp://sparkDriver@ip-172-31-11-204.us-west-1.compute.internal:57641 got disassociated, removing it.
>  INFO 22:10:41,857 SparkMaster: akka.tcp://sparkDriver@ip-172-31-11-204.us-west-1.compute.internal:57641 got disassociated, removing it.
>  INFO 22:10:41,862 SparkMaster: Adding host 172.31.11.204 (Analytics)
>  WARN 22:10:42,200 SparkMaster: Got status update for unknown executor app-20141106221014-0000/0
>  INFO 22:10:42,211 SparkWorker: Executor app-20141106221014-0000/0 finished with state KILLED exitStatus 143
> {code}
> /var/lib/spark/work/app-20141106221014-0000/0/stdout exists on disk; the appender is trying to write to a closed I/O stream.
> The Spark worker shuts the executor down with:
> {code}
>  private def killProcess(message: Option[String]) {
>     var exitCode: Option[Int] = None
>     logInfo("Killing process!")
>     process.destroy()
>     process.waitFor()
>     if (stdoutAppender != null) {
>       stdoutAppender.stop()
>     }
>     if (stderrAppender != null) {
>       stderrAppender.stop()
>     }
>     if (process != null) {
>       exitCode = Some(process.waitFor())
>     }
>     worker ! ExecutorStateChanged(appId, execId, state, message, exitCode)
>   }
> {code}
> But the stdoutAppender thread is still concurrently reading the process's output and writing it to the log file, which creates the race condition.
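The race described above can be sketched outside Spark. In this minimal Java sketch (hypothetical `StreamAppender` class, not Spark's actual `FileAppender`), a background thread drains an `InputStream` the way the worker's appender copies executor output into its stdout/stderr files. One plausible mitigation, assumed here rather than taken from the actual patch, is to record that shutdown is in progress *before* closing the stream, so a `java.io.IOException: Stream closed` raised during an orderly kill can be told apart from a genuine I/O failure:

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical minimal appender, NOT Spark's FileAppender: a background thread
// drains an InputStream (an executor's stdout/stderr in the real case) while
// the shutdown path may close that stream underneath it.
class StreamAppender implements Runnable {
    private final InputStream in;
    private volatile boolean stopped = false;         // set before the stream is closed
    volatile boolean sawUnexpectedError = false;      // true only for a real I/O failure
    final StringBuffer captured = new StringBuffer(); // stands in for the log file

    StreamAppender(InputStream in) { this.in = in; }

    @Override
    public void run() {
        byte[] buf = new byte[1024];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {        // reads until EOF or close
                captured.append(new String(buf, 0, n));
            }
        } catch (IOException e) {
            // The SPARK-4300 race: killProcess() closes the stream while (or before)
            // the appender reads. If stop() was called first, this exception is
            // expected shutdown noise, not an error worth logging at ERROR level.
            if (!stopped) sawUnexpectedError = true;
        }
    }

    // Shutdown path: set the flag first, then close, so the reading thread
    // can classify the resulting IOException.
    void stop() throws IOException {
        stopped = true;
        in.close();   // a subsequent read() fails with "Stream closed"
    }
}
```

In Spark's case the close happens concurrently with a blocked `read()`; the sketch triggers the same `Stream closed` exception and shows how the flag lets the reader treat it as benign.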




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

