spark-dev mailing list archives

From scwf <wangf...@huawei.com>
Subject Re: Reproducible deadlock in 1.0.1, possibly related to Spark-1097
Date Tue, 15 Jul 2014 01:49:43 GMT
Hi Cody,
   I ran into this issue a few days ago and posted a PR for it ( https://github.com/apache/spark/pull/1385 ).
Strangely, synchronizing on conf deadlocks, but synchronizing on initLocalJobConfFuncOpt works fine.
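
To illustrate the difference, here is a minimal sketch (not the actual HadoopRDD code; the class and property names are hypothetical, and java.util.Properties stands in for Hadoop's Configuration). A lock object private to Spark's code can never be held by Hadoop internals, so it cannot take part in a lock-ordering cycle the way the shared conf monitor can:

```java
import java.util.Properties;

// Sketch only (hypothetical names): java.util.Properties stands in for
// Hadoop's Configuration, which is not assumed to be on the classpath here.
public class JobConfSketch {
    // A lock private to this code. Unlike synchronizing on the shared conf
    // instance, no third-party code can ever hold this monitor, so it cannot
    // take part in a lock-ordering cycle with Hadoop's own synchronization.
    private static final Object CONF_LOCK = new Object();

    public static Properties getJobConf(Properties base) {
        synchronized (CONF_LOCK) {
            // Build the per-task job conf under the private lock.
            Properties jobConf = new Properties();
            jobConf.putAll(base);
            jobConf.setProperty("mapreduce.input.fileinputformat.inputdir", "/tmp/input");
            return jobConf;
        }
    }
}
```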


> Here's the entire jstack output.
>
>
> On Mon, Jul 14, 2014 at 4:44 PM, Patrick Wendell <pwendell@gmail.com> wrote:
>
>     Hey Cody,
>
>     This Jstack seems truncated, would you mind giving the entire stack
>     trace? For the second thread, for instance, we can't see where the
>     lock is being acquired.
>
>     - Patrick
>
>     On Mon, Jul 14, 2014 at 1:42 PM, Cody Koeninger
>     <cody.koeninger@mediacrossing.com> wrote:
>      > Hi all, just wanted to give a heads up that we're seeing a reproducible
>      > deadlock with spark 1.0.1 with 2.3.0-mr1-cdh5.0.2
>      >
>      > If jira is a better place for this, apologies in advance - figured talking
>      > about it on the mailing list was friendlier than randomly (re)opening jira
>      > tickets.
>      >
>      > I know Gary had mentioned some issues with 1.0.1 on the mailing list, once
>      > we got a thread dump I wanted to follow up.
>      >
>      > The thread dump shows the deadlock occurs in the synchronized block of code
>      > that was changed in HadoopRDD.scala, for the Spark-1097 issue
>      >
>      > Relevant portions of the thread dump are summarized below, we can provide
>      > the whole dump if it's useful.
>      >
>      > Found one Java-level deadlock:
>      > =============================
>      > "Executor task launch worker-1":
>      >   waiting to lock monitor 0x00007f250400c520 (object 0x00000000fae7dc30, a
>      > org.apache.hadoop.conf.Configuration),
>      >   which is held by "Executor task launch worker-0"
>      > "Executor task launch worker-0":
>      >   waiting to lock monitor 0x00007f2520495620 (object 0x00000000faeb4fc8, a
>      > java.lang.Class),
>      >   which is held by "Executor task launch worker-1"
>      >
>      >
>      > "Executor task launch worker-1":
>      >         at
>      > org.apache.hadoop.conf.Configuration.reloadConfiguration(Configuration.java:791)
>      >         - waiting to lock <0x00000000fae7dc30> (a
>      > org.apache.hadoop.conf.Configuration)
>      >         at
>      > org.apache.hadoop.conf.Configuration.addDefaultResource(Configuration.java:690)
>      >         - locked <0x00000000faca6ff8> (a java.lang.Class for
>      > org.apache.hadoop.conf.Configuration)
>      >         at
>      > org.apache.hadoop.hdfs.HdfsConfiguration.<clinit>(HdfsConfiguration.java:34)
>      >         at
>      > org.apache.hadoop.hdfs.DistributedFileSystem.<clinit>(DistributedFileSystem.java:110)
>      >         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>      > Method)
>      >         at
>      > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>      >         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
>      > Method)
>      >         at
>      > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>      >         at
>      > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>      >         at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
>      >         at java.lang.Class.newInstance0(Class.java:374)
>      >         at java.lang.Class.newInstance(Class.java:327)
>      >         at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:373)
>      >         at java.util.ServiceLoader$1.next(ServiceLoader.java:445)
>      >         at
>      > org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2364)
>      >         - locked <0x00000000faeb4fc8> (a java.lang.Class for
>      > org.apache.hadoop.fs.FileSystem)
>      >         at
>      > org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
>      >         at
>      > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
>      >         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
>      >         at
>      > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
>      >         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
>      >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
>      >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
>      >         at
>      > org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:587)
>      >         at
>      > org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:315)
>      >         at
>      > org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:288)
>      >         at
>      > org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
>      >         at
>      > org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
>      >         at
>      > org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
>      >
>      >
>      >
>      > ...elided...
>      >
>      >
>      > "Executor task launch worker-0" daemon prio=10 tid=0x0000000001e71800
>      > nid=0x2d97 waiting for monitor entry [0x00007f24d2bf1000]
>      >    java.lang.Thread.State: BLOCKED (on object monitor)
>      >         at
>      > org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2362)
>      >         - waiting to lock <0x00000000faeb4fc8> (a java.lang.Class for
>      > org.apache.hadoop.fs.FileSystem)
>      >         at
>      > org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
>      >         at
>      > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
>      >         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
>      >         at
>      > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
>      >         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
>      >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
>      >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
>      >         at
>      > org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:587)
>      >         at
>      > org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:315)
>      >         at
>      > org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:288)
>      >         at
>      > org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
>      >         at
>      > org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
>      >         at
>      > org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
>
>
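
The dump above is a textbook lock-order inversion: worker-1 holds the Configuration monitor and wants the FileSystem class monitor, while worker-0 holds the FileSystem class monitor and wants the Configuration monitor. As a hedged sketch (hypothetical names; two plain objects stand in for those two monitors), fixing a single global acquisition order makes that cycle impossible:

```java
// Hypothetical sketch: two plain objects stand in for the two monitors in
// the dump (the FileSystem class object and the Configuration instance).
public class LockOrderSketch {
    static final Object FS_CLASS_LOCK = new Object(); // stands in for FileSystem.class
    static final Object CONF_LOCK = new Object();     // stands in for the Configuration instance

    // Every caller takes both locks in the same global order, so no thread
    // can hold the second lock while waiting for the first, and the
    // worker-0 / worker-1 cycle from the dump cannot form.
    public static void withBothLocks(Runnable body) {
        synchronized (FS_CLASS_LOCK) {
            synchronized (CONF_LOCK) {
                body.run();
            }
        }
    }
}
```

With this discipline, any number of threads calling withBothLocks concurrently will serialize but never deadlock.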


-- 

Best Regards
Fei Wang
