spark-dev mailing list archives

From Bin Wang <wbi...@gmail.com>
Subject Re: Checkpoint directory structure
Date Thu, 24 Sep 2015 02:33:17 GMT
I've attached the full log. The error is like this:

15/09/23 17:47:39 ERROR yarn.ApplicationMaster: User class threw exception: java.lang.IllegalArgumentException: requirement failed: Checkpoint directory does not exist: hdfs://szq2.appadhoc.com:8020/user/root/checkpoint/d3714249-e03a-45c7-a0d5-1dc870b7d9f2/rdd-26909
java.lang.IllegalArgumentException: requirement failed: Checkpoint directory does not exist: hdfs://szq2.appadhoc.com:8020/user/root/checkpoint/d3714249-e03a-45c7-a0d5-1dc870b7d9f2/rdd-26909
	at scala.Predef$.require(Predef.scala:233)
	at org.apache.spark.rdd.ReliableCheckpointRDD.<init>(ReliableCheckpointRDD.scala:45)
	at org.apache.spark.SparkContext$$anonfun$checkpointFile$1.apply(SparkContext.scala:1227)
	at org.apache.spark.SparkContext$$anonfun$checkpointFile$1.apply(SparkContext.scala:1227)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
	at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
	at org.apache.spark.SparkContext.checkpointFile(SparkContext.scala:1226)
	at org.apache.spark.streaming.dstream.DStreamCheckpointData$$anonfun$restore$1.apply(DStreamCheckpointData.scala:112)
	at org.apache.spark.streaming.dstream.DStreamCheckpointData$$anonfun$restore$1.apply(DStreamCheckpointData.scala:109)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
	at org.apache.spark.streaming.dstream.DStreamCheckpointData.restore(DStreamCheckpointData.scala:109)
	at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:487)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:488)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:488)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
	at org.apache.spark.streaming.dstream.DStream$$anonfun$restoreCheckpointData$2.apply(DStream.scala:488)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.apache.spark.streaming.dstream.DStream.restoreCheckpointData(DStream.scala:488)
	at org.apache.spark.streaming.DStreamGraph$$anonfun$restoreCheckpointData$2.apply(DStreamGraph.scala:153)
	at org.apache.spark.streaming.DStreamGraph$$anonfun$restoreCheckpointData$2.apply(DStreamGraph.scala:153)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.streaming.DStreamGraph.restoreCheckpointData(DStreamGraph.scala:153)
	at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:158)
	at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:837)
	at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:837)
	at scala.Option.map(Option.scala:145)
	at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:837)
	at com.appadhoc.data.main.StatCounter$.main(StatCounter.scala:51)
	at com.appadhoc.data.main.StatCounter.main(StatCounter.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:525)
15/09/23 17:47:39 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.IllegalArgumentException: requirement failed: Checkpoint directory does not exist: hdfs://szq2.appadhoc.com:8020/user/root/checkpoint/d3714249-e03a-45c7-a0d5-1dc870b7d9f2/rdd-26909)
15/09/23 17:47:39 INFO spark.SparkContext: Invoking stop() from shutdown hook
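
For context, line 51 of our StatCounter.scala (the frame where the restore is triggered) is a StreamingContext.getOrCreate call along these lines. This is a minimal sketch, not the actual application code; the batch interval and the createContext body are assumptions:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StatCounter {
  // The checkpoint path from the log; the 5-minute interval is inferred
  // from the checkpoint file timestamps and is an assumption.
  val checkpointDir = "hdfs://szq2.appadhoc.com:8020/user/root/checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf().setAppName("StatCounter")
    val ssc = new StreamingContext(conf, Seconds(300))
    ssc.checkpoint(checkpointDir)
    // ... build the DStream graph here ...
    ssc
  }

  def main(args: Array[String]): Unit = {
    // On restart, getOrCreate deserializes the checkpoint-* metadata files
    // and then calls SparkContext.checkpointFile for each saved RDD
    // directory; if an rdd-NNNNN directory under the UUID directory is
    // missing, the restore fails with the IllegalArgumentException above.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```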


Tathagata Das <tathagata.das1565@gmail.com> wrote on Thu, Sep 24, 2015 at 9:45 AM:

> Could you provide the logs on when and how you are seeing this error?
>
> On Wed, Sep 23, 2015 at 6:32 PM, Bin Wang <wbin00@gmail.com> wrote:
>
>> BTW, I just killed the application and restarted it. The application then
>> could not recover from the checkpoint because some RDDs were lost. So I'm
>> wondering: if the application fails, is it possible that it will not be
>> able to recover from the checkpoint?
>>
>> Bin Wang <wbin00@gmail.com> wrote on Wed, Sep 23, 2015 at 6:58 PM:
>>
>>> I found that the checkpoint directory structure looks like this:
>>>
>>> -rw-r--r--   1 root root     134820 2015-09-23 16:55 /user/root/checkpoint/checkpoint-1442998500000
>>> -rw-r--r--   1 root root     134768 2015-09-23 17:00 /user/root/checkpoint/checkpoint-1442998800000
>>> -rw-r--r--   1 root root     134895 2015-09-23 17:05 /user/root/checkpoint/checkpoint-1442999100000
>>> -rw-r--r--   1 root root     134899 2015-09-23 17:10 /user/root/checkpoint/checkpoint-1442999400000
>>> -rw-r--r--   1 root root     134913 2015-09-23 17:15 /user/root/checkpoint/checkpoint-1442999700000
>>> -rw-r--r--   1 root root     134928 2015-09-23 17:20 /user/root/checkpoint/checkpoint-1443000000000
>>> -rw-r--r--   1 root root     134987 2015-09-23 17:25 /user/root/checkpoint/checkpoint-1443000300000
>>> -rw-r--r--   1 root root     134944 2015-09-23 17:30 /user/root/checkpoint/checkpoint-1443000600000
>>> -rw-r--r--   1 root root     134956 2015-09-23 17:35 /user/root/checkpoint/checkpoint-1443000900000
>>> -rw-r--r--   1 root root     135244 2015-09-23 17:40 /user/root/checkpoint/checkpoint-1443001200000
>>> drwxr-xr-x   - root root          0 2015-09-23 18:48 /user/root/checkpoint/d3714249-e03a-45c7-a0d5-1dc870b7d9f2
>>> drwxr-xr-x   - root root          0 2015-09-23 17:44 /user/root/checkpoint/receivedBlockMetadata
>>>
>>>
>>> I restarted Spark and it reads from
>>> /user/root/checkpoint/d3714249-e03a-45c7-a0d5-1dc870b7d9f2. But it seems
>>> that some RDDs in that directory are lost, so it is not able to recover.
>>> Meanwhile I see other entries under checkpoint/, such as
>>> /user/root/checkpoint/checkpoint-1443001200000. What are those files used
>>> for? Can I recover my data from them?
>>>
>>
>
