spark-user mailing list archives

From algermissen1971 <algermissen1...@icloud.com>
Subject Spark Streaming and using Swift object store for checkpointing
Date Fri, 10 Jul 2015 21:10:17 GMT
Hi,

when I moved my streaming application to the cluster for the first time today, I ran into
the newbie error of using a local file system for checkpointing, which led to a mismatch in
RDD partition counts (see exception below).
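
For context, the pattern involved looks roughly like the sketch below. It is not the actual
App.scala: the object name, the socket source, host, port, batch interval, and update function
are all placeholders chosen for illustration. The only point it makes is that the directory
passed to ssc.checkpoint has to be reachable by the driver and by every executor.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative sketch only; names and values are assumptions, not the original application.
object CheckpointSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical: the checkpoint directory is passed as the first argument.
    // With a plain local path, each node writes to its own disk.
    val checkpointDir = args(0)

    val conf = new SparkConf().setAppName("checkpoint-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Stateful operations such as updateStateByKey require a checkpoint directory.
    ssc.checkpoint(checkpointDir)

    // Placeholder input source; the real application's source is not shown in this message.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Running count per word, kept as state across batches.
    val counts = lines
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .updateStateByKey[Int] { (values: Seq[Int], state: Option[Int]) =>
        Some(values.sum + state.getOrElse(0))
      }

    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```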

Having neither HDFS nor S3 (and the Cassandra-Connector not yet supporting checkpointing[1])
I turned to Swift (which is already available in our architecture).

I mounted Swift using cloudfuse[2], and I can see the checkpoint files on all three cluster
nodes, but the job still fails with the same exception.

I experimented with cloudfuse caching settings but that does not *seem* to help.

Can anyone shed some light on this issue and give me a hint as to what I might be doing wrong here?

Jan

[1] https://datastax-oss.atlassian.net/browse/SPARKC-13
[2] https://github.com/redbo/cloudfuse



Exception:

org.apache.spark.SparkException: Checkpoint RDD CheckpointRDD[72] at print at App.scala:47(0)
has different number of partitions than original RDD MapPartitionsRDD[70] at updateStateByKey
at App.scala:47(2)
	at org.apache.spark.rdd.RDDCheckpointData.doCheckpoint(RDDCheckpointData.scala:103)
	at org.apache.spark.rdd.RDD$$anonfun$doCheckpoint$1.apply$mcV$sp(RDD.scala:1538)
	at org.apache.spark.rdd.RDD$$anonfun$doCheckpoint$1.apply(RDD.scala:1535)
	at org.apache.spark.rdd.RDD$$anonfun$doCheckpoint$1.apply(RDD.scala:1535)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
	at org.apache.spark.rdd.RDD.doCheckpoint(RDD.scala:1534)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1735)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1750)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1765)
	at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1272)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
	at org.apac....

