spark-user mailing list archives

From Dominik Hübner <>
Subject Kryo fails to serialise output
Date Fri, 03 Jul 2015 06:44:55 GMT
I have a rather simple Avro schema to serialize tweets (message, username, timestamp), using Kryo and Twitter's chill library.

For my dev environment the Spark context is configured as below

val conf: SparkConf = new SparkConf()
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.set("spark.kryo.registrator", "co.feeb.TweetRegistrator")

Serialization is setup with

override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Tweet], AvroSerializer.SpecificRecordBinarySerializer[Tweet])
}

(This method gets called.)
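For reference, the full registrator might look like this (a minimal sketch; the imports and surrounding class are assumptions based on chill-avro and the class name configured above):

```scala
package co.feeb

import com.esotericsoftware.kryo.Kryo
import com.twitter.chill.avro.AvroSerializer
import org.apache.spark.serializer.KryoRegistrator

// Registers the chill-avro binary serializer for the generated Tweet
// record, so Kryo (de)serializes it via Avro's specific binary encoding.
class TweetRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Tweet], AvroSerializer.SpecificRecordBinarySerializer[Tweet])
  }
}
```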

Using this configuration to persist some objects fails
(which seems to be OK, as this class is not Serializable).

I used the following code:

val ctx: SparkContext = new SparkContext(conf)
val tweets: RDD[Tweet] = ctx.parallelize(List(
    new Tweet("a", "b", 1L),
    new Tweet("c", "d", 2L),
    new Tweet("e", "f", 3L)
))

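As a side note, the registration itself can be sanity-checked outside Spark by round-tripping one record through Kryo directly (a sketch assuming chill's ScalaKryoInstantiator; everything except Tweet and AvroSerializer is illustrative):

```scala
import java.io.ByteArrayOutputStream

import com.esotericsoftware.kryo.io.{Input, Output}
import com.twitter.chill.ScalaKryoInstantiator
import com.twitter.chill.avro.AvroSerializer

// Build a Kryo instance with the same registration as the registrator.
val kryo = new ScalaKryoInstantiator().newKryo()
kryo.register(classOf[Tweet], AvroSerializer.SpecificRecordBinarySerializer[Tweet])

// Serialize one record to bytes (Avro binary, not JSON) ...
val buffer = new ByteArrayOutputStream()
val output = new Output(buffer)
kryo.writeObject(output, new Tweet("a", "b", 1L))
output.close()

// ... and read it back.
val input = new Input(buffer.toByteArray)
val roundTripped = kryo.readObject(input, classOf[Tweet])
```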
Using saveAsTextFile works, but the persisted files are not binary; they contain JSON:

cat /tmp/spark/part-00000
{"username": "a", "text": "b", "timestamp": 1}
{"username": "c", "text": "d", "timestamp": 2}
{"username": "e", "text": "f", "timestamp": 3}
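For what it's worth, those JSON lines are consistent with how saveAsTextFile behaves: it writes each element's toString, and Avro specific records render themselves as JSON, so the configured Kryo serializer is never consulted on that path. Simplified, saveAsTextFile effectively does the following (a sketch, not the actual Spark source):

```scala
import org.apache.hadoop.io.{NullWritable, Text}
import org.apache.hadoop.mapred.TextOutputFormat
import org.apache.spark.rdd.RDD
// In older Spark versions the pair-RDD implicits come from:
import org.apache.spark.SparkContext._

// Simplified: every element is turned into its string form before
// writing, so the records never pass through Kryo on this path.
def saveAsTextFileSketch[T](rdd: RDD[T], path: String): Unit =
  rdd.map(x => (NullWritable.get(), new Text(x.toString)))
     .saveAsHadoopFile[TextOutputFormat[NullWritable, Text]](path)
```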

Is this intended behaviour, a configuration issue, Avro serialisation not working in local
mode, or something else?
