spark-issues mailing list archives

From "Nira Amit (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-19424) Wrong runtime type in RDD when reading from avro with custom serializer
Date Wed, 01 Feb 2017 15:08:51 GMT
Nira Amit created SPARK-19424:
---------------------------------

             Summary: Wrong runtime type in RDD when reading from avro with custom serializer
                 Key: SPARK-19424
                 URL: https://issues.apache.org/jira/browse/SPARK-19424
             Project: Spark
          Issue Type: Bug
          Components: Java API
    Affects Versions: 2.0.2
         Environment: Ubuntu, spark 2.0.2 prebuilt for hadoop 2.7
            Reporter: Nira Amit


I am trying to read data from Avro files into an RDD using Kryo. My code compiles fine, but
at runtime I get a ClassCastException. Here is what my code does:
{code}
SparkConf conf = new SparkConf()...
conf.set("spark.serializer", KryoSerializer.class.getCanonicalName());
conf.set("spark.kryo.registrator", MyKryoRegistrator.class.getName());
JavaSparkContext sc = new JavaSparkContext(conf);
{code}
Here, MyKryoRegistrator registers a serializer for MyCustomClass:
{code}
public void registerClasses(Kryo kryo) {
    kryo.register(MyCustomClass.class, new MyCustomClassSerializer());
}
{code}
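As a point of comparison, here is a toy model of class-keyed serializer registration. It is a hypothetical, JDK-only sketch: `ToySerializerRegistry`, `KeyWrapper` and the `fallback:` behaviour are illustrative assumptions, not Kryo's API. In this model a serializer is chosen from an object's runtime class and the object is never converted, so a serializer registered for `MyCustomClass` plays no part when the object actually handed over is a wrapper such as an `AvroKey`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class ToySerializerRegistry {
    // Hypothetical stand-in for MyCustomClass.
    public static final class MyCustomClass {
        public final String someCustomField;
        public MyCustomClass(String someCustomField) { this.someCustomField = someCustomField; }
    }

    // Hypothetical stand-in for an AvroKey-style wrapper around a datum.
    public static final class KeyWrapper {
        public final Object datum;
        public KeyWrapper(Object datum) { this.datum = datum; }
    }

    // Serializers keyed by the exact class they were registered for.
    private final Map<Class<?>, Function<Object, String>> serializers = new HashMap<>();

    public <T> void register(Class<T> type, Function<T, String> serializer) {
        serializers.put(type, obj -> serializer.apply(type.cast(obj)));
    }

    // Picks a serializer from the object's runtime class; it never converts the object.
    public String serialize(Object obj) {
        Function<Object, String> serializer = serializers.get(obj.getClass());
        if (serializer == null) {
            return "fallback:" + obj.getClass().getSimpleName();
        }
        return serializer.apply(obj);
    }

    public static void main(String[] args) {
        ToySerializerRegistry registry = new ToySerializerRegistry();
        registry.register(MyCustomClass.class, c -> "custom:" + c.someCustomField);

        MyCustomClass value = new MyCustomClass("x");
        System.out.println(registry.serialize(value));                 // custom:x
        System.out.println(registry.serialize(new KeyWrapper(value))); // fallback:KeyWrapper
    }
}
```

The registered serializer is used only for objects whose runtime class is `MyCustomClass`; the wrapped value falls through to the fallback path.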
Then I read my data file:
{code}
JavaPairRDD<MyCustomClass, NullWritable> records =
                sc.newAPIHadoopFile("file:/path/to/datafile.avro",
                AvroKeyInputFormat.class, MyCustomClass.class, NullWritable.class,
                sc.hadoopConfiguration());
Tuple2<MyCustomClass, NullWritable> first = records.first();
{code}
This seems to work fine, but using a debugger I can see that while the RDD has a kClassTag
of my.package.containing.MyCustomClass, the variable first contains a Tuple2<AvroKey, NullWritable>,
not Tuple2<MyCustomClass, NullWritable>! And indeed, when the following line executes:
{code}
System.out.println("Got a result, custom field is: " + first._1.getSomeCustomField());
{code}
I get an exception:
{code}
java.lang.ClassCastException: org.apache.avro.mapred.AvroKey cannot be cast to my.package.containing.MyCustomClass
{code}
Am I doing something wrong? And even if I am, shouldn't I get a compilation error rather than a
runtime error?
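The gap between "compiles fine" and "fails at runtime" can be reproduced without Spark. The JDK-only sketch below is a hypothetical analogue, not Spark's implementation: `read` accepts a `Class<K>` token, much as `newAPIHadoopFile` takes `MyCustomClass.class`, but returns whatever objects the underlying source produced through an unchecked cast. Because generic type arguments are erased, nothing fails when the list is built or when an element is viewed as a plain `Object` (which is roughly what a debugger shows); the `ClassCastException` appears only where the compiler inserts a cast to the declared element type, analogous to the `first._1.getSomeCustomField()` call in the report. `checkAll` is an optional fail-fast variant that uses the `Class` token to validate elements eagerly. All class and method names here are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // Hypothetical stand-in for an AvroKey-style wrapper produced by the "input format".
    public static final class Wrapper {
        final Object datum;
        Wrapper(Object datum) { this.datum = datum; }
    }

    // Hypothetical stand-in for MyCustomClass.
    public static final class Custom {
        String getSomeCustomField() { return "value"; }
    }

    // Hypothetical reader: keyClass only fixes K for the compiler and is never consulted,
    // so the returned list holds Wrapper objects whatever K is.
    @SuppressWarnings({"unchecked", "rawtypes"})
    public static <K> List<K> read(Class<K> keyClass) {
        List raw = new ArrayList();
        raw.add(new Wrapper(new Custom()));
        return (List<K>) raw; // unchecked cast: erased, so nothing is checked here
    }

    // Optional fail-fast variant: validate every element against the Class token.
    public static <K> List<K> checkAll(List<?> items, Class<K> keyClass) {
        List<K> out = new ArrayList<>();
        for (Object item : items) {
            out.add(keyClass.cast(item)); // throws ClassCastException on the first mismatch
        }
        return out;
    }

    public static String run() {
        List<Custom> records = read(Custom.class); // compiles: K is inferred as Custom
        List<?> view = records;
        String runtimeType = view.get(0).getClass().getSimpleName(); // no cast to Custom here
        try {
            Custom first = records.get(0); // the compiler-inserted cast to Custom fails here
            return runtimeType + ": " + first.getSomeCustomField();
        } catch (ClassCastException e) {
            return runtimeType + ": ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // Wrapper: ClassCastException
        try {
            checkAll(read(Custom.class), Custom.class);
        } catch (ClassCastException e) {
            System.out.println("checkAll failed fast");
        }
    }
}
```

The design point is that a `Class` argument only constrains what the compiler infers; unless some code actually calls `Class.cast` or `isInstance` on each element, a mismatch surfaces later, at the first use site that needs the declared type.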



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


