spark-issues mailing list archives

From "Nira Amit (JIRA)"
Subject [jira] [Created] (SPARK-19656) Can't load custom type from avro file to RDD with newAPIHadoopFile
Date Sat, 18 Feb 2017 10:21:44 GMT
Nira Amit created SPARK-19656:

             Summary: Can't load custom type from avro file to RDD with newAPIHadoopFile
                 Key: SPARK-19656
             Project: Spark
          Issue Type: Question
          Components: Java API
    Affects Versions: 2.0.2
            Reporter: Nira Amit

If I understand correctly, in Scala it's possible to load custom objects from Avro files into
RDDs this way:

I'm not a Scala developer, so I tried to "translate" this to Java as best I could. I created
classes that extend AvroKey, AvroRecordReaderBase and FileInputFormat:
// Key type: an AvroKey whose datum is declared to be MyCustomClass.
public static class MyCustomAvroKey extends AvroKey<MyCustomClass> {}

public static class MyCustomAvroReader
        extends AvroRecordReaderBase<MyCustomAvroKey, NullWritable, MyCustomClass> {
    // with my custom schema and all the required methods...
}

public static class MyCustomInputFormat extends FileInputFormat<MyCustomAvroKey, NullWritable> {

    @Override
    public RecordReader<MyCustomAvroKey, NullWritable> createRecordReader(
            InputSplit inputSplit, TaskAttemptContext taskAttemptContext)
            throws IOException, InterruptedException {
        return new MyCustomAvroReader();
    }
}
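
// sc is assumed to be a JavaSparkContext; the path below is a placeholder.
JavaPairRDD<MyCustomAvroKey, NullWritable> records =
        sc.newAPIHadoopFile("<path to the avro file>",
                        MyCustomInputFormat.class, MyCustomAvroKey.class,
                        NullWritable.class, sc.hadoopConfiguration());
MyCustomClass first = records.first()._1.datum();
System.out.println("Got a result, some custom field: " + first.getSomeCustomField());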
This compiles fine, but using a debugger I can see that `records.first()._1.datum()` actually
returns a `GenericData$Record` at runtime, not a `MyCustomClass` instance.
And indeed, when the following line executes:
MyCustomClass first = records.first()._1.datum();
I get an exception:
java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to
Am I doing it wrong? Or is this not possible in Java?
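
The "compiles fine, fails at runtime" behaviour described above is consistent with Java type erasure. The following is a minimal sketch of that mechanism only, using hypothetical stand-in classes (KeyStandIn, GenericRecordStandIn, fill) rather than the real Avro or Spark types. It assumes a reader that stores a generic record in the key regardless of the key's declared type parameter; whether that is what happens inside Avro here is not confirmed by this message.

```java
// Stand-in sketch: none of these classes are Avro or Spark types.
class ErasureDemo {

    // Plays the role of GenericData.Record: what a generic reader produces.
    static class GenericRecordStandIn {
        final String someCustomField;
        GenericRecordStandIn(String value) { this.someCustomField = value; }
    }

    // Plays the role of the user's class.
    static class MyCustomClass {
        final String someCustomField;
        MyCustomClass(String value) { this.someCustomField = value; }
    }

    // Plays the role of AvroKey<T>: a generic holder whose T is erased at runtime.
    static class KeyStandIn<T> {
        private T datum;
        T datum() { return datum; }
        void datum(T value) { this.datum = value; }
    }

    // Plays the role of MyCustomAvroKey.
    static class MyCustomKey extends KeyStandIn<MyCustomClass> {}

    // Plays the role of a reader that always builds generic records.
    // The unchecked cast compiles (with a warning) and is never verified
    // at runtime, because T is erased to Object.
    @SuppressWarnings("unchecked")
    static <T> void fill(KeyStandIn<T> key) {
        key.datum((T) (Object) new GenericRecordStandIn("hello"));
    }

    static String describeFailure() {
        MyCustomKey key = new MyCustomKey();
        fill(key);
        // No cast is needed for an Object target, so this line succeeds.
        Object raw = key.datum();
        try {
            // The compiler inserts a cast to MyCustomClass here; it fails.
            MyCustomClass typed = key.datum();
            return "no failure: " + typed.someCustomField;
        } catch (ClassCastException e) {
            return "ClassCastException; runtime class was "
                    + raw.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // prints "ClassCastException; runtime class was GenericRecordStandIn"
        System.out.println(describeFailure());
    }
}
```

In this sketch the exception surfaces at the assignment to a MyCustomClass variable, not where the record is stored, which matches the line the debugger points at in the report.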

This message was sent by Atlassian JIRA
