spark-issues mailing list archives

From "Nira Amit (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-19656) Can't load custom type from avro file to RDD with newAPIHadoopFile
Date Sat, 18 Feb 2017 10:21:44 GMT
Nira Amit created SPARK-19656:
---------------------------------

             Summary: Can't load custom type from avro file to RDD with newAPIHadoopFile
                 Key: SPARK-19656
                 URL: https://issues.apache.org/jira/browse/SPARK-19656
             Project: Spark
          Issue Type: Question
          Components: Java API
    Affects Versions: 2.0.2
            Reporter: Nira Amit


If I understand correctly, in Scala it's possible to load custom objects from Avro files into
RDDs this way:
{code}
ctx.hadoopFile("/path/to/the/avro/file.avro",
  classOf[AvroInputFormat[MyClassInAvroFile]],
  classOf[AvroWrapper[MyClassInAvroFile]],
  classOf[NullWritable])
{code}
I'm not a Scala developer, so I tried to translate this to Java as best I could. I created
classes that extend AvroKey and FileInputFormat:
{code}
public static class MyCustomAvroKey extends AvroKey<MyCustomClass> {}

public static class MyCustomAvroReader
        extends AvroRecordReaderBase<MyCustomAvroKey, NullWritable, MyCustomClass> {
    // with my custom schema and all the required methods...
}

public static class MyCustomInputFormat
        extends FileInputFormat<MyCustomAvroKey, NullWritable> {

    @Override
    public RecordReader<MyCustomAvroKey, NullWritable> createRecordReader(
            InputSplit inputSplit, TaskAttemptContext taskAttemptContext)
            throws IOException, InterruptedException {
        return new MyCustomAvroReader();
    }
}
...
JavaPairRDD<MyCustomAvroKey, NullWritable> records =
                sc.newAPIHadoopFile("file:/path/to/datafile.avro",
                        MyCustomInputFormat.class, MyCustomAvroKey.class,
                        NullWritable.class,
                        sc.hadoopConfiguration());
MyCustomClass first = records.first()._1.datum();
System.out.println("Got a result, some custom field: " + first.getSomeCustomField());
{code}
This compiles fine, but using a debugger I can see that {{records.first()._1.datum()}} actually
returns a {{GenericData$Record}} at runtime, not a {{MyCustomClass}} instance.
And indeed, when the following line executes:
{code}
MyCustomClass first = records.first()._1.datum();
{code}
I get an exception:
{code}
java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to
my.package.containing.MyCustomClass
{code}
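For what it's worth, the reason this compiles but only fails at the call site seems to be Java's type erasure: {{AvroKey<T>.datum()}} returns {{T}}, which is erased to {{Object}}, so the compiler inserts the cast at the caller and nothing is checked until that line runs. Here is a minimal stdlib-only sketch of that behavior (the {{Key}} class below is a stand-in I wrote for {{AvroKey}}, and the {{ArrayList}} stands in for the {{GenericData$Record}} the reader actually produced; neither is from Avro itself):

```java
import java.util.ArrayList;

public class ErasureDemo {
    // Stand-in for AvroKey<T>: datum() returns T, which erases to Object at runtime.
    static class Key<T> {
        private final Object datum;
        Key(Object datum) { this.datum = datum; }
        @SuppressWarnings("unchecked")
        T datum() { return (T) datum; }  // unchecked cast: no runtime check happens here
    }

    public static void main(String[] args) {
        // The container claims to hold a String, but actually holds an ArrayList,
        // just like an AvroKey<MyCustomClass> actually holding a GenericData$Record.
        Key<String> key = new Key<>(new ArrayList<Integer>());

        // Assigning to Object needs no cast, so this line succeeds:
        Object raw = key.datum();
        System.out.println(raw.getClass().getName());

        try {
            // The compiler inserts a checkcast to String here; it fails only now:
            String s = key.datum();
            System.out.println(s);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException caught");
        }
    }
}
```

So the {{ClassCastException}} surfacing at {{MyCustomClass first = records.first()._1.datum();}} rather than earlier is expected erasure behavior; the underlying question of why the reader produces a generic record remains.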
Am I doing something wrong, or is this not possible in Java?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
