sqoop-user mailing list archives

From Brian Henriksen <Brian.Henrik...@humedica.com>
Subject Exporting parquet, issues with schema
Date Wed, 09 Dec 2015 20:07:04 GMT
I am trying to use Sqoop to export some Parquet data from HDFS to Oracle.  The first problem
I ran into is that a Parquet export requires a .metadata directory, which is created by a Sqoop
Parquet IMPORT.  (Can anyone explain this to me?  It seems odd that you can only send data to
a database if you have just pulled it from a database.)  I got around this by converting a
small subset of my Parquet data to text, running sqoop export to load the text into Oracle, and
then running sqoop import to bring the data back to HDFS as Parquet, and with it the .metadata
directory.  Here is the error I'm getting:



java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:50)
at org.apache.avro.Schema$Parser.parse(Schema.java:917)
at parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:54)
at parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:142)
at parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:118)
at parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:107)
at org.kitesdk.data.spi.AbstractKeyRecordReaderWrapper.initialize(AbstractKeyRecordReaderWrapper.java:50)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:478)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:671)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroup
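
Reading the top frames of the trace: Schema$Parser.parse is handing a string to the StringReader constructor, and that constructor throws, which is what happens when the string is null. In other words, whatever AvroReadSupport.prepareForRead looked up as the Avro schema text came back empty. The sketch below models that lookup in Python so the failure mode is easier to see. It is not the parquet-avro code: the function names are hypothetical, and the two key names are assumptions about where a reader might expect the schema in a file's key/value footer metadata.

```python
import json

# Assumption: candidate footer keys under which an Avro schema string might
# be stored. These names are illustrative and not confirmed for this setup.
ASSUMED_SCHEMA_KEYS = ("parquet.avro.schema", "avro.schema")


def find_avro_schema(footer_metadata):
    """Return the parsed Avro schema from footer key/value metadata, or None."""
    for key in ASSUMED_SCHEMA_KEYS:
        raw = footer_metadata.get(key)
        if raw is not None:
            return json.loads(raw)
    return None


def describe(footer_metadata):
    """Summarise whether a reader would find a schema string to parse."""
    schema = find_avro_schema(footer_metadata)
    if schema is None:
        # A reader that parses the looked-up string without a null check
        # would fail here, as in the NullPointerException above.
        return "no Avro schema in footer metadata"
    return "Avro schema found for record " + schema.get("name", "<unnamed>")
```

If this reading is right, the thing to check would be whether the data files being exported were written by a tool that stores an Avro schema alongside the Parquet schema, since the Parquet schema alone would not satisfy such a lookup.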

It looks like Sqoop gets as far as starting up the mappers, but they are not aware of my
Parquet / Avro schema.  Where does Sqoop look for these schemas?  As far as I know, Parquet
files include the schema within the data files themselves.  In addition, the .metadata
directory contains a .avsc JSON file with the same schema.  Any ideas?
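
One way to rule out a problem with the .avsc file itself is to copy the .metadata directory to local disk and confirm that each schema file parses as JSON and lists the expected columns. The helper below is a minimal sketch using only the Python standard library; the function name and the local directory are hypothetical, and it assumes the schema is an Avro record with a top-level "fields" array.

```python
import json
from pathlib import Path


def list_avsc_fields(metadata_dir):
    """Map each .avsc file under metadata_dir to its (name, type) field pairs."""
    root = Path(metadata_dir)
    result = {}
    for avsc in sorted(root.rglob("*.avsc")):
        # json.loads raises ValueError if the schema file is not valid JSON.
        schema = json.loads(avsc.read_text(encoding="utf-8"))
        fields = [(f["name"], f["type"]) for f in schema.get("fields", [])]
        result[avsc.relative_to(root).as_posix()] = fields
    return result
```

For example, after fetching the directory with hdfs dfs -get into a local folder, list_avsc_fields("./metadata_copy") would return the field names and types per schema file, which can then be compared against the Oracle table's columns.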
