sqoop-dev mailing list archives

From "Qian Xu" <sx.a...@googlemail.com>
Subject Review Request 25325: SQOOP-1395: Potential naming conflict in Avro schema
Date Thu, 04 Sep 2014 03:37:56 GMT

This is an automatically generated e-mail. To reply, visit:

Review request for Sqoop.

Bugs: SQOOP-1390

Repository: sqoop-trunk


If you import a table named "users", Sqoop generates an entity class named "users.java". The
class is compiled, submitted and used by a MapReduce job. If the target file format is
Avro or Parquet, an Avro schema is generated as well. According to the Avro specification,
the entity class is described as a "record", and the name of that record is "users".
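As a rough illustration, an Avro record schema for the "users" table looks like the JSON printed below, with the record name taken directly from the table name. The class `RecordSchemaSketch`, its `toSchemaJson` method and the `id`/`name` columns are hypothetical and only assumed for illustration; the schema Sqoop's `AvroSchemaGenerator` actually emits may differ (for example, in field types and nullability).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch (not Sqoop code): builds an Avro "record" schema as JSON,
// using the bare table name as the record name, as described above.
public class RecordSchemaSketch {

    static String toSchemaJson(String tableName, Map<String, String> columns) {
        StringJoiner fields = new StringJoiner(",", "[", "]");
        for (Map.Entry<String, String> col : columns.entrySet()) {
            // Each column becomes one Avro field with a primitive type name.
            fields.add("{\"name\":\"" + col.getKey() + "\",\"type\":\"" + col.getValue() + "\"}");
        }
        // The record name is exactly the table name, e.g. "users".
        return "{\"type\":\"record\",\"name\":\"" + tableName + "\",\"fields\":" + fields + "}";
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>(); // keeps column order
        columns.put("id", "int");
        columns.put("name", "string");
        // Prints a record schema whose "name" is "users".
        System.out.println(toSchemaJson("users", columns));
    }
}
```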

For the Parquet file format, we use the Kite SDK to handle Parquet file reading and writing
with minimal effort. Kite requires an Avro schema, and all data records must be packed into
GenericRecord instances. This is where the problem arises: Kite reads the schema first and
tries to instantiate a record class based on the record name. In this case, Kite tries to
instantiate a "users" class. Unfortunately, the "users" class generated from "users.java" is
already present, which causes the MapReduce job to fail.
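The failure mode can be modelled in a few lines. This is a deliberately simplified sketch, not Kite's or Avro's actual code: it assumes the reader prefers a class whose name equals the record name when such a class is visible, and falls back to a generic record otherwise. `NameLookupSketch`, `resolve` and the set-based stand-in for the classpath are hypothetical.

```java
import java.util.Set;

// Simplified, hypothetical model of name-based record resolution.
public class NameLookupSketch {

    // Returns the class a reader would use for a record with the given name,
    // given the class names visible on the (modelled) classpath.
    static String resolve(String recordName, Set<String> classesOnClasspath) {
        if (classesOnClasspath.contains(recordName)) {
            // A same-named class exists, so it is picked instead of GenericRecord.
            return recordName;
        }
        return "GenericRecord";
    }

    public static void main(String[] args) {
        // The job jar contains the entity class compiled from "users.java".
        Set<String> classpath = Set.of("users");
        // The record named "users" resolves to the generated entity class.
        System.out.println(resolve("users", classpath));
    }
}
```

In this model, any record name that matches a generated class name is resolved to that class rather than to a generic record, which is the mismatch the job trips over.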

The patch proposes to change the "AvroSchemaGenerator" class so that the record name gets a prefix.
In this example, the record name for "users.java" becomes "sqoop_import_users".
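The naming rule described above can be sketched as a one-line helper. `RecordNameSketch`, `recordNameFor` and the `RECORD_NAME_PREFIX` constant are hypothetical names chosen for illustration; the actual change lives in `AvroSchemaGenerator` and `TableClassName`, whose method names are not shown here.

```java
// Hypothetical helper mirroring the proposed rule: prefix the table name so the
// Avro record name no longer equals the generated entity class name.
public class RecordNameSketch {

    static final String RECORD_NAME_PREFIX = "sqoop_import_";

    static String recordNameFor(String tableName) {
        return RECORD_NAME_PREFIX + tableName;
    }

    public static void main(String[] args) {
        String tableName = "users";                 // entity class is generated as "users.java"
        String recordName = recordNameFor(tableName);
        System.out.println(recordName);                   // sqoop_import_users
        System.out.println(recordName.equals(tableName)); // false: names no longer collide
    }
}
```

A fixed prefix keeps the record name predictable from the table name while guaranteeing it differs from the entity class name generated for that same table.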


  src/java/org/apache/sqoop/orm/AvroSchemaGenerator.java 806bace 
  src/java/org/apache/sqoop/orm/TableClassName.java 88ab622 

Diff: https://reviews.apache.org/r/25325/diff/


All existing unit tests pass. No new unit test is added.
Manually ran a couple of Avro and Parquet imports successfully.


Qian Xu
