sqoop-dev mailing list archives

From "Veena Basavaraj (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SQOOP-1616) Sqoop2: Sqoop data type to Avro data type conversion
Date Fri, 02 Jan 2015 20:22:34 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263213#comment-14263213 ]

Veena Basavaraj commented on SQOOP-1616:

I noticed that in the Kite code, the DECIMAL type in Sqoop is mapped to a string in Avro:

    case DECIMAL:
      // why string?
      return Schema.Type.STRING;

Why not use the FIXED type in Avro? From the Avro specification:

The decimal logical type represents an arbitrary-precision signed decimal number of the form
unscaled × 10^(-scale).

A decimal logical type annotates Avro bytes or fixed types. The byte array must contain the
two's-complement representation of the unscaled integer value in big-endian byte order. The
scale is fixed, and is specified using an attribute.

The following attributes are supported:

- scale, a JSON integer representing the scale (optional). If not specified, the scale is 0.
- precision, a JSON integer representing the (maximum) precision of decimals stored in this
  type (required).

For example, the following schema represents decimal numbers with a maximum precision of 4
and a scale of 2:

{
  "type": "bytes",
  "logicalType": "decimal",
  "precision": 4,
  "scale": 2
}

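To make the byte layout described above concrete, here is a minimal Java sketch of how a value such as 12.34 would be encoded for a decimal with scale 2. DecimalBytes, encode and decode are hypothetical names used only for illustration; they are not Sqoop, Kite or Avro API, and only java.math classes are used.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

// Hypothetical helper for illustration; not part of Sqoop, Kite or Avro.
class DecimalBytes {

    // Fix the scale, then take the unscaled integer as big-endian
    // two's-complement bytes (which is what BigInteger.toByteArray returns).
    static byte[] encode(BigDecimal value, int scale) {
        // RoundingMode.UNNECESSARY makes setScale throw if digits would be lost.
        return value.setScale(scale, RoundingMode.UNNECESSARY)
                    .unscaledValue()
                    .toByteArray();
    }

    // Rebuild unscaled * 10^(-scale) from the bytes and the schema's scale.
    static BigDecimal decode(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        byte[] encoded = encode(new BigDecimal("12.34"), 2);
        System.out.println(decode(encoded, 2)); // prints 12.34
    }
}
```

With scale 2, 12.34 has the unscaled value 1234, which is stored as the two bytes 0x04 0xD2.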

Regarding FIXED_POINT and FLOATING_POINT, we need to check the byte size before deciding
the type, rather than always returning LONG (or DOUBLE) as the code does now:

    case FIXED_POINT:
      return Schema.Type.LONG;
    case FLOATING_POINT:
      return Schema.Type.DOUBLE;

It should be something like this

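    case FIXED_POINT:
      // Assuming getByteSize() is in bytes: Integer.SIZE is in bits, so convert before comparing.
      if (((org.apache.sqoop.schema.type.FixedPoint) column).getByteSize() <= Integer.SIZE / Byte.SIZE) {
        return Schema.Type.INT;
      } else {
        return Schema.Type.LONG;
      }
    case FLOATING_POINT:
      if (((org.apache.sqoop.schema.type.FloatingPoint) column).getByteSize() <= Float.SIZE / Byte.SIZE) {
        return Schema.Type.FLOAT;
      } else {
        return Schema.Type.DOUBLE;
      }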
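A self-contained sketch of the size check, assuming the column's byte size is reported in bytes (Integer.SIZE and Float.SIZE are in bits, hence the division by Byte.SIZE). AvroType is a local stand-in for Schema.Type so the snippet compiles without Avro on the classpath; the class and method names are hypothetical.

```java
// Hypothetical sketch; AvroType only mirrors the four Avro types discussed here.
class AvroTypeChooser {

    enum AvroType { INT, LONG, FLOAT, DOUBLE }

    // Up to 4 bytes fits an Avro int (32-bit); anything wider needs a long.
    static AvroType forFixedPoint(long byteSize) {
        return byteSize <= Integer.SIZE / Byte.SIZE ? AvroType.INT : AvroType.LONG;
    }

    // Up to 4 bytes fits an Avro float (32-bit); anything wider needs a double.
    static AvroType forFloatingPoint(long byteSize) {
        return byteSize <= Float.SIZE / Byte.SIZE ? AvroType.FLOAT : AvroType.DOUBLE;
    }

    public static void main(String[] args) {
        System.out.println(forFixedPoint(4));    // prints INT
        System.out.println(forFixedPoint(8));    // prints LONG
        System.out.println(forFloatingPoint(4)); // prints FLOAT
        System.out.println(forFloatingPoint(8)); // prints DOUBLE
    }
}
```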


SET should be treated as an ARRAY, so it should be the ARRAY type in Avro as well. Currently
it is treated as an enum, and I am not sure that is right.

UNKNOWN is the same as BINARY/BYTES as far as Sqoop is concerned.

Also, this code is really good; I had missed the UNION part when I coded the Avro IDF.

    if (!column.getNullable()) {
      return Schema.create(type);
    } else {
      List<Schema> union = new ArrayList<Schema>();
      // really good call here
      union.add(Schema.create(type));
      union.add(Schema.create(Schema.Type.NULL));
      return Schema.createUnion(union);
    }
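For a nullable column, the code quoted above ends up with an Avro union of the column type and null. A rough sketch of the resulting schema JSON, built with plain strings so that it runs without Avro; NullableSchemaJson and fieldType are hypothetical names, and the branch order (type first, then null) is an assumption.

```java
// Hypothetical sketch; builds the JSON text of an Avro field type by hand.
class NullableSchemaJson {

    // Non-nullable columns use the bare type name; nullable columns use a
    // union of the type and "null".
    static String fieldType(String avroType, boolean nullable) {
        String quoted = "\"" + avroType + "\"";
        return nullable ? "[" + quoted + ", \"null\"]" : quoted;
    }

    public static void main(String[] args) {
        System.out.println(fieldType("long", false)); // prints "long"
        System.out.println(fieldType("long", true));  // prints ["long", "null"]
    }
}
```

Branch order matters if the field has a default value: per the Avro specification the default must match the first branch of the union, which is why "null" is often listed first when the default is null.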


> Sqoop2: Sqoop data type to Avro data type conversion
> ----------------------------------------------------
>                 Key: SQOOP-1616
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1616
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: connectors
>            Reporter: Qian Xu
>            Assignee: Veena Basavaraj
>            Priority: Minor
>             Fix For: 1.99.5
> Should add support for more data type conversions.
