sqoop-dev mailing list archives

From "Veena Basavaraj (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SQOOP-1616) Sqoop2: Sqoop data type to Avro data type conversion
Date Fri, 02 Jan 2015 20:23:35 GMT

    [ https://issues.apache.org/jira/browse/SQOOP-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14263213#comment-14263213 ]

Veena Basavaraj edited comment on SQOOP-1616 at 1/2/15 8:22 PM:
----------------------------------------------------------------

I noticed that in the Kite code, the DECIMAL type in Sqoop is mapped to a string in Avro:
{code}
 case DECIMAL:

      // why string?

      return Schema.Type.STRING;
{code}


Why not use the FIXED (or BYTES) type in Avro, annotated with the decimal logical type? From the Avro specification:

Decimal
The decimal logical type represents an arbitrary-precision signed decimal number of the form
unscaled × 10^(-scale).

A decimal logical type annotates Avro bytes or fixed types. The byte array must contain the
two's-complement representation of the unscaled integer value in big-endian byte order. The
scale is fixed, and is specified using an attribute.

The following attributes are supported:

* scale, a JSON integer representing the scale (optional). If not specified the scale is 0.
* precision, a JSON integer representing the (maximum) precision of decimals stored in this
type (required).

For example, the following schema represents decimal numbers with a maximum precision of 4
and a scale of 2:

{code}

{
  "type": "bytes",
  "logicalType": "decimal",
  "precision": 4,
  "scale": 2
}

{code}
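
To make the byte layout described above concrete, here is a minimal sketch using only java.math. The class DecimalBytes and its methods are hypothetical helpers for illustration, not part of Sqoop, Kite, or Avro. It assumes the value fits the scale declared in the schema without rounding.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper for illustration only; not Sqoop, Kite, or Avro code.
final class DecimalBytes {

    // Encode a decimal as the two's-complement, big-endian bytes of its unscaled value.
    static byte[] encode(BigDecimal value, int scale) {
        // setScale(int) without a rounding mode throws ArithmeticException
        // if the value cannot be represented at this scale exactly.
        return value.setScale(scale).unscaledValue().toByteArray();
    }

    // Rebuild the decimal from the bytes plus the scale taken from the schema.
    static BigDecimal decode(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        byte[] encoded = encode(new BigDecimal("12.34"), 2); // unscaled value 1234
        System.out.println(decode(encoded, 2));
    }
}
```

The scale is not stored with each value; it comes from the schema attribute, which is why decode needs it as a parameter.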

Regarding FIXED_POINT and FLOATING_POINT, we need to check the byte size before deciding on
the Avro type, instead of always using LONG and DOUBLE as the current code does:

{code}
    case FIXED_POINT:

      return Schema.Type.LONG;

    case FLOATING_POINT:

      return Schema.Type.DOUBLE;

{code}
It should be something like this:
{code}
    case FIXED_POINT:
      // getByteSize() is in bytes while Integer.SIZE is in bits, so convert before comparing
      if (((org.apache.sqoop.schema.type.FixedPoint) column).getByteSize() <= Integer.SIZE / Byte.SIZE) {
        return Schema.Type.INT;
      } else {
        return Schema.Type.LONG;
      }

    case FLOATING_POINT:
      // same bits-to-bytes conversion for Float.SIZE
      if (((org.apache.sqoop.schema.type.FloatingPoint) column).getByteSize() <= Float.SIZE / Byte.SIZE) {
        return Schema.Type.FLOAT;
      } else {
        return Schema.Type.DOUBLE;
      }
{code}
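
One thing to watch with the size check: Integer.SIZE and Float.SIZE are bit counts (32), so a byte size has to be compared against Integer.SIZE / Byte.SIZE (4). Below is a rough standalone sketch of that selection. NumericWidth and its AvroType enum are hypothetical stand-ins rather than the Kite or Sqoop classes, and the fallback to the wider type when the byte size is null is my assumption.

```java
// Hypothetical stand-in for the mapping discussed above; not the Kite or Sqoop code.
final class NumericWidth {

    // Stand-in for the four Avro Schema.Type values involved.
    enum AvroType { INT, LONG, FLOAT, DOUBLE }

    static AvroType forFixedPoint(Long byteSize) {
        // Assumption: an unset byte size falls back to the wider type.
        if (byteSize != null && byteSize <= Integer.SIZE / Byte.SIZE) {
            return AvroType.INT;
        }
        return AvroType.LONG;
    }

    static AvroType forFloatingPoint(Long byteSize) {
        // Float.SIZE is 32 bits, so the threshold is 4 bytes.
        if (byteSize != null && byteSize <= Float.SIZE / Byte.SIZE) {
            return AvroType.FLOAT;
        }
        return AvroType.DOUBLE;
    }

    public static void main(String[] args) {
        System.out.println(forFixedPoint(4L));       // 4-byte integer column
        System.out.println(forFixedPoint(8L));       // 8-byte integer column
        System.out.println(forFloatingPoint(null));  // size not specified
    }
}
```

Comparing the byte size directly against Integer.SIZE would classify an 8-byte column as INT, since 8 <= 32.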
SET should be treated as an ARRAY, so it should be an ARRAY type in Avro as well. Currently
it is treated as an enum, and I am not sure that is right.
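
To illustrate the difference: an Avro enum value is exactly one symbol, while an array can carry several members, which is what a SET value needs. Here is a small sketch that only builds the Avro schema JSON as a string. SetAsArray is a hypothetical helper, and string-typed items are an assumption.

```java
// Hypothetical helper for illustration; it only builds schema JSON text
// and does not use the Avro library.
final class SetAsArray {

    // Avro array schema for a SET column whose members have the given primitive type.
    static String arraySchemaJson(String itemType) {
        return "{\"type\": \"array\", \"items\": \"" + itemType + "\"}";
    }

    public static void main(String[] args) {
        // Assumption: SET members are strings.
        System.out.println(arraySchemaJson("string"));
    }
}
```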

UNKNOWN is the same as BINARY/BYTES as far as Sqoop is concerned.

Also, this code is really good; I had missed the UNION part when I coded the Avro IDF.

{code}
    if (!column.getNullable()) {
      return Schema.create(type);
    } else {
      List<Schema> union = new ArrayList<Schema>();
      // really good call here
      union.add(Schema.create(type));
      union.add(Schema.create(Schema.Type.NULL));
      return Schema.createUnion(union);
    }
{code}
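
For reference, the schema this produces for a nullable column is a JSON union with the column type first and "null" second. A minimal sketch of that shape, built as plain strings: NullableSchema is a hypothetical helper, not the Kite code, and it handles primitive type names only.

```java
// Hypothetical helper for illustration; builds schema JSON text for primitive types only.
final class NullableSchema {

    // Non-nullable columns keep the bare type; nullable ones become [type, "null"].
    static String schemaJson(String primitiveType, boolean nullable) {
        String quoted = "\"" + primitiveType + "\"";
        return nullable ? "[" + quoted + ", \"null\"]" : quoted;
    }

    public static void main(String[] args) {
        System.out.println(schemaJson("long", false));
        System.out.println(schemaJson("long", true));
    }
}
```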



> Sqoop2: Sqoop data type to Avro data type conversion
> ----------------------------------------------------
>
>                 Key: SQOOP-1616
>                 URL: https://issues.apache.org/jira/browse/SQOOP-1616
>             Project: Sqoop
>          Issue Type: Sub-task
>          Components: connectors
>            Reporter: Qian Xu
>            Assignee: Veena Basavaraj
>            Priority: Minor
>             Fix For: 1.99.5
>
>
> Should add more data type convert support



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
