hive-issues mailing list archives

From "Bryan Cutler (JIRA)" <>
Subject [jira] [Commented] (HIVE-19723) Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
Date Mon, 04 Jun 2018 21:47:00 GMT


Bryan Cutler commented on HIVE-19723:

> My understanding is that since the primary use-case for ArrowUtils is Python integration,
some of the conversions are currently somewhat particular for Python. Perhaps Python/Pandas
only supports MICROSECOND timestamps. 

Python, with pandas and pyarrow, supports timestamps down to nanoseconds.  The reason for
using microseconds in Spark {{ArrowUtils}} is to match Spark's internal representation,
which is in microseconds.  This avoids any further conversions once read into the Spark
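
For illustration only, here is a minimal sketch of the unit difference discussed above. It is not code from Hive, Spark, or Arrow, and the helper names are hypothetical. It assumes timestamps are stored as integer counts since the Unix epoch and shows that going from nanoseconds to microseconds drops the last three digits.

```python
from datetime import datetime, timedelta, timezone

NANOS_PER_MICRO = 1_000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def nanos_to_micros(epoch_nanos: int) -> int:
    """Convert nanoseconds since the epoch to microseconds.

    Floor division rounds toward negative infinity, so pre-epoch
    (negative) values are floored rather than truncated toward zero.
    """
    return epoch_nanos // NANOS_PER_MICRO


def micros_to_datetime(epoch_micros: int) -> datetime:
    """Interpret microseconds since the epoch as a UTC datetime."""
    return EPOCH + timedelta(microseconds=epoch_micros)


# Hypothetical nanosecond-precision value; the sub-microsecond digits (789) are lost.
nanos = 1_528_148_820_123_456_789
micros = nanos_to_micros(nanos)
print(micros)
print(micros_to_datetime(micros).isoformat())
```

Python's own datetime also stops at microseconds, so the nanosecond support mentioned above comes from pandas and pyarrow rather than from the standard library.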

> Arrow serde: "Unsupported data type: Timestamp(NANOSECOND, null)"
> -----------------------------------------------------------------
>                 Key: HIVE-19723
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.1.0, 4.0.0
>         Attachments: HIVE-19723.1.patch, HIVE-19732.2.patch
> Spark's Arrow support only provides Timestamp at MICROSECOND granularity. Spark 2.3.0
won't accept NANOSECOND. Switch it back to MICROSECOND.
> The unit test org.apache.hive.jdbc.TestJdbcWithMiniLlapArrow will just need to change
the assertion to test microsecond. And we'll need to add this to documentation on supported

This message was sent by Atlassian JIRA
