hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Naveen Gangam (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-19250) Schema column definitions inconsistencies in MySQL
Date Mon, 23 Apr 2018 19:37:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-19250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448753#comment-16448753
] 

Naveen Gangam commented on HIVE-19250:
--------------------------------------

Thanks [~aihuaxu] As I was working thru the inconsistencies between the dumps as part of this
jira, I had wondered that as well. Hive schema for MySQL use the latin1 CHARSET for all its
tables. It does not specify theĀ {{collation_name}} at the table level which causes it to
use {{{{latin1_swedish_ci}}}} forĀ {{latin1}} charset. However, the schema overrides this
collation for select columns by specifying it in the column definition of the create table
command. These overrides use \{{latin1_bin}} collation name. So some select columns use {{latin1_bin}}
while the rest of them use the default.{{}}

I am not sure which is better for performance for comparing values or how it was determined
what collation to use and what columns to use it on. I am just trying to make the schema definitions
consistent.

I had assumed that this CHARACTER SET and COLLATION NAMEs have been vetted thru when this
schema was first designed. Changing charsets entirely needs to scoped out separately as we
have to ensure that it works across all DBs.

Hope this helps. Thanks

> Schema column definitions inconsistencies in MySQL
> --------------------------------------------------
>
>                 Key: HIVE-19250
>                 URL: https://issues.apache.org/jira/browse/HIVE-19250
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.0
>            Reporter: Naveen Gangam
>            Assignee: Naveen Gangam
>            Priority: Minor
>         Attachments: HIVE-19250.patch
>
>
> There are some inconsistencies in column definitions in MySQL between a schema that was
upgraded to 2.1 (from an older release) vs installing the 2.1.0 schema directly.
> >   `CQ_TBLPROPERTIES` varchar(2048) DEFAULT NULL,
> 117d117
> <   `CQ_TBLPROPERTIES` varchar(2048) DEFAULT NULL,
> 135a136
> >   `CC_TBLPROPERTIES` varchar(2048) DEFAULT NULL,
> 143d143
> <   `CC_TBLPROPERTIES` varchar(2048) DEFAULT NULL,
> 156c156
> <   `CTC_TXNID` bigint(20) DEFAULT NULL,
> ---
> >   `CTC_TXNID` bigint(20) NOT NULL,
> 158c158
> <   `CTC_TABLE` varchar(256) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
> ---
> >   `CTC_TABLE` varchar(256) DEFAULT NULL,
> 476c476
> <   `TBL_NAME` varchar(256) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
> ---
> >   `TBL_NAME` varchar(256) DEFAULT NULL,
> 664c664
> <   KEY `PCS_STATS_IDX` (`DB_NAME`,`TABLE_NAME`,`COLUMN_NAME`,`PARTITION_NAME`),
> ---
> >   KEY `PCS_STATS_IDX` (`DB_NAME`,`TABLE_NAME`,`COLUMN_NAME`,`PARTITION_NAME`) USING
BTREE,
> 768c768
> <   `PARAM_VALUE` mediumtext,
> ---
> >   `PARAM_VALUE` mediumtext CHARACTER SET latin1 COLLATE latin1_bin,
> 814c814
> <   `PARAM_VALUE` mediumtext,
> ---
> >   `PARAM_VALUE` mediumtext CHARACTER SET latin1 COLLATE latin1_bin,
> 934c934
> <   `PARAM_VALUE` mediumtext,
> ---
> >   `PARAM_VALUE` mediumtext CHARACTER SET latin1 COLLATE latin1_bin,
> 1066d1065
> <   `TXN_HEARTBEAT_COUNT` int(11) DEFAULT NULL,
> 1067a1067
> >   `TXN_HEARTBEAT_COUNT` int(11) DEFAULT NULL,
> 1080c1080
> <   `TC_TXNID` bigint(20) DEFAULT NULL,
> ---
> >   `TC_TXNID` bigint(20) NOT NULL,
> 1082c1082
> <   `TC_TABLE` varchar(128) DEFAULT NULL,
> ---
> >   `TC_TABLE` varchar(128) NOT NULL,
> 1084c1084
> <   `TC_OPERATION_TYPE` char(1) DEFAULT NULL,
> ---
> >   `TC_OPERATION_TYPE` char(1) NOT NULL,



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message