hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Naveen Gangam (JIRA)" <>
Subject [jira] [Commented] (HIVE-12274) Increase width of columns used for general configuration in the metastore.
Date Wed, 08 Mar 2017 03:32:38 GMT


Naveen Gangam commented on HIVE-12274:

I am currently investigating test failures above from the TestPerfCliDriver. All failures
seem to be from difference in the qtest output. I narrowed down the cause. The expected output
is from CBO enabled. The actual output is a result of CBO being disabled because the test
is unable to bulk load data into HMS metastore. This bulk loader uses a little known feature
in derby to import the data from a txt file. Since we are changing the type of TABLE_PARAMS.PARAM_VALUE
to CLOB, there format for the data needs to be different. Looking at the code, the CLOB column
DATA needs to be separated into its own file and the original data file needs to have the
filename, offset and the data length to read from. This is my understanding based on reading
the code from.

I have been able to get past the initial failure, but CBO fails further along without clear
message. [~thejas] Who can I approach to understand these CBO failures? Thanks

> Increase width of columns used for general configuration in the metastore.
> --------------------------------------------------------------------------
>                 Key: HIVE-12274
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Metastore
>    Affects Versions: 2.0.0
>            Reporter: Elliot West
>            Assignee: Naveen Gangam
>              Labels: metastore
>         Attachments: HIVE-12274.2.patch, HIVE-12274.2.patch, HIVE-12274.example.ddl.hql,
HIVE-12274.patch, HIVE-12274.patch, HIVE-12274.patch
> h2. Overview
> This issue is very similar in principle to HIVE-1364. We are hitting a limit when processing
JSON data that has a large nested schema. The struct definition is truncated when inserted
into the metastore database column {{COLUMNS_V2.YPE_NAME}} as it is greater than 4000 characters
in length.
> Given that the purpose of these columns is to hold very loosely defined configuration
values it seems rather limiting to impose such a relatively low length bound. One can imagine
that valid use cases will arise where reasonable parameter/property values exceed the current
> h2. Context
> These limitations were in by the [patch attributed|]
to HIVE-1364 which mentions the _"max length on Oracle 9i/10g/11g"_ as the reason. However,
nowadays the limit can be increased because:
> * Oracle DB's {{varchar2}} supports 32767 bytes now, by setting the configuration parameter
{{MAX_STRING_SIZE}} to {{EXTENDED}}. ([source|])
> * Postgres supports a max of 1GB for {{character}} datatype. ([source|])
> * MySQL can support upto 65535 bytes for the entire row. So long as the {{PARAM_KEY}}
value + {{PARAM_VALUE}} is less than 65535, we should be good. ([source|])
> * SQL Server's {{varchar}} max length is 8000 and can go beyond using "varchar(max)"
with the same limitation as MySQL being 65535 bytes for the entire row. ([source|])
> * Derby's {{varchar}} can be upto 32672 bytes. ([source|])
> h2. Proposal
> Can these columns not use CLOB-like types as for example as used by {{TBLS.VIEW_EXPANDED_TEXT}}?
It would seem that suitable type equivalents exist for all targeted database platforms:
> * MySQL: {{mediumtext}}
> * Postgres: {{text}}
> * Oracle: {{CLOB}}
> * Derby: {{LONG VARCHAR}}
> I'd suggest that the candidates for type change are:
> After updating the maximum length the metastore database needs to be configured and restarted
with the new settings. Altering {{MAX_STRING_SIZE}} will update database objects and possibly
invalidate them, as follows:
> * Tables with virtual columns will be updated with new data type metadata for virtual
columns of {{VARCHAR2(4000)}}, 4000-byte {{NVARCHAR2}}, or {{RAW(2000)}} type.
> * Functional indexes will become unusable if a change to their associated virtual columns
causes the index key to exceed index key length limits. Attempts to rebuild such indexes will
fail with {{ORA-01450: maximum key length exceeded}}.
> * Views will be invalidated if they contain {{VARCHAR2(4000)}}, 4000-byte {{NVARCHAR2}},
or {{RAW(2000)}} typed expression columns.
> * Materialized views will be updated with new metadata {{VARCHAR2(4000)}}, 4000-byte
{{NVARCHAR2}}, and {{RAW(2000)}} typed expression columns
> * So the limitation could be raised to 32672 bytes, with the caveat that MySQL and SQL
Server limit the row length to 65535 bytes, so that should also be validated to provide consistency.
> Finally, will this limitation persist in the work resulting from HIVE-9452?

This message was sent by Atlassian JIRA

View raw message