hive-issues mailing list archives

From "Lev Katzav (Jira)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-23576) Getting partition of type int from metastore sometimes fail on cast error
Date Mon, 01 Jun 2020 04:23:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-23576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Lev Katzav updated HIVE-23576:
------------------------------
    Description: 
+Given the following situation:+

There are 2 tables (in the db "intpartitionbugtest"), each with a few rows:
 # *test_table_int_1*, partitioned by *y* of type *int*
 # *test_table_string_1*, partitioned by *x* of type *string*
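For reproduction, the two tables can be created with Hive DDL roughly like the following (a sketch: only the partition column names and types come from this report; the data column and the inserted rows are assumptions, with string-partition values a, b, c matching the values seen in the failing cast below):
{code:sql}
-- assumed setup; only the partition columns (y int, x string) are from the report
CREATE DATABASE IF NOT EXISTS intpartitionbugtest;

CREATE TABLE intpartitionbugtest.test_table_int_1 (val string)
PARTITIONED BY (y int);

CREATE TABLE intpartitionbugtest.test_table_string_1 (val string)
PARTITIONED BY (x string);

INSERT INTO intpartitionbugtest.test_table_int_1 PARTITION (y=1) VALUES ('row1');
INSERT INTO intpartitionbugtest.test_table_string_1 PARTITION (x='a') VALUES ('row1');
INSERT INTO intpartitionbugtest.test_table_string_1 PARTITION (x='b') VALUES ('row2');
INSERT INTO intpartitionbugtest.test_table_string_1 PARTITION (x='c') VALUES ('row3');
{code}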

Here is the output of the following query on the metastore db:
{code:sql}
select
	"PARTITIONS"."PART_ID",
	"TBLS"."TBL_NAME",
	"FILTER0"."PART_KEY_VAL",
	"PART_NAME"
from
	"PARTITIONS"
inner join "TBLS" on
	"PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID"
inner join "DBS" on
	"TBLS"."DB_ID" = "DBS"."DB_ID"
inner join "PARTITION_KEY_VALS" "FILTER0" on
	"FILTER0"."PART_ID" = "PARTITIONS"."PART_ID"
{code}
 

!image-2020-05-29-14-16-29-356.png!

+The problem+

When running a Hive query on the table *test_table_int_1* that filters on *y=1*,
 the following exception sometimes occurs on the metastore:

 
{code:java}
javax.jdo.JDODataStoreException: Error executing SQL query "select "PARTITIONS"."PART_ID"
from "PARTITIONS"  inner join "TBLS" on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID"     and "TBLS"."TBL_NAME"
= ?   inner join "DBS" on "TBLS"."DB_ID" = "DBS"."DB_ID"      and "DBS"."NAME" = ? inner join
"PARTITION_KEY_VALS" "FILTER0" on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID" and "FILTER0"."INTEGER_IDX"
= 0 where "DBS"."CTLG_NAME" = ?  and (((case when "FILTER0"."PART_KEY_VAL" <> ? then
cast("FILTER0"."PART_KEY_VAL" as decimal(21,0)) else null end) = ?))".
 	at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
~[datanucleus-api-jdo-4.2.4.jar:?]
 	at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:391) ~[datanucleus-api-jdo-4.2.4.jar:?]
 	at org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:267) ~[datanucleus-api-jdo-4.2.4.jar:?]
 	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.executeWithArray(MetaStoreDirectSql.java:2003)
[hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal(MetaStoreDirectSql.java:593)
[hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilter(MetaStoreDirectSql.java:481)
[hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:3853) [hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.hadoop.hive.metastore.ObjectStore$11.getSqlResult(ObjectStore.java:3843) [hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:3577) [hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilterInternal(ObjectStore.java:3861)
[hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByFilter(ObjectStore.java:3516)
[hive-exec-3.1.2.jar:3.1.2]
 	at sun.reflect.GeneratedMethodAccessor70.invoke(Unknown Source) ~[?:?]
 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[?:1.8.0_112]
 	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
 	at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:97) [hive-exec-3.1.2.jar:3.1.2]
 	at com.sun.proxy.$Proxy28.getPartitionsByFilter(Unknown Source) [?:?]
 	at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_partitions_by_filter(HiveMetaStore.java:5883)
[hive-exec-3.1.2.jar:3.1.2]
 	at sun.reflect.GeneratedMethodAccessor69.invoke(Unknown Source) ~[?:?]
 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[?:1.8.0_112]
 	at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
 	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
[hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
[hive-exec-3.1.2.jar:3.1.2]
 	at com.sun.proxy.$Proxy30.get_partitions_by_filter(Unknown Source) [?:?]
 	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_filter.getResult(ThriftHiveMetastore.java:16234)
[hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_partitions_by_filter.getResult(ThriftHiveMetastore.java:16218)
[hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) [hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) [hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
[hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631)
[hive-exec-3.1.2.jar:3.1.2]
 	at java.security.AccessController.doPrivileged(Native Method) [?:1.8.0_112]
 	at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_112]
 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
[hadoop-common-3.2.0.jar:?]
 	at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631)
[hive-exec-3.1.2.jar:3.1.2]
 	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
[hive-exec-3.1.2.jar:3.1.2]
 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_112]
 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_112]
 	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_112]
Caused by: org.postgresql.util.PSQLException: ERROR: invalid input syntax for type numeric:
"c"
 	at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2433)
~[postgresql-42.2.2.jar:42.2.2]
 	at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2178)
~[postgresql-42.2.2.jar:42.2.2]
 	at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:306) ~[postgresql-42.2.2.jar:42.2.2]
 	at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:441) ~[postgresql-42.2.2.jar:42.2.2]
 	at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:365) ~[postgresql-42.2.2.jar:42.2.2]
 	at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:155)
~[postgresql-42.2.2.jar:42.2.2]
 	at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:118) ~[postgresql-42.2.2.jar:42.2.2]
 	at com.zaxxer.hikari.pool.ProxyPreparedStatement.executeQuery(ProxyPreparedStatement.java:52)
~[HikariCP-2.6.1.jar:?]
 	at com.zaxxer.hikari.pool.HikariProxyPreparedStatement.executeQuery(HikariProxyPreparedStatement.java)
~[HikariCP-2.6.1.jar:?]
 	at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeQuery(ParamLoggingPreparedStatement.java:375)
~[datanucleus-rdbms-4.1.19.jar:?]
 	at org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:552)
~[datanucleus-rdbms-4.1.19.jar:?]
 	at org.datanucleus.store.rdbms.query.SQLQuery.performExecute(SQLQuery.java:645) ~[datanucleus-rdbms-4.1.19.jar:?]
 	at org.datanucleus.store.query.Query.executeQuery(Query.java:1855) ~[datanucleus-core-4.1.17.jar:?]
 	at org.datanucleus.store.rdbms.query.SQLQuery.executeWithArray(SQLQuery.java:807) ~[datanucleus-rdbms-4.1.19.jar:?]
 	at org.datanucleus.api.jdo.JDOQuery.executeInternal(JDOQuery.java:368) ~[datanucleus-api-jdo-4.2.4.jar:?]
 	... 35 more
{code}
The failing query is generated by:

org.apache.hadoop.hive.metastore.MetaStoreDirectSql.getPartitionsViaSqlFilterInternal

Here is the query with the parameters substituted:
{code:sql}
select
	"PARTITIONS"."PART_ID"
from
	"PARTITIONS"
inner join "TBLS" on
	"PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID"
	and "TBLS"."TBL_NAME" = 'test_table_int_1'
inner join "DBS" on
	"TBLS"."DB_ID" = "DBS"."DB_ID"
	and "DBS"."NAME" = 'intpartitionbugtest'
inner join "PARTITION_KEY_VALS" "FILTER0" on
	"FILTER0"."PART_ID" = "PARTITIONS"."PART_ID"
	and "FILTER0"."INTEGER_IDX" = 0
where
"DBS"."CTLG_NAME" = 'hive'
	and
	((
	(
		case
		when "FILTER0"."PART_KEY_VAL" <> '__HIVE_DEFAULT_PARTITION__' then cast("FILTER0"."PART_KEY_VAL"
as decimal(21, 0))
		else null end) = 1))
{code}
The query fails when casting the string partition values (a, b, c) to decimal.

+Analysis of the problem+

This issue does not always happen.
 When I start with a clean db, the query runs successfully, but after running many automated
tests that do inserts and queries on Hive tables, the query breaks.

I ran *EXPLAIN VERBOSE* on the query both when it works and when it breaks:

*working plan:*

!working plan_2.png|width=887,height=277!

It looks like an index filter on "TBL_NAME" runs first and limits the query to the
correct table,
 so by the time the filter on "PART_KEY_VAL" runs, all the remaining values are numeric.

*broken plan:*

!broken plan_2.png|width=914,height=283!

We can see that the filter on "PART_KEY_VAL" runs first, over the partitions of all
tables, and fails on the cast of the string values.

My guess is that after some inserts and queries on the relevant tables, the statistics for
those tables change, and postgres decides not to use the index on "TBL_NAME".
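If that guess is right, the flip should be observable from psql by refreshing the planner statistics and re-checking the plan of the generated query (a sketch; whether the plan actually flips depends on the table contents of the specific instance):
{code:sql}
-- refresh planner statistics; statistics changes like this are the kind of
-- event that can flip the plan from an index scan to a seq scan
ANALYZE "TBLS";
ANALYZE "PARTITION_KEY_VALS";

-- re-check which plan the planner now picks for the generated metastore query
EXPLAIN VERBOSE
select "PARTITIONS"."PART_ID"
from "PARTITIONS"
inner join "TBLS"
	on "PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID"
	and "TBLS"."TBL_NAME" = 'test_table_int_1'
inner join "DBS"
	on "TBLS"."DB_ID" = "DBS"."DB_ID"
	and "DBS"."NAME" = 'intpartitionbugtest'
inner join "PARTITION_KEY_VALS" "FILTER0"
	on "FILTER0"."PART_ID" = "PARTITIONS"."PART_ID"
	and "FILTER0"."INTEGER_IDX" = 0
where "DBS"."CTLG_NAME" = 'hive'
	and (case when "FILTER0"."PART_KEY_VAL" <> '__HIVE_DEFAULT_PARTITION__'
	     then cast("FILTER0"."PART_KEY_VAL" as decimal(21, 0))
	     else null end) = 1;
{code}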

+Workaround+

I managed to work around the issue with the following postgres setting:
{code}
enable_seqscan = false
{code}
From the PostgreSQL documentation ([https://www.postgresql.org/docs/9.5/runtime-config-query.html]):
{quote}_It is impossible to suppress sequential scans entirely, but turning this variable
off discourages the planner from using one if there are other methods available._
{quote}
This forces the use of the index on "TBL_NAME",
 but it is not a good solution for production, as it will hurt performance.
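For testing, the same effect can be had per session without touching the global configuration (a sketch; `SET`/`RESET` scope the change to the current connection only):
{code:sql}
-- discourage seq scans for this connection only; the global planner config is untouched
SET enable_seqscan = off;
-- run or EXPLAIN the failing metastore query here to confirm the index plan returns
RESET enable_seqscan;
{code}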

 

I think the solution should be to perform a safe cast in the query.
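One possible shape for such a safe cast on PostgreSQL (a sketch, not a committed fix: the regex guard is an assumption, and a real fix in MetaStoreDirectSql would need per-database handling since the guard syntax is postgres-specific):
{code:sql}
-- only attempt the numeric cast when the value actually looks like an integer;
-- non-numeric partition values (e.g. 'a', 'b', 'c' from the string-partitioned
-- table) then yield null instead of aborting the whole query
(case
	when "FILTER0"."PART_KEY_VAL" <> '__HIVE_DEFAULT_PARTITION__'
	 and "FILTER0"."PART_KEY_VAL" ~ '^-?[0-9]+$'
	then cast("FILTER0"."PART_KEY_VAL" as decimal(21, 0))
	else null
end) = 1
{code}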



> Getting partition of type int from metastore sometimes fail on cast error
> -------------------------------------------------------------------------
>
>                 Key: HIVE-23576
>                 URL: https://issues.apache.org/jira/browse/HIVE-23576
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Standalone Metastore
>    Affects Versions: 3.1.2
>         Environment: metastore db - postgres (tried on 9.3 and 11.5)
>            Reporter: Lev Katzav
>            Priority: Major
>         Attachments: broken plan_2.png, image-2020-05-29-14-16-29-356.png, working plan_2.png
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
