hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sofia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-21367) Hive returns an incorrect result when using a simple select query
Date Mon, 04 Mar 2019 15:48:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16783485#comment-16783485
] 

Sofia commented on HIVE-21367:
------------------------------

The target table is from two different sources :
 * {color:#333333}*From SQOOP*{color}: when loading tables we use the following code.

{code:java}
sqoop import --connect ${CONNECTION} \
--username ${USER} \
--password ${PASSWORD} \
--table $1 \
--hive-database $2 \
--hive-table ${TBNAME} \
--hive-import \
--as-orcfile \
--hive-overwrite \
-m 1 \
--delete-target-dir 

{code}
 *  *From SPARK*: when processing the data, we store the output as a table in hive using
the following code.
{code:java}
df.write
  .mode(mode)
  .format(HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR)
  .option("table",tableName)
  .save(){code}
How do we load the data into the root path of the target table in each case ? 

> Hive returns an incorrect result when using a simple select query
> -----------------------------------------------------------------
>
>                 Key: HIVE-21367
>                 URL: https://issues.apache.org/jira/browse/HIVE-21367
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, HiveServer2, JDBC, SQL
>    Affects Versions: 3.1.0
>         Environment:      - HDP 3.1
>       - Hive 3.1.0
>       - Spark 2.3.2
>       - Sqoop 1.4.7
>            Reporter: LEMBARKI Mohamed Amine
>            Priority: Blocker
>         Attachments: mapred_input_dir_recursive.png
>
>
> Hive returns an incorrect result when using a simple select query with a where clause
>  While with an aggregation it returns a correct result
>  The problem arises for tables created by Spark or Sqoop
> Also when we use spark-shell with HiveWarehouseConnector it returns a correct result
>  
> Workflow: 
>      - Loading data with sqoop to hive
>      - Data processing with spark using HiveWarehouseConnector and Storage to Hive
>   
> below the error log :
>  
>  */-----------------------------------------* 
>  *1 - Executing Query : select code from db1.tbl1 where code = '123'*
>  */-------------------------------------------------*
> {code:java}
> [data@data1 ~]$ hive -e "select code from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:36 [main]: INFO jdbc.HiveConnection: Connected to data2:10000
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO : Compiling command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2):
select code from db1.tbl1 where code = '123'
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:code, type:string,
comment:null)], properties:null)
> INFO : Completed compiling command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2);
Time taken: 0.142 seconds
> INFO : Executing command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2):
select code from db1.tbl1 where code = '123'
> INFO : Completed executing command(queryId=hive_20190301103129_d48e71f6-a8dd-490e-a574-04d8d4f893e2);
Time taken: 0.003 seconds
> INFO : OK
> +------------------+
> | code |
> +------------------+
> +------------------+
> No rows selected (4,307 seconds)
> Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
> Closing: 0: jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> {code}
> */-----------------------------------------*
> *2 - Executing Query using count :* 
>       *select count(code) from db1.tbl1 where code = '123'*
>  */-------------------------------------------------*
> {code:java}
> [data@data1 ~]$ hive -e "select count(code) from db1.tbl1 where code = '123'"
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:31:56 [main]: INFO jdbc.HiveConnection: Connected to data2:10000
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO : Compiling command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e):
select count(code) from db1.tbl1 where code = '123'
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:bigint,
comment:null)], properties:null)
> INFO : Completed compiling command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e);
Time taken: 0.166 seconds
> INFO : Executing command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e):
select count(code) from db1.tbl1 where code = '123'
> INFO : Query ID = hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e
> INFO : Total jobs = 1
> INFO : Launching Job 1 out of 1
> INFO : Starting task [Stage-1:MAPRED] in serial mode
> INFO : Subscribed to counters: [] for queryId: hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e
> INFO : Tez session hasn't been created yet. Opening session
> INFO : Dag name: select count(code) f...'123' (Stage-1)
> INFO : Status: Running (Executing on YARN cluster with App id application_1549989271932_2371)
> ----------------------------------------------------------------------------------------------
> VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
> ----------------------------------------------------------------------------------------------
> Map 1 .......... container SUCCEEDED 3 3 0 0 0 0
> Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
> ----------------------------------------------------------------------------------------------
> VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 15,11 s
> ----------------------------------------------------------------------------------------------
> INFO : Status: DAG finished successfully in 7,66 seconds
> INFO :
> INFO : Query Execution Summary
> INFO : ----------------------------------------------------------------------------------------------
> INFO : OPERATION DURATION
> INFO : ----------------------------------------------------------------------------------------------
> INFO : Compile Query 0,17s
> INFO : Prepare Plan 3,33s
> INFO : Get Query Coordinator (AM) 0,00s
> INFO : Submit Plan 0,28s
> INFO : Start DAG 0,88s
> INFO : Run DAG 7,66s
> INFO : ----------------------------------------------------------------------------------------------
> INFO :
> INFO : Task Execution Summary
> INFO : ----------------------------------------------------------------------------------------------
> INFO : VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
> INFO : ----------------------------------------------------------------------------------------------
> INFO : Map 1 6020,00 9,790 261 4,240,134 4
> INFO : Reducer 2 2635,00 590 0 3 0
> INFO : ----------------------------------------------------------------------------------------------
> INFO :
> INFO : org.apache.tez.common.counters.DAGCounter:
> INFO : NUM_SUCCEEDED_TASKS: 4
> INFO : TOTAL_LAUNCHED_TASKS: 4
> INFO : DATA_LOCAL_TASKS: 1
> INFO : RACK_LOCAL_TASKS: 2
> INFO : AM_CPU_MILLISECONDS: 1800
> INFO : AM_GC_TIME_MILLIS: 0
> INFO : File System Counters:
> INFO : FILE_BYTES_READ: 171
> INFO : FILE_BYTES_WRITTEN: 171
> INFO : HDFS_BYTES_READ: 14192292
> INFO : HDFS_BYTES_WRITTEN: 101
> INFO : HDFS_READ_OPS: 1008
> INFO : HDFS_WRITE_OPS: 2
> INFO : HDFS_OP_CREATE: 1
> INFO : HDFS_OP_GET_FILE_STATUS: 204
> INFO : HDFS_OP_OPEN: 603
> INFO : HDFS_OP_RENAME: 1
> INFO : org.apache.tez.common.counters.TaskCounter:
> INFO : SPILLED_RECORDS: 0
> INFO : NUM_SHUFFLED_INPUTS: 3
> INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
> INFO : GC_TIME_MILLIS: 261
> INFO : TASK_DURATION_MILLIS: 10632
> INFO : CPU_MILLISECONDS: 10380
> INFO : PHYSICAL_MEMORY_BYTES: 4202692608
> INFO : VIRTUAL_MEMORY_BYTES: 34204336128
> INFO : COMMITTED_HEAP_BYTES: 4202692608
> INFO : INPUT_RECORDS_PROCESSED: 4145
> INFO : INPUT_SPLIT_LENGTH_BYTES: 217593448
> INFO : OUTPUT_RECORDS: 3
> INFO : OUTPUT_LARGE_RECORDS: 0
> INFO : OUTPUT_BYTES: 9
> INFO : OUTPUT_BYTES_WITH_OVERHEAD: 33
> INFO : OUTPUT_BYTES_PHYSICAL: 147
> INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
> INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
> INFO : ADDITIONAL_SPILL_COUNT: 0
> INFO : SHUFFLE_BYTES: 75
> INFO : SHUFFLE_BYTES_DECOMPRESSED: 33
> INFO : SHUFFLE_BYTES_TO_MEM: 0
> INFO : SHUFFLE_BYTES_TO_DISK: 0
> INFO : SHUFFLE_BYTES_DISK_DIRECT: 75
> INFO : SHUFFLE_PHASE_TIME: 2295
> INFO : FIRST_EVENT_RECEIVED: 219
> INFO : LAST_EVENT_RECEIVED: 2288
> INFO : HIVE:
> INFO : CREATED_FILES: 1
> INFO : DESERIALIZE_ERRORS: 0
> INFO : RECORDS_IN_Map_1: 4240134
> INFO : RECORDS_OUT_0: 1
> INFO : RECORDS_OUT_INTERMEDIATE_Map_1: 4
> INFO : RECORDS_OUT_INTERMEDIATE_Reducer_2: 0
> INFO : RECORDS_OUT_OPERATOR_FIL_10: 1
> INFO : RECORDS_OUT_OPERATOR_FS_15: 1
> INFO : RECORDS_OUT_OPERATOR_GBY_12: 3
> INFO : RECORDS_OUT_OPERATOR_GBY_14: 1
> INFO : RECORDS_OUT_OPERATOR_MAP_0: 0
> INFO : RECORDS_OUT_OPERATOR_RS_13: 4
> INFO : RECORDS_OUT_OPERATOR_SEL_11: 1
> INFO : RECORDS_OUT_OPERATOR_TS_0: 4240134
> INFO : TaskCounter_Map_1_INPUT_tbl1:
> INFO : INPUT_RECORDS_PROCESSED: 4142
> INFO : INPUT_SPLIT_LENGTH_BYTES: 217593448
> INFO : TaskCounter_Map_1_OUTPUT_Reducer_2:
> INFO : ADDITIONAL_SPILLS_BYTES_READ: 0
> INFO : ADDITIONAL_SPILLS_BYTES_WRITTEN: 0
> INFO : ADDITIONAL_SPILL_COUNT: 0
> INFO : OUTPUT_BYTES: 9
> INFO : OUTPUT_BYTES_PHYSICAL: 147
> INFO : OUTPUT_BYTES_WITH_OVERHEAD: 33
> INFO : OUTPUT_LARGE_RECORDS: 0
> INFO : OUTPUT_RECORDS: 3
> INFO : SPILLED_RECORDS: 0
> INFO : TaskCounter_Reducer_2_INPUT_Map_1:
> INFO : FIRST_EVENT_RECEIVED: 219
> INFO : INPUT_RECORDS_PROCESSED: 3
> INFO : LAST_EVENT_RECEIVED: 2288
> INFO : NUM_FAILED_SHUFFLE_INPUTS: 0
> INFO : NUM_SHUFFLED_INPUTS: 3
> INFO : SHUFFLE_BYTES: 75
> INFO : SHUFFLE_BYTES_DECOMPRESSED: 33
> INFO : SHUFFLE_BYTES_DISK_DIRECT: 75
> INFO : SHUFFLE_BYTES_TO_DISK: 0
> INFO : SHUFFLE_BYTES_TO_MEM: 0
> INFO : SHUFFLE_PHASE_TIME: 2295
> INFO : TaskCounter_Reducer_2_OUTPUT_out_Reducer_2:
> INFO : OUTPUT_RECORDS: 0
> INFO : org.apache.hadoop.hive.ql.exec.tez.HiveInputCounters:
> INFO : GROUPED_INPUT_SPLITS_Map_1: 3
> INFO : INPUT_DIRECTORIES_Map_1: 1
> INFO : INPUT_FILES_Map_1: 200
> INFO : RAW_INPUT_SPLITS_Map_1: 201
> INFO : Completed executing command(queryId=hive_20190301103149_90aa338b-b99b-4f1c-b7e5-6b285f64cb3e);
Time taken: 12.218 seconds
> INFO : OK
> +------+
> | _c0 |
> +------+
> | 1 |
> +------+
> 1 row selected (12,458 seconds)
> Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
> Closing: 0: jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> {code}
> */-------------------------------------------------------------*
> *3 - Describe tbl1*
>  */-------------------------------------------------------------*
> {code:java}
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/usr/hdp/3.1.0.0-78/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
> Connecting to jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2
> 19/03/01 10:40:34 [main]: INFO jdbc.HiveConnection: Connected to data2:10000
> Connected to: Apache Hive (version 3.1.0.3.1.0.0-78)
> Driver: Hive JDBC (version 3.1.0.3.1.0.0-78)
> Transaction isolation: TRANSACTION_REPEATABLE_READ
> INFO : Compiling command(queryId=hive_20190301104027_818ae55f-3e3f-4754-8706-0279b693d9a8):
describe extended db1.tbl1
> INFO : Semantic Analysis Completed (retrial = false)
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:col_name, type:string,
comment:from deserializer), FieldSchema(name:data_type, type:string, comment:from deserializer),
FieldSchema(name:comment, type:string, comment:from deserializer)], properties:null)
> INFO : Completed compiling command(queryId=hive_20190301104027_818ae55f-3e3f-4754-8706-0279b693d9a8);
Time taken: 0.044 seconds
> INFO : Executing command(queryId=hive_20190301104027_818ae55f-3e3f-4754-8706-0279b693d9a8):
describe extended db1.tbl1
> INFO : Starting task [Stage-0:DDL] in serial mode
> INFO : Completed executing command(queryId=hive_20190301104027_818ae55f-3e3f-4754-8706-0279b693d9a8);
Time taken: 0.024 seconds
> INFO : OK
> +---------------------------------+----------------------------------------------------+----------+
> | col_name | data_type | comment |
> +---------------------------------+----------------------------------------------------+----------+
> | code | string | |
> | | NULL | NULL |
> | Detailed Table Information | Table(tableName:tbl1, dbName:db1, owner:anonymous, createTime:1551431182,
lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:code, type:string,
comment:null)], location:hdfs://data1:8020/warehouse/tablespace/managed/hive/db1.db/tbl1,
inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat,
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde,
parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[],
skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[],
parameters:{totalSize=217593448, rawDataSize=0, numRows=0, transactional_properties=default,
numFiles=200, transient_lastDdlTime=1551431187, bucketing_version=2, transactional=true},
viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false,
catName:hive, ownerType:USER, writeId:1) | |
> +---------------------------------+----------------------------------------------------+----------+
> 41 rows selected (0,157 seconds)
> Beeline version 3.1.0.3.1.0.0-78 by Apache Hive
> Closing: 0: jdbc:hive2://data2:2181,data1:2181/default;password=data;serviceDiscoveryMode=zooKeeper;user=data;zooKeeperNamespace=hiveserver2{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message