hive-issues mailing list archives

From "Hive QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-17417) LazySimple Timestamp and Date serialization is very expensive
Date Wed, 06 Sep 2017 08:25:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-17417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16154990#comment-16154990 ]

Hive QA commented on HIVE-17417:
--------------------------------



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12885489/HIVE-17417.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 110 failed/errored test(s), 11026 tests executed
*Failed tests:*
{noformat}
TestAccumuloCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
TestDummy - did not produce a TEST-*.xml file (likely timed out) (batchId=230)
TestTxnCommandsBase - did not produce a TEST-*.xml file (likely timed out) (batchId=280)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[avro_date] (batchId=9)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[char_cast] (batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_1] (batchId=77)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[date_udf] (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[floor_time] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_non_partitioned] (batchId=19)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_partitioned] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_alt] (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[interval_arithmetic] (batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[json_serde_tsformat] (batchId=5)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_text] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] (batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge11] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge5] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge6] (batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge_incompat1] (batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_merge_incompat2] (batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[orc_ppd_exception] (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parquet_ppd_timestamp] (batchId=54)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_timestamp2] (batchId=31)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partition_timestamp] (batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamp] (batchId=28)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamp_1] (batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamp_2] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamp_formats] (batchId=22)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamp_ints_casts] (batchId=1)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamp_lazy] (batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamp_literal] (batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[timestamptz_1] (batchId=56)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_from_utc_timestamp] (batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_reflect2] (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[udf_to_utc_timestamp] (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[varchar_cast] (batchId=57)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_aggregate_9] (batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_1] (batchId=15)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_arithmetic] (batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_14] (batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_17] (batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorization_7] (batchId=42)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_casts] (batchId=79)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_date_funcs] (batchId=73)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_timestamp] (batchId=74)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vectorized_timestamp_ints_casts] (batchId=47)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[windowing_windowspec3] (batchId=36)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_handler_bulk] (batchId=97)
org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_timestamp] (batchId=96)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[current_date_timestamp] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_non_partitioned] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_partitioned] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_merge11] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_merge5] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_merge6] (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_merge7] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_merge_incompat1] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_merge_incompat2] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_ppd_timestamp] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[orc_split_elimination] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_nonvec_part_all_complex] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_nonvec_part_all_primitive] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part_all_complex] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_orc_vec_part_all_primitive] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part] (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_complex] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_part_all_primitive] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_nonvec_table] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part] (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_complex] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_part_all_primitive] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vec_table] (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_complex] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_part_all_primitive] (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[schema_evol_text_vecrow_table] (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_aggregate_9] (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_1] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_2] (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_arithmetic] (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_partitioned_date_time] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_14] (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_17] (batchId=163)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorization_7] (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_casts] (batchId=162)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_date_funcs] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_timestamp] (batchId=161)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vectorized_timestamp_ints_casts] (batchId=156)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge5] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge6] (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge7] (batchId=172)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge_incompat1] (batchId=171)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge_incompat2] (batchId=172)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=234)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query64] (batchId=234)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[date_udf] (batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[timestamp_1] (batchId=113)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[timestamp_2] (batchId=101)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[timestamp_lazy] (batchId=124)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_14] (batchId=107)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorization_17] (batchId=138)
org.apache.hadoop.hive.ql.exec.vector.TestVectorSerDeRow.testVectorLazySimpleDeserializeRow (batchId=275)
org.apache.hadoop.hive.ql.exec.vector.TestVectorSerDeRow.testVectorLazySimpleSerializeRow (batchId=275)
org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat.testNewInputFormat (batchId=262)
org.apache.hadoop.hive.ql.udf.generic.TestGenericUDFLastDay.testLastDay (batchId=254)
org.apache.hadoop.hive.serde2.lazy.TestLazySimpleFast.testLazyBinarySimpleComplexDepthFour (batchId=288)
org.apache.hadoop.hive.serde2.lazy.TestLazySimpleFast.testLazyBinarySimpleComplexDepthOne (batchId=288)
org.apache.hadoop.hive.serde2.lazy.TestLazySimpleFast.testLazyBinarySimplePrimitive (batchId=288)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6686/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6686/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6686/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 110 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12885489 - PreCommit-HIVE-Build

> LazySimple Timestamp and Date serialization is very expensive
> -------------------------------------------------------------
>
>                 Key: HIVE-17417
>                 URL: https://issues.apache.org/jira/browse/HIVE-17417
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Critical
>         Attachments: date-serialize.png, HIVE-17417.1.patch, HIVE-17417.2.patch, HIVE-17417.3.patch, timestamp-serialize.png, ts-jmh-perf.png
>
>
> In a specific case where a schema contains array<struct> with timestamp and date fields (array size >10000), any access to this column is very expensive in terms of CPU, because most of the time is spent serializing the timestamp and date values. Refer to the attached profiles: >70% of the time is spent in serialization + toString conversions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
