kylin-issues mailing list archives

From "Davide Malagoli (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KYLIN-3807) Error during sample_cube build "Build Dimension Dictionary"
Date Sun, 10 Feb 2019 15:53:00 GMT

    [ https://issues.apache.org/jira/browse/KYLIN-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764456#comment-16764456 ]

Davide Malagoli edited comment on KYLIN-3807 at 2/10/19 3:52 PM:
-----------------------------------------------------------------

Additional information

-----------------------------------------------------------------

I'm trying to replace Hive with Impala, so I modified the sample script to change the schema of the sample tables.

The columns that used the 'date' type now use 'timestamp' instead, for compatibility with Impala.

The attached archive contains the modified files, plus a Dockerfile and a docker-compose.yml that can be used for testing.

 

In kylin.properties I added the following lines

kylin.source.hive.redistribute-flat-table=false
kylin.source.hive.flat-table-storage-format=TEXTFILE
kylin.source.hive.flat-table-field-delimiter=,

to change the flat table format for Impala compatibility, as described in KYLIN-3070 (https://issues.apache.org/jira/browse/KYLIN-3070).
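As a quick sanity check of those three settings, the Java sketch below parses them with java.util.Properties, assuming Kylin reads kylin.properties with standard Java properties-file syntax. Under those rules a leading space before a key (easy to introduce when pasting) is ignored, while trailing whitespace after a value is kept, so a value such as "TEXTFILE " with a trailing space would not be read as "TEXTFILE". The class name FlatTablePropsCheck is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class FlatTablePropsCheck {
    // The settings from the comment; the leading spaces mimic a careless paste
    // and are ignored by the properties-file parser.
    static final String SNIPPET =
            "kylin.source.hive.redistribute-flat-table=false\n"
            + " kylin.source.hive.flat-table-storage-format=TEXTFILE\n"
            + " kylin.source.hive.flat-table-field-delimiter=,\n";

    // Parses properties-file text with the standard java.util.Properties rules.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory StringReader should not fail.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(SNIPPET);
        // Brackets make any trailing whitespace in a value visible.
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + " = [" + props.getProperty(key) + "]");
        }
    }
}
```

Printing each value inside brackets is a cheap way to spot stray whitespace before restarting Kylin with a changed configuration.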

 

To reproduce the problem

------------------------------------
 * extract the archive
 * docker-compose up -d kylin
 * log in to MinIO at localhost:9000 and create the bucket 'ict-group' (the access key and secret key are in docker-compose.yml)
 * run $KYLIN_HOME/bin/sample.sh to set up the sample
 * edit sample_cube to use Spark instead of MapReduce as the cube build engine, and save it
 * try to build the sample cube

 

Workaround

---------------------------------

Using Hive instead of Impala, together with the original sample.sh, works fine.


was (Author: darkice01):
Additional information

-----------------------------------------------------------------

I'm trying to replace Hive with Impala, so I modified the sample script to change the schema of the sample tables.

The columns that used the 'date' type now use 'timestamp' instead, for compatibility with Impala.

The attached archive contains the modified files, plus a Dockerfile and a docker-compose.yml that can be used for testing.

 

In kylin.properties I added the following lines

kylin.source.hive.redistribute-flat-table=false
kylin.source.hive.flat-table-storage-format=TEXTFILE
kylin.source.hive.flat-table-field-delimiter=,

to change the flat table format for Impala compatibility, as described in KYLIN-3070 (https://issues.apache.org/jira/browse/KYLIN-3070).

 

To reproduce the problem

------------------------------------
 * extract the archive
 * docker-compose up -d kylin
 * run $KYLIN_HOME/bin/sample.sh to set up the sample
 * try to build the sample cube

 

Workaround

---------------------------------

Using Hive instead of Impala, together with the original sample.sh, works fine.

> Error during sample_cube build "Build Dimension Dictionary"
> -----------------------------------------------------------
>
>                 Key: KYLIN-3807
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3807
>             Project: Kylin
>          Issue Type: Bug
>         Environment: kylin 2.5.2-cdh60
> impala 3.0.0-cdh6.0.1
> spark version 2.2.0-cdh6.0.1
>            Reporter: Davide Malagoli
>            Priority: Major
>         Attachments: kylin-compose.zip
>
>
> It seems that a duplicate key value [null] is found.
> But there are no null values in those columns in my table. Could the "timestamp" columns be interpreted incorrectly?
>  
> org.apache.kylin.engine.mr.exception.HadoopShellException: java.lang.RuntimeException: Checking snapshot of TableRef[KYLIN_CAL_DT] failed.
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:103)
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:50)
>  at org.apache.kylin.engine.mr.steps.CreateDictionaryJob.run(CreateDictionaryJob.java:73)
>  at org.apache.kylin.engine.mr.MRUtil.runMRJob(MRUtil.java:92)
>  at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:63)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:164)
>  at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:70)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:164)
>  at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  Caused by: java.lang.IllegalStateException: The table: KYLIN_CAL_DT Dup key found, key=[null], value1=[null,null,null,null,null,0,-1,-3,-15,-103,0,-1,-4,-15,-15,41501,228,47,16,6,33,5928,2,8,1364,3,455,114,2012-12-31,2012-09-30,2012-08-31,2012-08-17,16-Aug-2012,Aug 16th 2012,Fri 08-16-13,1,0,0,0,2012-06-21,365,92,31,7,2012-12-30,2012-06-30,2012-07-28,2012-08-11,2012-08-12,2012-08-16,Fri ,2012M08,Aug-2012,N,2012M08 ,N,Year 2012 - Quarter 03,2012Q03 ,N,33,2012,2012-08-11,2012-08-17,N,Wk.33 - 13,2012-08-11 00:00:00,2012-08-17 00:00:00,2012W33 ,2012W33 ,08/11/13 - 08/17/13,08/11 - 08/17,2012,N,2012-08-16,2011-08-16,2012-05-16,2012-02-16,2012-07-16,2012-06-16,2012-08-09,2012-08-02,0,0,0,0,0,0,0,0,8,3,33,3,1,1,1,2005-09-07,USER_X ,2012-11-27 00:16:56,USER_X], value2=[null,null,null,null,null,0,-3,-10,-47,-328,0,-3,-11,-47,-47,41276,3,3,3,5,1,5896,1,1,1357,1,453,114,2012-12-31,2012-03-31,2012-01-31,2012-01-05,03-Jan-2012,Jan 3rd 2012,Thu 01-03-13,1,0,0,0,2012-12-21,365,90,31,5,2012-12-30,2012-12-30,2012-12-30,2012-12-30,2012-12-31,2012-01-03,Thu ,2012M01,Jan-2012,N,2012M01 ,N,Year 2012 - Quarter 01,2012Q01 ,N,1,2012,2012-12-30,2012-01-05,N,Wk.01 - 13,2012-01-01 00:00:00,2012-01-05 00:00:00,2012W01 ,2012W01 ,01/01/13 - 01/05/13,01/01 - 01/05,2012,N,2012-01-03,2011-01-03,2012-10-03,2012-07-03,2012-12-03,2012-11-03,2012-12-27,2012-12-20,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,2005-09-07,USER_X ,2012-11-27 00:16:56,USER_X]
>  at org.apache.kylin.dict.lookup.LookupTable.initRow(LookupTable.java:86)
>  at org.apache.kylin.dict.lookup.LookupTable.init(LookupTable.java:69)
>  at org.apache.kylin.dict.lookup.LookupStringTable.init(LookupStringTable.java:80)
>  at org.apache.kylin.dict.lookup.LookupTable.<init>(LookupTable.java:57)
>  at org.apache.kylin.dict.lookup.LookupStringTable.<init>(LookupStringTable.java:66)
>  at org.apache.kylin.dict.lookup.LookupProviderFactory.getInMemLookupTable(LookupProviderFactory.java:63)
>  at org.apache.kylin.cube.CubeManager.getInMemLookupTable(CubeManager.java:481)
>  at org.apache.kylin.cube.CubeManager.getLookupTable(CubeManager.java:467)
>  at org.apache.kylin.cube.cli.DictionaryGeneratorCLI.processSegment(DictionaryGeneratorCLI.java:101)
>  ... 11 more
> result code:2
>  at org.apache.kylin.engine.mr.common.HadoopShellExecutable.doWork(HadoopShellExecutable.java:73)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:164)
>  at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:70)
>  at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:164)
>  at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:113)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
>  
> table schema
> Query: describe KYLIN_CAL_DT
> +------------------------------+-----------+--------------------+
> | name | type | comment |
> +------------------------------+-----------+--------------------+
> | cal_dt | timestamp | Date, PK |
> | year_beg_dt | timestamp | YEAR Begin Date |
> | qtr_beg_dt | timestamp | Quarter Begin Date |
> | month_beg_dt | timestamp | Month Begin Date |
> | week_beg_dt | timestamp | Week Begin Date |
> | age_for_year_id | smallint | |
> | age_for_qtr_id | smallint | |
> | age_for_month_id | smallint | |
> | age_for_week_id | smallint | |
> | age_for_dt_id | smallint | |
> | age_for_rtl_year_id | smallint | |
> | age_for_rtl_qtr_id | smallint | |
> | age_for_rtl_month_id | smallint | |
> | age_for_rtl_week_id | smallint | |
> | age_for_cs_week_id | smallint | |
> | day_of_cal_id | int | |
> | day_of_year_id | smallint | |
> | day_of_qtr_id | smallint | |
> | day_of_month_id | smallint | |
> | day_of_week_id | int | |
> | week_of_year_id | tinyint | |
> | week_of_cal_id | int | |
> | month_of_qtr_id | tinyint | |
> | month_of_year_id | tinyint | |
> | month_of_cal_id | smallint | |
> | qtr_of_year_id | tinyint | |
> | qtr_of_cal_id | smallint | |
> | year_of_cal_id | smallint | |
> | year_end_dt | string | |
> | qtr_end_dt | string | |
> | month_end_dt | string | |
> | week_end_dt | string | |
> | cal_dt_name | string | |
> | cal_dt_desc | string | |
> | cal_dt_short_name | string | |
> | ytd_yn_id | tinyint | |
> | qtd_yn_id | tinyint | |
> | mtd_yn_id | tinyint | |
> | wtd_yn_id | tinyint | |
> | season_beg_dt | string | |
> | day_in_year_count | smallint | |
> | day_in_qtr_count | tinyint | |
> | day_in_month_count | tinyint | |
> | day_in_week_count | tinyint | |
> | rtl_year_beg_dt | string | |
> | rtl_qtr_beg_dt | string | |
> | rtl_month_beg_dt | string | |
> | rtl_week_beg_dt | string | |
> | cs_week_beg_dt | string | |
> | cal_date | string | |
> | day_of_week | string | |
> | month_id | string | |
> | prd_desc | string | |
> | prd_flag | string | |
> | prd_id | string | |
> | prd_ind | string | |
> | qtr_desc | string | |
> | qtr_id | string | |
> | qtr_ind | string | |
> | retail_week | string | |
> | retail_year | string | |
> | retail_start_date | string | |
> | retail_wk_end_date | string | |
> | week_ind | string | |
> | week_num_desc | string | |
> | week_beg_date | string | |
> | week_end_date | string | |
> | week_in_year_id | string | |
> | week_id | string | |
> | week_beg_end_desc_mdy | string | |
> | week_beg_end_desc_md | string | |
> | year_id | string | |
> | year_ind | string | |
> | cal_dt_mns_1year_dt | string | |
> | cal_dt_mns_2year_dt | string | |
> | cal_dt_mns_1qtr_dt | string | |
> | cal_dt_mns_2qtr_dt | string | |
> | cal_dt_mns_1month_dt | string | |
> | cal_dt_mns_2month_dt | string | |
> | cal_dt_mns_1week_dt | string | |
> | cal_dt_mns_2week_dt | string | |
> | curr_cal_dt_mns_1year_yn_id | tinyint | |
> | curr_cal_dt_mns_2year_yn_id | tinyint | |
> | curr_cal_dt_mns_1qtr_yn_id | tinyint | |
> | curr_cal_dt_mns_2qtr_yn_id | tinyint | |
> | curr_cal_dt_mns_1month_yn_id | tinyint | |
> | curr_cal_dt_mns_2month_yn_id | tinyint | |
> | curr_cal_dt_mns_1week_yn_ind | tinyint | |
> | curr_cal_dt_mns_2week_yn_ind | tinyint | |
> | rtl_month_of_rtl_year_id | string | |
> | rtl_qtr_of_rtl_year_id | tinyint | |
> | rtl_week_of_rtl_year_id | tinyint | |
> | season_of_year_id | tinyint | |
> | ytm_yn_id | tinyint | |
> | ytq_yn_id | tinyint | |
> | ytw_yn_id | tinyint | |
> | kylin_cal_dt_cre_date | string | |
> | kylin_cal_dt_cre_user | string | |
> | kylin_cal_dt_upd_date | string | |
> | kylin_cal_dt_upd_user | string | |
> +------------------------------+-----------+--------------------+
>  
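Lining the exception message up against the schema is consistent with the suspicion in the description: in both value1 and value2 the five leading nulls fall exactly on the five timestamp columns, and the first of them is cal_dt, the primary key (key=[null]). The Java sketch below uses only the first seven columns and fields copied from the report. hasDuplicateKey is a hypothetical stand-in for an in-memory lookup keyed on one column, not Kylin's LookupTable implementation; it only illustrates that once every cal_dt value is read as null, any second row collides on the key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CalDtNullCheck {
    // First seven columns of KYLIN_CAL_DT, copied from `describe KYLIN_CAL_DT`.
    static final String[] NAMES = {
        "cal_dt", "year_beg_dt", "qtr_beg_dt", "month_beg_dt",
        "week_beg_dt", "age_for_year_id", "age_for_qtr_id"
    };
    static final String[] TYPES = {
        "timestamp", "timestamp", "timestamp", "timestamp",
        "timestamp", "smallint", "smallint"
    };

    // Leading fields of value1 and value2, copied from the exception message.
    static final String VALUE1 = "null,null,null,null,null,0,-1";
    static final String VALUE2 = "null,null,null,null,null,0,-3";

    // Splits a dumped row and turns the literal text "null" into a Java null.
    static String[] parse(String row) {
        String[] fields = row.split(",");
        for (int i = 0; i < fields.length; i++) {
            if ("null".equals(fields[i])) {
                fields[i] = null;
            }
        }
        return fields;
    }

    // Lists the columns (name:type) whose value is null in the row.
    static List<String> nullColumns(String[] row) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < row.length; i++) {
            if (row[i] == null) {
                result.add(NAMES[i] + ":" + TYPES[i]);
            }
        }
        return result;
    }

    // Hypothetical stand-in for an in-memory lookup keyed on one column:
    // reports whether any key value occurs in more than one row.
    static boolean hasDuplicateKey(List<String[]> rows, int keyIndex) {
        Set<String> seen = new HashSet<>();
        for (String[] row : rows) {
            // HashSet holds at most one null, so a second null key is rejected.
            if (!seen.add(row[keyIndex])) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] row1 = parse(VALUE1);
        String[] row2 = parse(VALUE2);
        System.out.println("null columns in value1: " + nullColumns(row1));
        System.out.println("null columns in value2: " + nullColumns(row2));
        // cal_dt (index 0) is the primary key of KYLIN_CAL_DT.
        System.out.println("duplicate cal_dt key: "
                + hasDuplicateKey(Arrays.asList(row1, row2), 0));
    }
}
```

Running main lists the five timestamp columns as the null columns for both rows and reports a duplicate cal_dt key. This only locates the nulls; it does not establish why the timestamp values are read as null.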



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
