kylin-issues mailing list archives

From "ASF GitHub Bot (Jira)" <j...@apache.org>
Subject [jira] [Commented] (KYLIN-4385) KYLIN system cube failing to update table when run on EMR with S3 as storage and EMRFS
Date Wed, 08 Apr 2020 12:13:00 GMT

    [ https://issues.apache.org/jira/browse/KYLIN-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078222#comment-17078222 ]

ASF GitHub Bot commented on KYLIN-4385:
---------------------------------------

hit-lacus commented on pull request #1173: KYLIN-4385 HiveSink support object storage
URL: https://github.com/apache/kylin/pull/1173
 
 
   ## Proposed changes
   
   Describe the big picture of your changes here to communicate to the maintainers why we
should accept this pull request. If it fixes a bug or resolves a feature request, be sure
to link to that issue.
   
   ## Types of changes
   
   What types of changes does your code introduce to Kylin?
   _Put an `x` in the boxes that apply_
   
   - [ ] Bugfix (non-breaking change which fixes an issue)
   - [ ] New feature (non-breaking change which adds functionality)
   - [ ] Breaking change (fix or feature that would cause existing functionality to not work
as expected)
   - [ ] Documentation Update (if none of the other choices apply)
   
   ## Checklist
   
   _Put an `x` in the boxes that apply. You can also fill these out after creating the PR.
If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply
a reminder of what we are going to look for before merging your code._
   
   - [ ] I have created an issue on [Kylin's jira](https://issues.apache.org/jira/browse/KYLIN),
and have described the bug/feature there in detail
   - [ ] Commit messages in my PR start with the related jira ID, like "KYLIN-0000 Make Kylin
project open-source"
   - [ ] Compiling and unit tests pass locally with my changes
   - [ ] I have added tests that prove my fix is effective or that my feature works
   - [ ] If this change needs a documentation change, I will prepare another PR against the `document`
branch
   - [ ] Any dependent changes have been merged
   
   ## Further comments
   
   If this is a relatively large or complex change, kick off the discussion at user@kylin
or dev@kylin by explaining why you chose the solution you did, what alternatives you considered,
etc.
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> KYLIN system cube failing to update table when run on EMR with S3 as storage and EMRFS
> --------------------------------------------------------------------------------------
>
>                 Key: KYLIN-4385
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4385
>             Project: Kylin
>          Issue Type: Bug
>            Reporter: raghu ram reddy
>            Assignee: Xiaoxiang Yu
>            Priority: Major
>             Fix For: v3.1.0, v3.0.2, v2.6.6
>
>
>  
> 2020-02-24T15:35:46,548 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.kylin.metrics.lib.impl.hive.HiveReservoirReporter - Try to write 113 records
> 2020-02-24T15:35:46,566 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.conf.HiveConf - Found configuration file file:/etc/hive/conf.dist/hive-site.xml
> 2020-02-24T15:35:47,097 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Trying to connect to metastore with URI thrift://ip-1-1-1-1.ec2.internal:9083
> 2020-02-24T15:35:47,216 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Opened a connection to metastore, current connections: 1
> 2020-02-24T15:35:47,216 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Connected to metastore.
> 2020-02-24T15:35:47,433 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Closed a connection to metastore, current connections: 0
> 2020-02-24T15:35:47,824 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.kylin.metrics.lib.impl.hive.HiveProducer - Try to use new partition content path: hdfs://ip-1-1-2-1.ec2.internal:8020/tmp/system_cube/hive_metrics_query_cube_qa/kday_date=2020-02-24/ip-1-1-1-1-1582558547056-part-0000 for metric: METRICS_QUERY_CUBE_QA
> 2020-02-24T15:35:47,959 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.kylin.metrics.lib.impl.hive.HiveProducer - Success to write 37 metrics (METRICS_QUERY_CUBE_QA) to file hdfs://ip-1-1-2-1.ec2.internal:8020/tmp/system_cube/hive_metrics_query_cube_qa/kday_date=2020-02-24/ip-1-1-1-1-1582558547056-part-0000
> 2020-02-24T15:35:48,275 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Trying to connect to metastore with URI thrift://ip-1-1-2-1.ec2.internal:9083
> 2020-02-24T15:35:48,288 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Opened a connection to metastore, current connections: 1
> 2020-02-24T15:35:48,289 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Connected to metastore.
> 2020-02-24T15:35:48,711 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Closed a connection to metastore, current connections: 0
> 2020-02-24T15:35:50,223 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/kylin/3f98a154-e471-40fc-9829-4c4283266d46
> 2020-02-24T15:35:50,224 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.session.SessionState - Created local directory: /usr/local/kylin/tomcat/temp/kylin/3f98a154-e471-40fc-9829-4c4283266d46
> 2020-02-24T15:35:50,232 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.session.SessionState - Created HDFS directory: /tmp/hive/kylin/3f98a154-e471-40fc-9829-4c4283266d46/_tmp_space.db
> 2020-02-24T15:35:50,291 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.exec.tez.TezSessionState - User of session id 3f98a154-e471-40fc-9829-4c4283266d46 is kylin
> 2020-02-24T15:35:50,389 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.exec.tez.DagUtils - Jar dir is null / directory doesn't exist. Choosing HIVE_INSTALL_DIR - /user/kylin/.hiveJars
> 2020-02-24T15:35:50,933 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.exec.tez.DagUtils - Resource modification time: 1581024148854 for hdfs://ip-1-1-2-1.ec2.internal:8020/user/kylin/.hiveJars/hive-exec-2.3.6-amzn-0-9f4c4d2a9ab8330bfec9b3ce23e40355288cc5c08a20165b20aca86b2b6c2c95.jar
> 2020-02-24T15:35:51,066 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAccessController - Created SQLStdHiveAccessController for session context : HiveAuthzSessionContext [sessionString=3f98a154-e471-40fc-9829-4c4283266d46, clientType=HIVECLI]
> 2020-02-24T15:35:51,073 WARN [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.session.SessionState - METASTORE_FILTER_HOOK will be ignored, since hive.security.authorization.manager is set to instance of HiveAuthorizerFactory.
> 2020-02-24T15:35:51,646 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Trying to connect to metastore with URI thrift://ip-1-1-2-1.ec2.internal:9083
> 2020-02-24T15:35:51,662 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Opened a connection to metastore, current connections: 1
> 2020-02-24T15:35:51,662 INFO [metrics-blocking-reservoir-scheduler-0] hive.metastore - Connected to metastore.
> 2020-02-24T15:35:51,992 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.client.TezClient - Tez Client Version: [ component=tez-api, version=0.9.2, revision=9566b9ed1d86bc2697f1622e4e9825da6c011583, SCM-URL=scm:git:https://gitbox.apache.org/repos/asf/tez.git, buildTime=2019-10-28T16:32:03Z ]
> 2020-02-24T15:35:51,992 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.exec.tez.TezSessionState - Opening new Tez Session (id: 3f98a154-e471-40fc-9829-4c4283266d46, scratch dir: hdfs://ip-1-1-2-1.ec2.internal:8020/tmp/hive/kylin/_tez_session_dir/3f98a154-e471-40fc-9829-4c4283266d46)
> 2020-02-24T15:35:52,578 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at ip-1-1-2-1.ec2.internal/10.127.2.141:8032
> 2020-02-24T15:35:52,767 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.client.TezClient - Session mode. Starting session.
> 2020-02-24T15:35:52,839 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.client.TezClientUtils - Using tez.lib.uris value from configuration: hdfs:///apps/tez/tez.tar.gz
> 2020-02-24T15:35:52,839 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.client.TezClientUtils - Using tez.lib.uris.classpath value from configuration: null
> 2020-02-24T15:35:52,871 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hdfs.DFSClient - Created HDFS_DELEGATION_TOKEN token 856 for kylin on 10.127.2.141:8020
> 2020-02-24T15:35:53,280 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.common.security.TokenCache - Got dt for hdfs://ip-1-1-2-1.ec2.internal:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 10.127.2.141:8020, Ident: (HDFS_DELEGATION_TOKEN token 856 for kylin)
> 2020-02-24T15:35:53,280 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.common.security.TokenCache - Got dt for hdfs://ip-1-1-2-1.ec2.internal:8020; Kind: kms-dt, Service: 10.127.2.141:9700, Ident: (owner=kylin, renewer=yarn, realUser=, issueDate=1582558553105, maxDate=1583163353105, sequenceNumber=853, masterKeyId=53)
> 2020-02-24T15:35:53,310 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.client.TezClient - Tez system stage directory hdfs://ip-1-1-2-1.ec2.internal:8020/tmp/hive/kylin/_tez_session_dir/3f98a154-e471-40fc-9829-4c4283266d46/.tez/application_1578089000827_0674 doesn't exist and is created
> 2020-02-24T15:35:54,257 INFO [BadQueryDetector] org.apache.kylin.rest.service.BadQueryDetector - Detect bad query.
> 2020-02-24T15:35:54,620 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://ip-1-1-2-1.ec2.internal:8188/ws/v1/timeline/
> 2020-02-24T15:35:55,040 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1578089000827_0674
> 2020-02-24T15:35:55,041 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.tez.client.TezClient - The url to track the Tez Session: http://ip-1-1-2-1.ec2.internal:20888/proxy/application_1578089000827_0674/
> 2020-02-24T15:35:57,000 INFO [FetcherRunner 1354629870-25] org.apache.kylin.job.impl.threadpool.DefaultFetcherRunner - Job Fetcher: 0 should running, 0 actual running, 0 stopped, 0 ready, 20 already succeed, 1 error, 0 discarded, 0 others
> 2020-02-24T15:35:59,829 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Compiling command(queryId=kylin_20200224153550_249ef9e2-5723-403f-8ef9-e1a43de9b661): ALTER TABLE KYLIN.HIVE_METRICS_QUERY_QA ADD IF NOT EXISTS PARTITION (kday_date='2020-02-24')
> 2020-02-24T15:36:01,467 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Semantic Analysis Completed
> 2020-02-24T15:36:01,471 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> 2020-02-24T15:36:01,485 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Completed compiling command(queryId=kylin_20200224153550_249ef9e2-5723-403f-8ef9-e1a43de9b661); Time taken: 1.708 seconds
> 2020-02-24T15:36:01,485 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Concurrency mode is disabled, not creating a lock manager
> 2020-02-24T15:36:01,485 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Executing command(queryId=kylin_20200224153550_249ef9e2-5723-403f-8ef9-e1a43de9b661): ALTER TABLE KYLIN.HIVE_METRICS_QUERY_QA ADD IF NOT EXISTS PARTITION (kday_date='2020-02-24')
> 2020-02-24T15:36:01,506 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Starting task [Stage-0:DDL] in serial mode
> 2020-02-24T15:36:02,952 INFO [metrics-blocking-reservoir-scheduler-0] hive.ql.metadata.Hive - Dumping metastore api call timing information for : execution phase
> 2020-02-24T15:36:02,952 INFO [metrics-blocking-reservoir-scheduler-0] hive.ql.metadata.Hive - Total time spent in this metastore function was greater than 1000ms : add_partitions_(List, boolean, boolean, )=1191
> 2020-02-24T15:36:02,952 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - Completed executing command(queryId=kylin_20200224153550_249ef9e2-5723-403f-8ef9-e1a43de9b661); Time taken: 1.467 seconds
> OK
> 2020-02-24T15:36:02,953 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.hadoop.hive.ql.Driver - OK
> 2020-02-24T15:36:02,954 INFO [metrics-blocking-reservoir-scheduler-0] org.apache.kylin.metrics.lib.impl.hive.HiveProducer - Try to use new partition content path: s3://my_bucket/warehouse/kylin.db/hive_metrics_query_qa/kday_date=2020-02-24/ip-1-1-1-1-1582558548273-part-0000 for metric: METRICS_QUERY_QA
> 2020-02-24T15:36:03,322 INFO [metrics-blocking-reservoir-scheduler-0] com.amazon.ws.emr.hadoop.fs.cse.CSEMultipartUploadOutputStream - close closed:false s3://my_bucket/warehouse/kylin.db/hive_metrics_query_qa/kday_date=2020-02-24/ip-1-1-1-1-1582558548273-part-0000
> 2020-02-24T15:36:03,847 INFO [metrics-blocking-reservoir-scheduler-0] com.amazon.ws.emr.hadoop.fs.s3.upload.dispatch.DefaultMultipartUploadDispatcher - Completed multipart upload of 1 parts 0 bytes
> 2020-02-24T15:36:04,203 INFO [metrics-blocking-reservoir-scheduler-0] com.amazon.ws.emr.hadoop.fs.cse.CSEMultipartUploadOutputStream - Finished uploading my_bucket/warehouse/kylin.db/hive_metrics_query_qa/kday_date=2020-02-24/ip-1-1-1-1-1582558548273-part-0000. Elapsed seconds: 0.
> 2020-02-24T15:36:04,284 ERROR [metrics-blocking-reservoir-scheduler-0] org.apache.kylin.metrics.lib.impl.hive.HiveReservoirReporter - null
> java.lang.UnsupportedOperationException
>     at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.append(S3NativeFileSystem2.java:150) ~[emrfs-hadoop-assembly-2.37.0.jar:?]
>     at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1181) ~[hadoop-common-2.8.5-amzn-5.jar:?]
>     at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.append(EmrFileSystem.java:295) ~[emrfs-hadoop-assembly-2.37.0.jar:?]
>     at org.apache.kylin.metrics.lib.impl.hive.HiveProducer.write(HiveProducer.java:204) ~[kylin-metrics-reporter-hive-3.0.0.jar:3.0.0]
>     at org.apache.kylin.metrics.lib.impl.hive.HiveProducer.send(HiveProducer.java:134) ~[kylin-metrics-reporter-hive-3.0.0.jar:3.0.0]
>     at org.apache.kylin.metrics.lib.impl.hive.HiveReservoirReporter$HiveReservoirListener.onRecordUpdate(HiveReservoirReporter.java:144) [kylin-metrics-reporter-hive-3.0.0.jar:3.0.0]
>     at org.apache.kylin.metrics.lib.impl.BlockingReservoir.notifyListenerOfUpdatedRecord(BlockingReservoir.java:117) [kylin-core-metrics-3.0.0.jar:3.0.0]
>     at org.apache.kylin.metrics.lib.impl.BlockingReservoir.onRecordUpdate(BlockingReservoir.java:105) [kylin-core-metrics-3.0.0.jar:3.0.0]
>     at org.apache.kylin.metrics.lib.impl.BlockingReservoir.access$300(BlockingReservoir.java:37) [kylin-core-metrics-3.0.0.jar:3.0.0]
>     at org.apache.kylin.metrics.lib.impl.BlockingReservoir$ReporterRunnable.run(BlockingReservoir.java:171) [kylin-core-metrics-3.0.0.jar:3.0.0]
>     at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
> 2020-02-24T15:36:04,290 WARN [metrics-blocking-reservoir-scheduler-0] org.apache.kylin.metrics.lib.impl.BlockingReservoir - It fails to notify listener org.apache.kylin.metrics.lib.impl.hive.HiveReservoirReporter$HiveReservoirListener@1d460286 of updated record size 1132
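
The stack trace above shows `HiveProducer.write` calling `FileSystem.append`, which EMRFS rejects with `UnsupportedOperationException` once the target path is on S3. The following is a minimal sketch of one possible workaround, not the actual change in pull request #1173: when append is rejected, write the records to the next unused part file instead. `SimpleFs`, `NoAppendFs`, and `write` are hypothetical names invented for this illustration; they are not part of the Hadoop or Kylin APIs, and the in-memory map only stands in for an object store.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AppendFallbackSketch {

    /** Hypothetical minimal file-system abstraction; NOT the Hadoop FileSystem API. */
    interface SimpleFs {
        boolean exists(String path);
        void create(String path, List<String> lines);
        void append(String path, List<String> lines); // may throw UnsupportedOperationException
    }

    /** In-memory stand-in for a store that rejects append, mimicking the failure in the trace. */
    static class NoAppendFs implements SimpleFs {
        final Map<String, List<String>> files = new HashMap<>();

        public boolean exists(String path) {
            return files.containsKey(path);
        }

        public void create(String path, List<String> lines) {
            files.put(path, new ArrayList<>(lines));
        }

        public void append(String path, List<String> lines) {
            throw new UnsupportedOperationException("append is not supported");
        }
    }

    /**
     * Writes the records and returns the path actually used. The first part file is
     * created if missing; otherwise append is tried, and if the store rejects it the
     * records go to the next unused part file (an assumed naming scheme).
     */
    static String write(SimpleFs fs, String partitionDir, String prefix, List<String> lines) {
        String first = partitionDir + "/" + prefix + "-part-0000";
        if (!fs.exists(first)) {
            fs.create(first, lines);
            return first;
        }
        try {
            fs.append(first, lines);
            return first;
        } catch (UnsupportedOperationException e) {
            // Fallback: probe part-0001, part-0002, ... until an unused name is found.
            for (int i = 1; ; i++) {
                String candidate = String.format("%s/%s-part-%04d", partitionDir, prefix, i);
                if (!fs.exists(candidate)) {
                    fs.create(candidate, lines);
                    return candidate;
                }
            }
        }
    }

    public static void main(String[] args) {
        NoAppendFs fs = new NoAppendFs();
        String dir = "s3://my_bucket/warehouse/kylin.db/hive_metrics_query_qa/kday_date=2020-02-24";
        System.out.println(write(fs, dir, "host", Arrays.asList("record-1"))); // ends with host-part-0000
        System.out.println(write(fs, dir, "host", Arrays.asList("record-2"))); // ends with host-part-0001
    }
}
```

Writing a new file per batch avoids append entirely, at the cost of more small files in each partition; whether that trade-off suits a given deployment is a separate question.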



--
This message was sent by Atlassian Jira
(v8.3.4#803005)