kylin-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KYLIN-3824) Spark - Extract Fact Table Distinct Columns step causes java.lang.OutOfMemoryError: Java heap space
Date Fri, 22 Feb 2019 14:53:00 GMT

    [ https://issues.apache.org/jira/browse/KYLIN-3824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775224#comment-16775224 ]

ASF GitHub Bot commented on KYLIN-3824:
---------------------------------------

shaofengshi commented on pull request #478: KYLIN-3824
URL: https://github.com/apache/kylin/pull/478
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Spark - Extract Fact Table Distinct Columns step causes java.lang.OutOfMemoryError: Java heap space
> ---------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-3824
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3824
>             Project: Kylin
>          Issue Type: Bug
>          Components: Spark Engine
>    Affects Versions: v2.5.0, v2.6.0, v2.5.1, v2.5.2
>         Environment: CentOS 7
> 3 workers and 1 master.
> 4 cpu, 16GB RAM each
>            Reporter: Alexander
>            Assignee: Alexander
>            Priority: Major
>             Fix For: v2.6.1
>
>         Attachments: KYLIN-3824.master.001.patch
>
>
> Trying to build a huge cube in a weak environment.
> Environment:
> Cluster with 3 nodes.
> Max AM container size - 5GB.
>  
> The kylin_intermediate table consists of ~500 files ranging in size from 4 KB to 300 MB.
>  
> When a Spark job executor takes a file larger than ~70 MB in the mapPartitionsToPair step (194), it throws this exception:
> 2019-02-21 20:29:40 ERROR SparkUncaughtExceptionHandler:91 - [Container in shutdown] Uncaught exception in thread Thread[Executor task launch worker for task 1,5,main]
> java.lang.OutOfMemoryError: Java heap space
>  at java.util.Arrays.copyOfRange(Arrays.java:3664)
>  at java.lang.String.<init>(String.java:207)
>  at java.lang.String.substring(String.java:1969)
>  at java.lang.String.split(String.java:2353)
>  at java.lang.String.split(String.java:2422)
>  at org.apache.kylin.engine.spark.SparkUtil$1.call(SparkUtil.java:164)
>  at org.apache.kylin.engine.spark.SparkUtil$1.call(SparkUtil.java:160)
>  at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
>  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>  at scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:31)
>  at com.google.common.collect.Lists.newArrayList(Lists.java:145)
>  at org.apache.kylin.engine.spark.SparkFactDistinct$FlatOutputFucntion.call(SparkFactDistinct.java:313)
>  at org.apache.kylin.engine.spark.SparkFactDistinct$FlatOutputFucntion.call(SparkFactDistinct.java:239)
>  at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
>  at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:186)
>  at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
>  at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:801)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
>  
>  
>  
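
The trace above passes through Lists.newArrayList (Lists.java:145) on its way to String.split, which suggests that the rows of a partition are being copied into a list while they are split. As a minimal sketch of the alternative, the Java below consumes an iterator one row at a time and keeps only the distinct values of one column. The class name, method name, delimiter and column index are all hypothetical and chosen for illustration; this is not Kylin's code and not necessarily what the attached patch does.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from Kylin.
class StreamingSplitSketch {

    // Reads rows one at a time, so only the current row's split result
    // and the accumulated distinct values are held at once.
    static Set<String> distinctValues(Iterator<String> rows, String delimiter, int column) {
        Set<String> distinct = new HashSet<>();
        while (rows.hasNext()) {
            // limit -1 keeps trailing empty fields
            String[] fields = rows.next().split(delimiter, -1);
            if (column < fields.length) {
                distinct.add(fields[column]);
            }
        }
        return distinct;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a,1", "b,2", "a,3");
        // Column 0 holds the values a, b, a, so two distinct values remain.
        System.out.println(distinctValues(rows.iterator(), ",", 0).size());
    }
}
```

With this shape, memory use grows with the number of distinct values rather than with the number of rows in the input file, assuming the iterator itself is lazy.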



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
