Hi all,


We deploy Spark SQL in standalone mode, without HDFS, on a single machine with 256 GB RAM and 64 cores.

The SparkSession is configured as follows:

SparkSession ss = SparkSession.builder().appName("MYAPP")
                .config("spark.sql.crossJoin.enabled", "true")
                .config("spark.executor.memory", this.memory_limit)
                .config("spark.executor.cores", 2)
                .config("spark.driver.memory", "2g")
                .config("spark.storage.memoryFraction", 0.3)
                .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
                .config("spark.executor.extraJavaOptions",
                        "-XX:+UseG1GC -XX:+PrintFlagsFinal -XX:+PrintReferenceGC " +
                                "-verbose:gc -XX:+PrintGCDetails " +
                                "-XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy")
                .master(this.spark_master)
                .getOrCreate();
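
One thing we were not sure about ourselves: if we read the docs right, spark.storage.memoryFraction belongs to the legacy memory manager and is ignored in Spark 2.x unless spark.memory.useLegacyMode is set. The unified-memory equivalent we would try instead looks roughly like this (the values are guesses for illustration, not tuned numbers):

        // Hedged sketch: unified memory manager settings (Spark >= 1.6).
        // 0.6 / 0.3 are illustrative assumptions, not tested values.
        SparkSession ss = SparkSession.builder().appName("MYAPP")
                .config("spark.memory.fraction", "0.6")          // heap share for execution + storage
                .config("spark.memory.storageFraction", "0.3")   // portion of that share reserved for storage
                .master(this.spark_master)
                .getOrCreate();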


The MySQL JDBC connection properties:

        Properties connProp = new Properties();
        connProp.put("driver", "com.mysql.jdbc.Driver");
        connProp.put("useSSL", "false");
        connProp.put("user", this.user);
        connProp.put("password", this.password);



Then we register the MySQL tables as Datasets:

Dataset<Row> jdbcDF1 = ss.read().jdbc(this.url, "(select * from bigtable) t1", connProp);
jdbcDF1.createOrReplaceTempView("t1");
Dataset<Row> jdbcDF2 = ss.read().jdbc(this.url, "(select * from smalltable) t2", connProp);
jdbcDF2.createOrReplaceTempView("t2"); 
Dataset<Row> result = ss.sql("select * from t1, t2 where xxxx");
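
We also wondered whether the unpartitioned read is the real problem: the jdbc() overload above pulls all of bigtable through a single connection into a single task. A partitioned variant we sketched, where the column name and bounds are hypothetical and would need to match a real indexed numeric column in bigtable:

        // Hedged sketch: partitioned JDBC read; "id" and the bounds are
        // made-up values for illustration only.
        Dataset<Row> jdbcDF1 = ss.read().jdbc(
                this.url,
                "(select * from bigtable) t1",
                "id",          // partition column (hypothetical)
                1L,            // lowerBound (hypothetical)
                100000000L,    // upperBound (hypothetical)
                32,            // numPartitions
                connProp);

Each task would then issue its own bounded query rather than one task materializing the entire table.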


When we run the job, we get an OOM error in our Java program:


Lost task 6.0 in stage 1156.0 (TID 16686, 172.16.50.103, executor 5): java.lang.OutOfMemoryError: Java heap space
        at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2213)
        at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1992)
        at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3413)
        at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:471)
        at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3115)
        at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2344)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2739)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2486)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
        at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1966)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:301)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)


Is something wrong in our configuration? How can we fix it? Will Spark SQL spill to disk when memory is not enough?