spark-user mailing list archives

From Louis Hust <louis.h...@gmail.com>
Subject OOM when extracting big data from MySQL using JDBC
Date Mon, 02 Apr 2018 04:17:20 GMT
Hi all,


We deploy Spark SQL in standalone mode without HDFS, on a single machine with
256 GB of RAM and 64 cores.

The SparkSession properties are set as follows:

    SparkSession ss = SparkSession.builder().appName("MYAPP")
            .config("spark.sql.crossJoin.enabled", "true")
            .config("spark.executor.memory", this.memory_limit)
            .config("spark.executor.cores", 2)
            .config("spark.driver.memory", "2g")
            .config("spark.storage.memoryFraction", 0.3)
            .config("spark.serializer",
                    "org.apache.spark.serializer.KryoSerializer")
            .config("spark.executor.extraJavaOptions",
                    "-XX:+UseG1GC -XX:+PrintFlagsFinal -XX:+PrintReferenceGC " +
                            "-verbose:gc -XX:+PrintGCDetails " +
                            "-XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy")
            .master(this.spark_master)
            .getOrCreate();
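As an aside: spark.storage.memoryFraction has been a legacy setting since
Spark 1.6 and is ignored by the default unified memory manager. A sketch of
the equivalent unified-memory knobs, slotted into the builder above; the
values shown are just the Spark 2.x defaults, not a recommendation:

    // Sketch: unified memory manager knobs (defaults shown), which replace
    // the legacy spark.storage.memoryFraction unless legacy mode is enabled.
    .config("spark.memory.fraction", 0.6)          // heap share for execution + storage
    .config("spark.memory.storageFraction", 0.5)   // part of that protected from eviction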



The MySQL JDBC connection properties are set as follows:

    Properties connProp = new Properties();
    connProp.put("driver", "com.mysql.jdbc.Driver");
    connProp.put("useSSL", "false");
    connProp.put("user", this.user);
    connProp.put("password", this.password);
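Worth noting as a possible factor: by default MySQL Connector/J materializes
the entire result set in heap before returning any rows, which matches the
OOM inside MysqlIO.readSingleRowSet in the trace below. A sketch of extra
connection properties that switch to a server-side cursor so rows arrive in
batches; the batch size of 10000 is an arbitrary placeholder:

    // Sketch, assuming MySQL Connector/J 5.x: stream rows through a
    // server-side cursor instead of buffering the whole result set in heap.
    connProp.put("useCursorFetch", "true");
    // Spark passes "fetchsize" through to Statement.setFetchSize().
    connProp.put("fetchsize", "10000");  // placeholder batch size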




Then we register the MySQL tables as Datasets:

    Dataset<Row> jdbcDF1 = ss.read().jdbc(this.url, "(select * from bigtable) t1", connProp);
    jdbcDF1.createOrReplaceTempView("t1");

    Dataset<Row> jdbcDF2 = ss.read().jdbc(this.url, "(select * from smalltable) t2", connProp);
    jdbcDF2.createOrReplaceTempView("t2");

    Dataset<Row> result = ss.sql("select * from t1, t2 where xxxx");
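Note that this form of read().jdbc() pulls the whole query through a single
connection into a single partition. A sketch of the partitioned overload of
DataFrameReader.jdbc(); the column name "id" and the bounds are placeholders
for a numeric, roughly uniformly distributed column in bigtable:

    // Sketch: split the read into numPartitions parallel JDBC queries,
    // each covering a stride of the partition column.
    Dataset<Row> jdbcDF1 = ss.read().jdbc(
            this.url,
            "(select * from bigtable) t1",
            "id",          // partitionColumn -- placeholder, pick an indexed numeric column
            1L,            // lowerBound -- placeholder
            100000000L,    // upperBound -- placeholder
            64,            // numPartitions
            connProp);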




When we run the job, we get an OOM error in our Java program:


    Lost task 6.0 in stage 1156.0 (TID 16686, 172.16.50.103, executor 5): java.lang.OutOfMemoryError: Java heap space
            at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2213)
            at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1992)
            at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3413)
            at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:471)
            at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3115)
            at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2344)
            at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2739)
            at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2486)
            at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
            at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1966)
            at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:301)
            at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
            at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
            at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
            at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
            at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
            at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
            at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
            at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
            at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
            at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
            at org.apache.spark.scheduler.Task.run(Task.scala:109)
            at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
            at java.lang.Thread.run(Thread.java:748)



Is there something wrong with the configuration, and how can we fix it? Will
Spark SQL spill to disk when memory is not enough?
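On the last point, a hedged note: Spark spills shuffle and join buffers to
disk on its own, and a cached Dataset can spill if its storage level allows
it, but the trace above shows the heap being exhausted inside the MySQL
driver while it materializes the result set, before Spark's memory manager
ever sees the rows. A sketch of explicit disk-backed caching for the join
result, as an illustration only:

    import org.apache.spark.storage.StorageLevel;

    // Sketch: let the cached result fall back to local disk when executor
    // storage memory runs out; this does not prevent the OOM inside the
    // MySQL JDBC driver shown above.
    result.persist(StorageLevel.MEMORY_AND_DISK());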
