spark-dev mailing list archives

From Gang Li <lg1173298...@gmail.com>
Subject For the same data source in two SQLs, how to read it once?
Date Wed, 09 Sep 2020 07:42:40 GMT
Hi all,

I run two Spark SQL statements that read the same table and partition but
write to different tables. Is there any way to merge them into one statement
so that the data is read only once?

Suppose there are two SQL statements:
-----------------------------------------------------------------------------------------------------------------
INSERT OVERWRITE TABLE spark_input_test2 PARTITION(dt='20200909')
SELECT name, number, age
FROM spark_input_test  WHERE dt='20200908'
-----------------------------------------------------------------------------------------------------------------
INSERT OVERWRITE TABLE spark_input_test1 PARTITION(dt='20200909')
SELECT name, number, sex
FROM spark_input_test  WHERE dt='20200908'
-----------------------------------------------------------------------------------------------------------------

Running these two SQL statements generates two physical plans, and the
data source "spark_input_test" is read twice. If spark_input_test were
read only once, it would save memory.
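
To make the goal concrete, here is a minimal sketch of one possible approach:
cache the shared partition once, then run both inserts against the cached
data. It assumes both statements run in the same Spark session and that the
four selected columns fit in the cache. The view name spark_input_cached is
hypothetical and used only for illustration.

```sql
-- Hypothetical name: spark_input_cached is made up for this sketch.
-- CACHE TABLE is eager unless LAZY is specified, so the source
-- partition is scanned here, when the cache is built.
CACHE TABLE spark_input_cached AS
SELECT name, number, age, sex
FROM spark_input_test WHERE dt='20200908';

-- Both inserts now select from the cached view instead of the source table.
INSERT OVERWRITE TABLE spark_input_test2 PARTITION(dt='20200909')
SELECT name, number, age
FROM spark_input_cached;

INSERT OVERWRITE TABLE spark_input_test1 PARTITION(dt='20200909')
SELECT name, number, sex
FROM spark_input_cached;

-- Release the cached data once both writes have finished.
UNCACHE TABLE spark_input_cached;
```

This trades the second scan for cache storage, so whether it reduces overall
memory use depends on the size of the partition.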


Cheers,
Gang Li




