spark-user mailing list archives

From James Starks <>
Subject Parallel read parquet file, write to postgresql
Date Mon, 03 Dec 2018 13:40:41 GMT
Reading the Spark doc (, it's not mentioned how to read a parquet file in parallel with SparkSession. Would --num-executors alone just work? Do any additional parameters need to be set on the SparkSession as well?
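For context on the read side: as far as I understand, a parquet read is already parallel — Spark plans one task per input split, so --num-executors (on YARN) only controls how many executors are available to run those tasks. A minimal sketch, assuming a local session and a hypothetical path:

```scala
import org.apache.spark.sql.SparkSession

object ParquetReadSketch {
  // Returns the number of read tasks (partitions) Spark plans for the file.
  // Read parallelism comes from the parquet input splits, not from
  // --num-executors; that flag only sets how many executors run the tasks.
  def readPartitions(path: String): Int = {
    val spark = SparkSession.builder()
      .appName("parquet-read-sketch")
      .master("local[4]") // local session for illustration only
      .getOrCreate()
    val df = spark.read.parquet(path)
    df.rdd.getNumPartitions
  }

  def main(args: Array[String]): Unit = {
    // "/tmp/events.parquet" is a hypothetical path
    println(readPartitions("/tmp/events.parquet"))
  }
}
```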

Also, if I want to write data to the database in parallel, would the options 'numPartitions' and 'batchsize' be enough to improve write performance? For example,

                     df.write.
                       format("jdbc").
                       option("driver", "org.postgresql.Driver").
                       option("url", url).
                       option("dbtable", table_name).
                       option("user", username).
                       option("password", password).
                       option("numPartitions", N).
                       option("batchsize", M).
                       save()

From the Spark website (, these are the only two parameters I could find that would have an impact on db write performance.
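For what it's worth, per the Spark JDBC docs 'numPartitions' only caps the write parallelism (Spark coalesces down to it when the DataFrame has more partitions), so the actual parallelism is the DataFrame's partition count; 'batchsize' sets the rows per JDBC batch insert and defaults to 1000. A sketch under those assumptions — the url, table, and credentials are placeholders:

```scala
import org.apache.spark.sql.{DataFrame, DataFrameWriter, Row, SaveMode}

object JdbcWriteSketch {
  // Builds a JDBC writer. Repartition first so the write actually runs with
  // 'parallelism' concurrent connections; 'numPartitions' is only an upper
  // bound, and 'batchsize' controls rows per round trip (default 1000).
  def configure(df: DataFrame, parallelism: Int): DataFrameWriter[Row] =
    df.repartition(parallelism)
      .write
      .format("jdbc")
      .option("driver", "org.postgresql.Driver")
      .option("url", "jdbc:postgresql://localhost:5432/mydb") // placeholder
      .option("dbtable", "public.events")                     // placeholder
      .option("user", "username")                             // placeholder
      .option("password", "password")                         // placeholder
      .option("numPartitions", parallelism) // cap on concurrent connections
      .option("batchsize", 10000)           // rows per JDBC batch insert
      .mode(SaveMode.Append)

  // Calling .save() on the returned writer would open up to 'parallelism'
  // connections to Postgres and batch-insert each partition.
}
```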

I appreciate any suggestions.