Hi James.

--num-executors is used to control the number of executors (and therefore how many tasks can run in parallel) for your application. For reading and writing data in parallel, Spark relies on data partitioning. You can look here for a quick intro to how data partitioning works:

You are right that numPartitions is the parameter that can be used to control that, though in general Spark itself decides, given the data in each stage, how to partition it (i.e., how much to parallelize the reads and writes).
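As a rough sketch of both points (the path, URL, table name, credentials, and partition/batch sizes below are all placeholder values, not recommendations):

```scala
// Illustrative sketch only; connection details and sizes are placeholders.
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().
  appName("parallel-read-write-sketch").
  getOrCreate()

// Parquet reads are parallelized automatically: Spark creates tasks from the
// file splits, so read parallelism is driven by the file layout and available
// executors rather than by an extra SparkSession parameter.
val df = spark.read.parquet("/path/to/parquet")

// repartition() lets you choose the write parallelism explicitly; with JDBC,
// each partition writes over its own connection to the database.
df.repartition(8).
  write.
  format("jdbc").
  option("driver", "org.postgresql.Driver").
  option("url", "jdbc:postgresql://host:5432/db").  // placeholder URL
  option("dbtable", "target_table").                // placeholder table
  option("user", "username").
  option("password", "password").
  option("numPartitions", 8).      // upper bound on write parallelism
  option("batchsize", 10000).      // rows per JDBC batch insert
  mode(SaveMode.Append).
  save()
```

Note that on the write path numPartitions acts as a cap: if the DataFrame has more partitions than that, Spark coalesces it down before writing, which protects the database from too many concurrent connections.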

On Mon, Dec 3, 2018 at 8:40 AM James Starks <suserft@protonmail.com.invalid> wrote:
Reading the Spark doc (https://spark.apache.org/docs/latest/sql-data-sources-parquet.html), it's not mentioned how to read a Parquet file in parallel with SparkSession. Would --num-executors just work? Do any additional parameters need to be passed to SparkSession as well?

Also, if I want to write data to a database in parallel, would the options 'numPartitions' and 'batchsize' be enough to improve write performance? For example,

                     df.write.format("jdbc").
                     option("driver", "org.postgresql.Driver").
                     option("url", url).
                     option("dbtable", table_name).
                     option("user", username).
                     option("password", password).
                     option("numPartitions", N).
                     option("batchsize", M).
                     save()

From the Spark website (https://spark.apache.org/docs/2.2.0/sql-programming-guide.html#jdbc-to-other-databases), I could only find these two parameters that would have an impact on DB write performance.

I appreciate any suggestions.