Hi, is the subquery a user-defined SQL query or a table name in the DB?
If it is a user-defined SQL query, make sure your partition column is in the main SELECT clause.
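
Here is why the partition column has to be in that SELECT list. When `partitionColumn` is set, Spark sends one query per partition. Each query uses the `dbtable` value verbatim in its FROM clause and adds a range filter on the partition column, roughly `SELECT <columns> FROM <dbtable> WHERE <range on partitionColumn>`. The sketch below is a simplified, hypothetical model of that wrapping, not Spark's actual source. The table `orders` and its columns are made up for illustration.

```scala
object SubqueryWrapSketch {
  // Simplified, hypothetical model of one per-partition query from Spark's JDBC
  // source: the dbtable value goes verbatim into FROM and the range predicate on
  // partitionColumn is appended as a WHERE clause.
  def partitionSql(dbtable: String, columns: Seq[String], where: String): String =
    s"SELECT ${columns.mkString(", ")} FROM $dbtable WHERE $where"

  def main(args: Array[String]): Unit = {
    val range = "id >= 16667 AND id < 33333"

    // Hypothetical subquery that does NOT select "id": the outer WHERE refers to
    // a column the derived table t does not expose, so the database rejects it.
    val bad = "(SELECT name, amount FROM orders) t"

    // Hypothetical subquery that DOES select "id", so the outer filter is valid.
    // Oracle does not accept AS before a table alias: write ") t", not ") AS t".
    val good = "(SELECT id, name, amount FROM orders) t"

    println(partitionSql(bad, Seq("name", "amount"), range))
    println(partitionSql(good, Seq("id", "name", "amount"), range))
  }
}
```

In other words, pass the subquery in parentheses with an alias and project the partition column inside it.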

On Wed, Oct 25, 2017 at 3:25, Naveen Madhire
<vmadhire@umail.iu.edu> wrote:

Hi,

I am trying to fetch data from an Oracle DB using a subquery and am experiencing a lot of performance issues.

Below is the query I am using, with Spark 2.0.2:

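val df = spark_session.read.format("jdbc")
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("url", jdbc_url)
  .option("user", user)
  .option("password", pwd)
  .option("dbtable", "subquery")     // a table name, or a parenthesized subquery with an alias
  .option("partitionColumn", "id")   // primary key column, uniformly distributed
  .option("lowerBound", "1")         // used only to compute the partition stride, not to filter rows
  .option("upperBound", "500000")
  .option("numPartitions", 30)       // number of parallel JDBC queries / Spark partitions
  .load()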
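
With these options, Spark does not drop rows outside `lowerBound`/`upperBound`. It only uses the bounds to compute a stride and split the `id` range into `numPartitions` WHERE clauses. The following is a minimal sketch that approximates the logic of `JDBCRelation.columnPartition` in Spark 2.x. It is a simplification, and the exact clauses may differ between Spark versions. It prints the clause each of the 30 partitions would receive.

```scala
object StrideSketch {
  // Approximates how Spark 2.x splits [lowerBound, upperBound) into
  // numPartitions WHERE clauses on the partition column.
  def whereClauses(column: String, lowerBound: Long, upperBound: Long, numPartitions: Int): Seq[String] =
    if (numPartitions <= 1) Seq("") // single partition: no filter at all
    else {
      // Divide before subtracting to avoid overflow; a little round-off is accepted.
      val stride = upperBound / numPartitions - lowerBound / numPartitions
      var current = lowerBound
      (0 until numPartitions).map { i =>
        val lower = if (i != 0) Some(s"$column >= $current") else None
        current += stride
        val upper = if (i != numPartitions - 1) Some(s"$column < $current") else None
        (lower, upper) match {
          case (None, Some(u))    => s"$u or $column is null" // first: NULLs and ids below lowerBound
          case (Some(l), None)    => l                        // last: everything from here upward
          case (Some(l), Some(u)) => s"$l AND $u"
          case (None, None)       => ""                       // unreachable when numPartitions > 1
        }
      }
    }

  def main(args: Array[String]): Unit =
    whereClauses("id", 1L, 500000L, 30).zipWithIndex.foreach { case (w, i) =>
      println(s"partition $i: $w")
    }
}
```

For the values above, the stride is 16666, so each partition should cover roughly 16,666 ids, provided the `id` column is visible to the outer query.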

The above query is configured with 30 partitions, but when I look at the Spark UI, only 1 partition is being used to run the query.

Can anyone tell me if I am missing anything, or whether I need to do anything else to tune the performance of the query?

Thanks