spark-user mailing list archives

From Srinivasa Reddy Tatiredidgari <tatire...@yahoo.com.INVALID>
Subject Re: spark session jdbc performance
Date Wed, 25 Oct 2017 06:26:19 GMT
Hi, is the subquery a user-defined SQL statement or a table name in the DB? If it is a user-defined SQL statement, make sure your partition column is in the main SELECT clause.
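For example, a minimal sketch of passing a user-defined SQL as "dbtable" (the table and column names here are hypothetical, and spark_session, jdbc_url, user and pwd are assumed to be defined as in the original post; this needs a live Oracle JDBC source to actually run):

```scala
// Hedged sketch, not from the original thread: "orders" and its columns are
// hypothetical. When "dbtable" is a user-defined SQL statement, it must be
// wrapped in parentheses and given an alias, and the partition column ("id")
// must appear in the outer SELECT list so Spark can append range predicates.
val subquery = "(SELECT id, amount, status FROM orders WHERE status = 'OPEN') t"

val df = spark_session.read.format("jdbc")
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("url", jdbc_url)
  .option("user", user)
  .option("password", pwd)
  .option("dbtable", subquery)          // subquery used in place of a table name
  .option("partitionColumn", "id")      // must be selected by the subquery
  .option("lowerBound", "1")
  .option("upperBound", "500000")
  .option("numPartitions", 30)
  .load()
```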

Sent from Yahoo Mail on Android 
 
  On Wed, Oct 25, 2017 at 3:25, Naveen Madhire <vmadhire@umail.iu.edu> wrote:

Hi,

I am trying to fetch data from Oracle DB using a subquery and I am experiencing a lot of performance issues.

Below is the query I am using (Spark 2.0.2):
val df = spark_session.read.format("jdbc")
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("url", jdbc_url)
  .option("user", user)
  .option("password", pwd)
  .option("dbtable", "subquery")
  .option("partitionColumn", "id")  // primary key column, uniformly distributed
  .option("lowerBound", "1")
  .option("upperBound", "500000")
  .option("numPartitions", 30)
  .load()

The above read is configured with 30 partitions, but when I look at the UI it is only using 1 partition to run the query.

Can anyone tell me if I am missing anything, or whether I need to do anything else to tune the performance of the query?
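For reference, when partitionColumn/lowerBound/upperBound/numPartitions are all picked up, Spark's JDBC source splits the bound range into numPartitions strides and issues one query per stride. A simplified sketch of that logic (not Spark's exact implementation; the column name and bounds match the example above):

```scala
// Simplified sketch of how Spark 2.x derives per-partition WHERE predicates.
// Each predicate is appended to a copy of the query, so 30 partitions means
// 30 concurrent range queries against the database.
def partitionPredicates(column: String, lower: Long, upper: Long,
                        numPartitions: Int): Seq[String] = {
  val stride = (upper - lower) / numPartitions
  (0 until numPartitions).map { i =>
    val lo = lower + i * stride
    val hi = lo + stride
    if (i == 0) s"$column < $hi OR $column IS NULL"
    else if (i == numPartitions - 1) s"$column >= $lo"
    else s"$column >= $lo AND $column < $hi"
  }
}

// partitionPredicates("id", 1L, 500000L, 30) yields 30 predicates,
// one per partition.
```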

Thanks
