spark-user mailing list archives

From "lucas.gary@gmail.com" <lucas.g...@gmail.com>
Subject Re: spark session jdbc performance
Date Tue, 24 Oct 2017 23:09:43 GMT
Did you check the query plan / check the UI?

That code looks sane to me.  Maybe you've only configured one executor?
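As a rough sketch of those checks, something like the following could be run in spark-shell against the DataFrame from the read. The object and method names (`JdbcReadDiagnostics`, `report`) are made up for illustration; the Spark calls themselves (`df.rdd.getNumPartitions`, `df.explain`, `SparkConf.getOption`) are standard in Spark 2.x.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not part of Spark: prints what Spark planned for a read.
object JdbcReadDiagnostics {
  // `df` is assumed to be the DataFrame returned by the JDBC read in question.
  def report(df: DataFrame): Unit = {
    // Number of partitions (and therefore scan tasks) Spark planned for this DataFrame.
    println("partitions: " + df.rdd.getNumPartitions)

    // Print the logical and physical plans to stdout.
    df.explain(true)

    // Settings that limit how many tasks can run at once.
    // Either may be unset, depending on how the job was submitted.
    val conf = df.sparkSession.sparkContext.getConf
    val master = conf.getOption("spark.master").getOrElse("<not set>")
    val executors = conf.getOption("spark.executor.instances").getOrElse("<not set>")
    println("spark.master: " + master)
    println("spark.executor.instances: " + executors)
  }
}
```

If the partition count comes back as 30 but the UI still shows one task running at a time, the limit is more likely on the executor/core side than in the read options.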

Gary

On Oct 24, 2017 2:55 PM, "Naveen Madhire" <vmadhire@umail.iu.edu> wrote:

>
> Hi,
>
>
>
> I am trying to fetch data from an Oracle DB using a subquery and am
> experiencing a lot of performance issues.
>
>
>
> Below is the query I am using,
>
>
>
> Using Spark 2.0.2
>
>
>
> val df = spark_session.read.format("jdbc")
>   .option("driver", "oracle.jdbc.OracleDriver")
>   .option("url", jdbc_url)
>   .option("user", user)
>   .option("password", pwd)
>   .option("dbtable", "subquery")
>   .option("partitionColumn", "id")  // primary key column, uniformly distributed
>   .option("lowerBound", "1")
>   .option("upperBound", "500000")
>   .option("numPartitions", 30)
>   .load()
>
>
>
> The above query is configured to use 30 partitions, but when I look at the
> UI it is only using 1 partition to run the query.
>
>
>
> Can anyone tell me if I am missing anything, or do I need to do anything
> else to tune the performance of the query?
>
> Thanks
>
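For reference, here is a minimal sketch of how a partitioned JDBC read splits the `partitionColumn` range into one WHERE clause per partition. It is modeled on Spark 2.x behaviour but is not the Spark source; the object and method names are hypothetical, and the integer-division stride is an assumption about that version. Note that `lowerBound` and `upperBound` only set the stride and do not filter rows, so the first and last partitions are open-ended.

```scala
// Hypothetical stand-alone sketch; pure Scala, no Spark dependency.
object JdbcPartitionSketch {
  def whereClauses(column: String,
                   lowerBound: Long,
                   upperBound: Long,
                   numPartitions: Int): Seq[String] = {
    require(numPartitions >= 2, "this sketch only covers the multi-partition case")
    require(lowerBound < upperBound, "lowerBound must be below upperBound")

    // Distance between consecutive partition boundaries (integer division, assumed).
    val stride = upperBound / numPartitions - lowerBound / numPartitions

    (0 until numPartitions).map { i =>
      val start = lowerBound + i * stride
      val end = start + stride
      if (i == 0) s"$column < $end or $column is null"       // open below, also takes NULLs
      else if (i == numPartitions - 1) s"$column >= $start"  // open above
      else s"$column >= $start AND $column < $end"
    }
  }

  def main(args: Array[String]): Unit =
    whereClauses("id", 1L, 500000L, 30).foreach(println)
}
```

With the values from the message above (`id`, 1, 500000, 30) this yields 30 clauses, starting with `id < 16667 or id is null` and ending with `id >= 483315`. Each clause becomes its own query against the database, so 30 tasks should appear in the UI if the options were picked up.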
