spark-user mailing list archives

From "lucas.gary@gmail.com" <lucas.g...@gmail.com>
Subject Re: spark session jdbc performance
Date Wed, 25 Oct 2017 03:04:27 GMT
Sorry, I meant to say: "That code looks SANE to me"

Assuming that you're seeing the query running partitioned as expected then
you're likely configured with one executor.  Very easy to check in the UI.
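
If the UI is not handy, the same check can be sketched from the driver. This is a minimal sketch, not something from the original thread: the object name ParallelismCheck and the method report are made up for illustration, and it assumes you pass in your own session and the DataFrame returned by load().

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper (not part of Spark): prints how many partitions the
// DataFrame has and what resources the session was given to run them on.
object ParallelismCheck {
  def report(spark: SparkSession, df: DataFrame): Unit = {
    // Number of partitions Spark created for the read (30 expected here).
    println(s"partitions:          ${df.rdd.getNumPartitions}")
    println(s"master:              ${spark.sparkContext.master}")
    println(s"default parallelism: ${spark.sparkContext.defaultParallelism}")
    // These keys are only present if they were set explicitly,
    // for example on the spark-submit command line.
    Seq("spark.executor.instances",
        "spark.executor.cores",
        "spark.dynamicAllocation.enabled").foreach { key =>
      println(s"$key = ${spark.conf.getOption(key).getOrElse("(not set)")}")
    }
  }
}
```

Calling ParallelismCheck.report(spark_session, df) right after load() should show 30 partitions; if the executor settings come back as a single executor with a single core, the partitions will run one after another.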

Gary Lucas

On 24 October 2017 at 16:09, lucas.gary@gmail.com <lucas.gary@gmail.com>
wrote:

> Did you check the query plan / check the UI?
>
> That code looks same to me.  Maybe you've only configured for one executor?
>
> Gary
>
> On Oct 24, 2017 2:55 PM, "Naveen Madhire" <vmadhire@umail.iu.edu> wrote:
>
>>
>> Hi,
>>
>>
>>
>> I am trying to fetch data from an Oracle DB using a subquery and am
>> experiencing a lot of performance issues.
>>
>>
>>
>> Below is the query I am using,
>>
>>
>>
>> *Using Spark 2.0.2*
>>
>>
>>
>> val df = spark_session.read.format("jdbc")
>>    .option("driver", "oracle.jdbc.OracleDriver")
>>    .option("url", jdbc_url)
>>    .option("user", user)
>>    .option("password", pwd)
>>    .option("dbtable", "subquery")
>>    .option("partitionColumn", "id")  // primary key column, uniformly distributed
>>    .option("lowerBound", "1")
>>    .option("upperBound", "500000")
>>    .option("numPartitions", 30)
>>    .load()
>>
>>
>>
>> The above query is running using the 30 partitions, but when I look at
>> the UI it is only using 1 partition to run the query.
>>
>>
>>
>> Can anyone tell me if I am missing anything, or whether I need to do
>> anything else to tune the performance of the query?
>>
>>  *Thanks*
>>
>
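
For reference, the partitioned read quoted above splits the id range into one WHERE clause per partition. The sketch below approximates that split as I understand it for Spark 2.x; it is not Spark's source code, and the names JdbcPartitionSketch and whereClauses are made up for illustration. It runs without Spark.

```scala
// Approximation of how lowerBound / upperBound / numPartitions become
// per-partition predicates. Assumption: stride is computed with integer
// division as upper / n - lower / n, which matches my reading of Spark 2.x.
object JdbcPartitionSketch {
  def whereClauses(column: String, lower: Long, upper: Long,
                   numPartitions: Int): Seq[String] = {
    require(numPartitions >= 2, "sketch only covers two or more partitions")
    require(lower < upper, "lowerBound must be below upperBound")
    val stride = upper / numPartitions - lower / numPartitions
    (0 until numPartitions).map { i =>
      val lo = lower + i * stride // inclusive start of this slice
      val hi = lo + stride        // exclusive end of this slice
      if (i == 0) s"$column < $hi OR $column IS NULL"       // open below
      else if (i == numPartitions - 1) s"$column >= $lo"    // open above
      else s"$column >= $lo AND $column < $hi"
    }
  }

  def main(args: Array[String]): Unit =
    whereClauses("id", 1L, 500000L, 30).foreach(println)
}
```

With the values from the question, the first clause is "id < 16667 OR id IS NULL" and the last is "id >= 483315". Note that the bounds only decide where the slices are cut; they do not filter rows, so ids outside 1..500000 still get read and end up in the first or last partition.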
