spark-user mailing list archives

From neeraj bhadani <bhadani.neeraj...@gmail.com>
Subject Re: Spark SQL API taking longer time than DF API.
Date Mon, 08 Apr 2019 08:21:41 GMT
Hi All,
    Can anyone help me with my question below?

Regards,
Neeraj

On Mon, Apr 1, 2019 at 9:44 AM neeraj bhadani <bhadani.neeraj.08@gmail.com>
wrote:

> In both cases, I am trying to create a Hive table from the UNION of the
> same two queries.
>
> I am not sure how the two approaches differ internally when the Hive
> table is created.
>
> Regards,
> Neeraj
>
> On Sun, Mar 31, 2019 at 1:29 PM Jörn Franke <jornfranke@gmail.com> wrote:
>
>> Is the select taking longer, or the saving to a file? You seem to save
>> to a file only in the second case.
>>
>> On 29.03.2019 at 15:10, neeraj bhadani <bhadani.neeraj.08@gmail.com> wrote:
>>
>> Hi Team,
>>    I am executing the same Spark code using the Spark SQL API and the
>> DataFrame API; however, the Spark SQL version is taking longer than expected.
>>
>> Please find the pseudocode below.
>>
>> -----------------------------------------------------------------------------------------------
>>
>> Case 1 : Spark SQL
>>
>>
>> -----------------------------------------------------------------------------------------------
>>
>> %sql
>>
>> CREATE TABLE <tbl_name>
>> AS
>> WITH <table_1> AS (
>>      <qry1>
>> )
>> , <table_2> AS (
>>      <qry2>
>> )
>> SELECT * FROM <table_1>
>> UNION ALL
>> SELECT * FROM <table_2>
>>
>>
>>
>> -----------------------------------------------------------------------------------------------
>>
>> Case 2 : DataFrame API
>>
>>
>> -----------------------------------------------------------------------------------------------
>>
>>
>> df1 = spark.sql(<qry1>)
>> df2 = spark.sql(<qry2>)
>> df3 = df1.union(df2)
>> df3.write.saveAsTable(<table_name>)
>>
>>
>> -----------------------------------------------------------------------------------------------
>>
>>
>> As per my understanding, both Spark SQL and the DataFrame API generate the
>> same code under the hood, so the execution times should be similar.
>>
>>
>> Regards,
>>
>> Neeraj
>>
>>
>>
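
The question raised in the thread (is the select slow, or the save?) could be approached by timing the two phases separately. Below is a minimal sketch of that idea, not something taken from the thread: `timed`, `timings`, `run_query` and `save_result` are hypothetical names, and the two functions are stand-ins that only sleep, so the sketch runs without a cluster. With Spark they would be replaced by an action that forces the query and by the write itself.

```python
import time
from contextlib import contextmanager

timings = {}  # phase label -> elapsed wall-clock seconds


@contextmanager
def timed(label):
    """Record how long the enclosed block takes under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Stand-ins (assumptions) so the sketch is runnable on its own.
# With Spark, run_query might be an action such as df3.count(),
# and save_result might be df3.write.saveAsTable(<table_name>).
def run_query():
    time.sleep(0.05)
    return 42


def save_result(result):
    time.sleep(0.01)


with timed("query"):
    result = run_query()

with timed("write"):
    save_result(result)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f}s")
```

Because Spark evaluates lazily, forcing the query with an action and then writing would run the query twice unless the DataFrame is cached, so numbers obtained this way are indicative only.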
