spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Spark SQL API taking longer time than DF API.
Date Sun, 31 Mar 2019 12:29:56 GMT
Is the select taking longer, or the saving to a file? You seem to save to a file only in the
second case.
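
One way to separate the two costs is to time the select and the write as distinct steps. The sketch below is only an illustration, not something from the original post: `timed` is a hypothetical helper built on the standard library, and the commented usage assumes an existing SparkSession named `spark`, query strings `qry1` and `qry2`, and a placeholder table name. Because Spark evaluates lazily, an action such as `count()` is used to force the select; it only approximates the read cost, since the optimizer may do less work for a count than for a full write.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def timed(label: str, action: Callable[[], T], timings: Dict[str, float]) -> T:
    """Run action() and record its wall-clock duration (seconds) under label."""
    start = time.perf_counter()
    result = action()
    timings[label] = time.perf_counter() - start
    return result


# Hypothetical usage with PySpark (assumes `spark`, `qry1`, `qry2` exist;
# "tbl_name" is a placeholder):
#
#   timings = {}
#   df3 = spark.sql(qry1).union(spark.sql(qry2))
#   timed("select", lambda: df3.count(), timings)                       # forces the queries
#   timed("write", lambda: df3.write.saveAsTable("tbl_name"), timings)  # re-runs them, then saves
#   print(timings)
```

Note that without caching, the write step runs the queries again, so its time includes the select; the difference between the two numbers gives a rough idea of the save cost.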

> On 29.03.2019 at 15:10, neeraj bhadani <bhadani.neeraj.08@gmail.com> wrote:
> 
> Hi Team,
>    I am executing the same Spark code using the Spark SQL API and the DataFrame API; however,
> Spark SQL is taking longer than expected.
> 
> Please find the pseudocode below.
> -----------------------------------------------------------------------------------------------
> Case 1 : Spark SQL
> -----------------------------------------------------------------------------------------------
> %sql
> CREATE TABLE <tbl_name>
> AS
> 
>  WITH <table_1> AS (
>      <qry1>
> )
> ,<table_2> AS (
>      <qry2>
>      )
> 
> SELECT * FROM <table_1> 
> UNION ALL
> SELECT * FROM <table_2>
> 
> -----------------------------------------------------------------------------------------------
> Case  2 : DataFrame API
> -----------------------------------------------------------------------------------------------
> 
> df1 = spark.sql(<qry1>)
> df2 = spark.sql(<qry2>)
> df3 = df1.union(df2)
> df3.write.saveAsTable(<table_name>)
> -----------------------------------------------------------------------------------------------
> 
> As per my understanding, both the Spark SQL and DataFrame APIs generate the same code under
> the hood, so the execution times should be similar.
> 
> Regards,
> Neeraj
> 
