spark-user mailing list archives

From neeraj bhadani <bhadani.neeraj...@gmail.com>
Subject Re: Spark SQL API taking longer time than DF API.
Date Mon, 01 Apr 2019 08:44:09 GMT
In both cases, I am creating a Hive table from a UNION of the same two
queries.

I am not sure how the table-creation process differs internally between the
two cases.
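
One way to narrow this down is to print the plan for each path and then time each table creation end to end. The following is only a rough sketch, not a definitive benchmark: the helper names (timed, build_ctas, compare), the CTE aliases t1 and t2, and the two table-name parameters are hypothetical, and it assumes an existing SparkSession and that qry1 and qry2 are plain SQL strings.

```python
import time


def timed(label, action):
    """Run action() once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.2f}s")
    return result, elapsed


def build_ctas(table, qry1, qry2):
    """Assemble the Case 1 statement from the two query strings.

    t1 and t2 are placeholder CTE names chosen for this sketch.
    """
    return (
        f"CREATE TABLE {table} AS "
        f"WITH t1 AS ({qry1}), t2 AS ({qry2}) "
        "SELECT * FROM t1 UNION ALL SELECT * FROM t2"
    )


def compare(spark, qry1, qry2, sql_table, df_table):
    """Print both plans, then time both table-creation paths.

    spark is an existing SparkSession; neither table should exist yet.
    """
    ctas = build_ctas(sql_table, qry1, qry2)
    df = spark.sql(qry1).union(spark.sql(qry2))

    # EXPLAIN describes the statement rather than running it.
    spark.sql(f"EXPLAIN EXTENDED {ctas}").show(truncate=False)
    df.explain(True)

    # spark.sql() runs a CREATE TABLE ... AS SELECT eagerly, so each
    # timing below covers both the select and the write.
    _, sql_seconds = timed("SQL CTAS", lambda: spark.sql(ctas))
    _, df_seconds = timed(
        "DataFrame saveAsTable",
        lambda: df.write.saveAsTable(df_table),
    )
    return sql_seconds, df_seconds
```

If the two plans look the same for the select part, the difference is more likely in the write step (for example, the storage format each path picks for the new table) than in the query itself. Running the two paths in both orders helps rule out warm-up and caching effects.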

Regards,
Neeraj

On Sun, Mar 31, 2019 at 1:29 PM Jörn Franke <jornfranke@gmail.com> wrote:

> Is the select taking longer, or the saving to a file? You seem to save to
> a file only in the second case.
>
> On 29.03.2019 at 15:10, neeraj bhadani <bhadani.neeraj.08@gmail.com> wrote:
>
> Hi Team,
>    I am executing the same Spark code using the Spark SQL API and the
> DataFrame API; however, Spark SQL is taking longer than expected.
>
> Please find the pseudocode below.
>
> -----------------------------------------------------------------------------------------------
> Case 1 : Spark SQL
> -----------------------------------------------------------------------------------------------
>
> %sql
> CREATE TABLE <tbl_name>
> AS
> WITH <table_1> AS (
>      <qry1>
> )
> ,<table_2> AS (
>      <qry2>
> )
> SELECT * FROM <table_1>
> UNION ALL
> SELECT * FROM <table_2>
>
> -----------------------------------------------------------------------------------------------
> Case 2 : DataFrame API
> -----------------------------------------------------------------------------------------------
>
> df1 = spark.sql(<qry1>)
> df2 = spark.sql(<qry2>)
> df3 = df1.union(df2)
> df3.write.saveAsTable(<table_name>)
>
> -----------------------------------------------------------------------------------------------
>
>
> As per my understanding, both Spark SQL and the DataFrame API generate the
> same code under the hood, so the execution times should be similar.
>
>
> Regards,
>
> Neeraj
>
>
>
