spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: Spark hive overwrite is very very slow
Date Sun, 20 Aug 2017 07:24:48 GMT
Have you tried running the same operation directly in Hive to see how it performs?

In which format do you expect Hive to write, and have you made sure the table is actually
stored in that format? It could be that you are using an inefficient format (e.g. CSV + bzip2).
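
For example, a minimal sketch for checking the target table's storage format and
comparing a write in a columnar format (assuming Spark 2.x with Hive support; the
table names below are made up):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

  // Show the InputFormat/OutputFormat and compression of the existing target table:
  spark.sql("DESCRIBE FORMATTED mydb.target_table").show(200, truncate = false)

  // A splittable, columnar format such as Parquet usually writes much faster
  // than row-oriented text with a heavy codec (e.g. CSV + bzip2):
  spark.table("mydb.target_table")
    .write
    .mode("overwrite")
    .format("parquet")
    .saveAsTable("mydb.target_table_copy")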

> On 20. Aug 2017, at 03:18, KhajaAsmath Mohammed <mdkhajaasmath@gmail.com> wrote:
> 
> Hi,
> 
> I have written a Spark SQL job on Spark 2.0 using Scala. It just pulls the data from a Hive table, adds extra columns, removes duplicates, and then writes it back to Hive.
> 
> In the Spark UI, it is taking almost 40 minutes to write 400 GB of data. Is there anything I can do to improve performance?
> 
> spark.sql.shuffle.partitions is 2000 in my case, with executor memory of 16 GB and dynamic allocation enabled.
> 
> I am doing an insert overwrite on a partitioned table:
> df.write.mode("overwrite").insertInto(tableName)
> 
> Any suggestions, please?
> 
> Sent from my iPhone
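
For reference, a minimal sketch of the kind of job described in the quoted message
(assuming Spark 2.x with Hive support; table, column, and partition names are made up):

  import org.apache.spark.sql.SparkSession
  import org.apache.spark.sql.functions.lit

  val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

  // Pull from Hive, add an extra column, remove duplicates:
  val df = spark.table("mydb.source_table")
    .withColumn("load_date", lit("2017-08-20"))
    .dropDuplicates()

  // Overwrite into the partitioned target table. insertInto matches columns
  // by position, so the DataFrame's column order must match the table
  // definition, with the partition columns last:
  df.write.mode("overwrite").insertInto("mydb.target_table")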

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

