Hi,

We are using the DataFrame insertInto method to write into a Hive table backed by an object store in Google Cloud. We have observed that this approach is slow.

From what we have read online, writes to Hive tables in Spark happen in two phases: the data is first written to a staging location and then renamed into the final table location.
We thought of saving the data directly to the table path, then programmatically adding the partitions and running MSCK REPAIR TABLE, to avoid the time spent in the rename operation. Are there any more elegant ways to implement this so that the FinalCopy step (the rename API operation) can be eliminated?
We need suggestions to speed up this write.
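
To make the idea concrete, here is a rough sketch of what we have in mind. It is only an illustration under assumptions: the table is taken to be Parquet, the helper names (partition_path, add_partition_sql, write_partition), the table name and the bucket path are all made up, and partition values are quoted as strings. The DataFrame passed in is assumed to contain rows for exactly one partition, without the partition columns.

```python
def partition_path(table_root, spec):
    """Build the Hive-style directory for one partition:
    <table_root>/<col1>=<val1>/<col2>=<val2>/...
    `spec` is an ordered dict of partition column -> value (hypothetical helper).
    Values with special characters are not escaped here."""
    parts = "/".join(f"{col}={val}" for col, val in spec.items())
    return table_root.rstrip("/") + "/" + parts


def add_partition_sql(table, table_root, spec):
    """Build the statement that registers one partition in the metastore."""
    cols = ", ".join(f"{col}='{val}'" for col, val in spec.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS PARTITION ({cols}) "
        f"LOCATION '{partition_path(table_root, spec)}'"
    )


def write_partition(spark, df, table, table_root, spec):
    """Write one partition's rows straight to its directory, then register it.
    Assumes `spark` is a SparkSession and `df` a DataFrame for this partition only."""
    # Overwrites the whole partition directory, so upserts into an old
    # partition must already be merged into `df` before this call.
    df.write.mode("overwrite").parquet(partition_path(table_root, spec))
    spark.sql(add_partition_sql(table, table_root, spec))


# Example usage (hypothetical table and bucket):
# write_partition(spark, df, "db.events", "gs://bucket/warehouse/events",
#                 {"dt": "2020-01-01"})
```

Adding each partition explicitly like this would replace the MSCK REPAIR TABLE call for the partitions we touch. We are not sure how much of the rename cost this really removes, since that probably also depends on the output committer Spark uses for the direct write.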

A few things to consider:
1. We get old data as well as new data, so there will be new partitions as well as upserts to old partitions.
2. Insert overwrite can happen into both static and dynamic partitions.

Looking forward to a solution.

Regards,
Joyan