spark-user mailing list archives

From ehbhaskar <ehbhas...@gmail.com>
Subject Re: [Spark SQL] INSERT OVERWRITE to a hive partitioned table (pointing to s3) from spark is too slow.
Date Mon, 05 Nov 2018 23:09:35 GMT
Here's the code with the correct data frame:

from pyspark.sql import SparkSession

# Hive-enabled session; committer algorithm v2 and the Hive thread
# settings are the knobs relevant to the slow partitioned S3 writes.
self.session = SparkSession \
    .builder \
    .appName(self.app_name) \
    .config("spark.dynamicAllocation.enabled", "false") \
    .config("hive.exec.dynamic.partition.mode", "nonstrict") \
    .config("mapreduce.fileoutputcommitter.algorithm.version", "2") \
    .config("hive.load.dynamic.partitions.thread", "10") \
    .config("hive.mv.files.thread", "30") \
    .config("fs.trash.interval", "0") \
    .enableHiveSupport() \
    .getOrCreate()

columns_with_default = ("col1, NULL as col2, col2, col4, NULL as col5, "
                        "partition_col1, partition_col2")
source_data_df_to_write = self.session.sql(
    "SELECT %s FROM TEMP_VIEW" % columns_with_default)





