spark-user mailing list archives

From "Kali.tummala@gmail.com" <Kali.tumm...@gmail.com>
Subject spark parquet too many small files ?
Date Sat, 02 Jul 2016 00:17:22 GMT
Hi All, 

I am running Hive queries through spark-sql in YARN client mode. The SQL is pretty simple: load dynamic partitions into a target Parquet table.

I used Hive configuration parameters such as (set
hive.merge.smallfiles.avgsize=256000000; set
hive.merge.size.per.task=2560000000;), which usually merge small files up to a
256 MB block size. Are these parameters supported in spark-sql? If not, is there
another way to merge a large number of small Parquet files into bigger ones?
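For reference, here is a sketch of the kind of spark-sql session I mean (table and column names are hypothetical). One workaround sometimes suggested for plain spark-sql, instead of relying on the hive.merge.* settings, is to add a DISTRIBUTE BY on the partition column so that each dynamic partition is written by a single task:

```sql
-- Sketch only: target_parquet, source_table, col1, col2 and dt are made-up names.
SET hive.merge.smallfiles.avgsize=256000000;
SET hive.merge.size.per.task=2560000000;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- DISTRIBUTE BY routes all rows of a given partition value to one task,
-- so each dynamic partition comes out as one larger file instead of many small ones.
INSERT OVERWRITE TABLE target_parquet PARTITION (dt)
SELECT col1, col2, dt
FROM source_table
DISTRIBUTE BY dt;
```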

If it were a Scala application I could use the coalesce() or repartition()
functions, but here we are not using a Spark Scala application, just plain spark-sql.
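For comparison, the coalesce() route in a Scala application would look roughly like this (paths and the target partition count are hypothetical; a sketch assuming the Spark 1.6-era DataFrame API):

```scala
// Sketch only: the paths and the partition count 8 are made up for illustration.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // sc: an existing SparkContext
val df = sqlContext.read.parquet("/path/to/source")

// coalesce(n) caps the number of output files at n without a full shuffle;
// repartition(n) would force a shuffle and produce more evenly sized files.
df.coalesce(8).write.parquet("/path/to/target")
```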


Thanks
Sri 

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-parquet-too-many-small-files-tp27264.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

