spark-user mailing list archives

From "" <>
Subject Re: Best way to merge final output part files created by Spark job
Date Fri, 01 Jul 2016 23:36:00 GMT
Try using the coalesce function to reduce the output to the desired number
of partitions (and therefore part files). To merge files that have already
been written, use Hive and run INSERT OVERWRITE TABLE with the options below.

set hive.merge.smallfiles.avgsize=2560000;
set hive.merge.size.per.task=2560000;
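
For the coalesce side, here is a rough PySpark sketch, not a tested recipe. The helper names (target_partitions, write_coalesced), the Parquet output format, and the idea of deriving the file count from a target file size are my own assumptions for illustration; only DataFrame.coalesce and the DataFrameWriter calls are actual Spark API.

```python
import math


def target_partitions(total_bytes: int, target_file_bytes: int) -> int:
    """Hypothetical helper: how many part files give roughly target_file_bytes each."""
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Always write at least one file, even for empty or tiny inputs.
    return max(1, math.ceil(total_bytes / target_file_bytes))


def write_coalesced(df, output_path: str, num_files: int) -> None:
    """Hypothetical helper: write a pyspark.sql.DataFrame as at most num_files part files.

    Parquet is an assumed output format; swap in whatever writer the job uses.
    """
    # coalesce() merges existing partitions without a full shuffle;
    # it can only reduce the partition count, never increase it.
    df.coalesce(num_files).write.mode("overwrite").parquet(output_path)
```

For example, with about 10 MB of output and the 2560000-byte size from the settings above, target_partitions(10_000_000, 2_560_000) gives 4, which you would then pass as num_files. If partitions are badly skewed, repartition (which does shuffle) may give more even file sizes than coalesce.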
