spark-user mailing list archives

From Benjamin Kim <>
Subject Merging Parquet Files
Date Thu, 22 Dec 2016 22:01:28 GMT
Has anyone tried to merge *.gz.parquet files before? I'm trying to merge them into one file after
they are output from Spark. Doing a coalesce(1) on the Spark cluster will not work; it just
does not have the resources to do it. I'm trying to do it from the command line, without
Spark, so that I can use the command in a shell script. I tried "hdfs dfs -getmerge", but the
resulting file is unreadable by Spark, which fails with a gzip footer error.
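The getmerge failure is consistent with how a Parquet file is laid out: a 4-byte "PAR1" magic, the column chunks, a footer (metadata), a 4-byte little-endian footer length, and "PAR1" again. `hdfs dfs -getmerge` simply concatenates bytes, so a reader that looks at the tail of the merged file finds only the last input's footer. The offsets recorded in that footer are relative to the start of the last input, not the merged file, so they point into another file's bytes. Feeding those bytes to the gzip codec would plausibly explain the error. The sketch below illustrates this with toy byte strings. `read_trailer`, `count_magic`, and `toy_parquet` are hypothetical helpers, and the placeholder "footer" bytes stand in for the real Thrift-encoded metadata.

```python
import struct

MAGIC = b"PAR1"  # 4-byte magic at the start and end of every Parquet file


def read_trailer(data: bytes) -> tuple[int, int]:
    """Return (footer_start, footer_length) from the last 8 bytes of a Parquet file.

    Layout: "PAR1" | column chunks | footer | 4-byte LE footer length | "PAR1".
    """
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a Parquet file: missing PAR1 magic")
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    footer_start = len(data) - 8 - footer_len
    if footer_start < 4:
        raise ValueError("footer length points outside the file")
    return footer_start, footer_len


def count_magic(data: bytes) -> int:
    """Count PAR1 markers; a byte-concatenated file carries one pair per input."""
    return data.count(MAGIC)


def toy_parquet(body: bytes, footer: bytes) -> bytes:
    """Hypothetical stand-in for a Parquet file: real footers are Thrift metadata."""
    return MAGIC + body + footer + struct.pack("<I", len(footer)) + MAGIC


a = toy_parquet(b"AAAA", b"footer-a")
b = toy_parquet(b"BBBBBB", b"footer-b")
merged = a + b  # what a byte-level concatenation such as getmerge produces

# The merged file still starts and ends with PAR1, so it passes the edge checks...
start, length = read_trailer(merged)
print(merged[start:start + length])  # b'footer-b': only the last input's footer is found
print(count_magic(merged))           # 4: leftover magic markers in the middle

# ...but in b the (toy) column chunk begins at offset 4. A real footer stores that
# offset, which in the merged file lands inside a's bytes instead of b's.
print(b[4:8])       # b'BBBB'
print(merged[4:8])  # b'AAAA'
```

A working merge has to decode the data and write it out again through a Parquet library, for example by reading each input's row groups and writing them with a single writer (pyarrow's `pyarrow.parquet.ParquetWriter` is one option), rather than joining the files byte by byte.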

