spark-user mailing list archives

From Manoj Samel <manojsamelt...@gmail.com>
Subject schemaRDD.saveAsParquetFile creates large number of small parquet files ...
Date Thu, 29 Jan 2015 17:27:49 GMT
Spark 1.2 on Hadoop 2.3

I read one big CSV file, create a SchemaRDD on it, and call saveAsParquetFile.

This creates a large number of small (~1 MB) parquet part-x- files.

Is there any way to control this so that a smaller number of larger files is created?
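[Editor's note: a common remedy, not from the original message, is sketched below. saveAsParquetFile writes one part-file per partition of the RDD, so reducing the partition count before saving reduces the number of output files. The paths, case class, and partition count here are hypothetical.]

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

// Spark 1.2 SchemaRDD sketch; `sc` is an existing SparkContext.
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD

// Hypothetical two-column CSV mapped onto a case class.
case class Record(key: String, value: Int)
val schemaRDD = sc.textFile("hdfs:///data/big.csv")
  .map(_.split(","))
  .map(p => Record(p(0), p(1).trim.toInt))

// coalesce(n) lowers the partition count (and hence the number of
// part-files) without a full shuffle; repartition(n) forces a shuffle
// and gives evenly sized partitions at the cost of extra I/O.
schemaRDD.coalesce(8).saveAsParquetFile("hdfs:///data/out.parquet")
```

Note that coalescing to too few partitions concentrates the write (and any preceding computation) on fewer tasks, so the target count is a trade-off between file size and parallelism.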

Thanks,
