spark-dev mailing list archives

From Matt Cheah <mch...@palantir.com>
Subject [SQL] Write parquet files under partition directories?
Date Tue, 02 Jun 2015 05:21:56 GMT
Hi there,

I noticed that the latest Spark SQL programming guide
<https://spark.apache.org/docs/latest/sql-programming-guide.html> describes
support for optimized reading of partitioned Parquet files that follow a
particular directory structure (year=1/month=10/day=3, for example).
However, I see no analogous way to write DataFrames out as Parquet files in
a similar directory structure based on user-provided partitioning.
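
For concreteness, here is a minimal sketch of the layout and read path I
mean; the table path and partition columns below are hypothetical:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hive-style layout that partition discovery understands (hypothetical paths):
//   /data/events/year=2015/month=6/day=2/part-00000.parquet
//   /data/events/year=2015/month=6/day=3/part-00000.parquet
object PartitionDiscoveryExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("partition-discovery"))
    val sqlContext = new SQLContext(sc)
    // Loading the table root makes Spark SQL discover year/month/day as
    // partition columns from the directory names.
    val events = sqlContext.load("/data/events", "parquet")
    events.printSchema() // schema includes year, month, day from the paths
  }
}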

Generally, is it possible to write DataFrames as partitioned Parquet files
that downstream partition discovery can take advantage of later? I
considered extending the Parquet output format, but it looks like
ParquetTableOperations.scala hard-codes the output format to
AppendingParquetOutputFormat.
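
In the meantime, the closest workaround I can see is a manual loop over the
distinct partition values; a sketch, assuming a hypothetical integer "year"
partition column and table root, and a small number of distinct values:

import org.apache.spark.sql.DataFrame

// Workaround sketch: write one Parquet directory per distinct partition
// value, using the year=<value> naming that partition discovery expects.
def writePartitioned(df: DataFrame, root: String): Unit = {
  // Drop the partition column from the data files themselves; discovery
  // re-adds it from the directory name on the read side.
  val dataCols = df.columns.filterNot(_ == "year").map(df(_))
  // Assumes "year" is an integer column with few distinct values.
  val years = df.select("year").distinct.collect().map(_.getInt(0))
  for (y <- years) {
    df.filter(df("year") === y)
      .select(dataCols: _*)
      .saveAsParquetFile(s"$root/year=$y")
  }
}

This rescans the input once per partition value, which is another reason
first-class support for partitioned writes would be nice.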

Also, would it be valuable to contribute support for writing Parquet files
into partition directories as a PR?

Thanks,

-Matt Cheah


