spark-user mailing list archives

From Imran Akbar <skunkw...@gmail.com>
Subject writing partitioned parquet files
Date Fri, 01 Apr 2016 19:10:19 GMT
Hi,

I'm reading in a CSV file, and I would like to write it back out as a permanent
table, partitioned by a particular column (year, etc.).

Currently I do this:

from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)
# Note: spark-csv's option keys are case-sensitive, so it's inferSchema
df = sqlContext.read.format('com.databricks.spark.csv') \
    .options(header='true', inferSchema='true') \
    .load('/Users/imran/Downloads/intermediate.csv')
df.write.saveAsTable("intermediate")

Which works great.

I also know I can do this:
df.write.partitionBy("year").parquet("path/to/output")

But how do I combine the two, to save a permanent table with partitioning,
in Parquet format?
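
Going by the DataFrameWriter docs, I'm guessing something like this might
work (an untested sketch -- the table name intermediate_partitioned is just a
placeholder, and "year" has to be an existing column in df):

# Untested: chain partitionBy and format('parquet') on the writer,
# then saveAsTable to register a partitioned, Parquet-backed table
df.write \
    .partitionBy("year") \
    .format("parquet") \
    .saveAsTable("intermediate_partitioned")

Is that the right approach, or is there a better way?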

thanks,
imran
