spark-user mailing list archives

From Manivannan Selvadurai <>
Subject Best way to store Avro Objects as Parquet using SPARK
Date Mon, 21 Mar 2016 05:55:03 GMT
Hi All,

          In my current project there is a requirement to store Avro data
(JSON format) as Parquet files.
I was able to use AvroParquetWriter separately to create the Parquet
files. Along with the data, the Parquet files also had the Avro schema
stored in their footer.
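For reference, this is roughly what I did with AvroParquetWriter directly (a minimal sketch; the record schema and output path here are made up for illustration, and the package is `parquet.avro` or `org.apache.parquet.avro` depending on the parquet-mr version):

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetWriter

object AvroToParquetExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical Avro schema, just for illustration
    val schemaJson =
      """{"type":"record","name":"User","fields":[
        |  {"name":"name","type":"string"},
        |  {"name":"age","type":"int"}
        |]}""".stripMargin
    val schema = new Schema.Parser().parse(schemaJson)

    // AvroParquetWriter stores the Avro schema in the Parquet file footer
    val writer = new AvroParquetWriter[GenericRecord](
      new Path("/tmp/users.parquet"), schema)

    val record = new GenericData.Record(schema)
    record.put("name", "alice")
    record.put("age", 30)
    writer.write(record)
    writer.close()
  }
}
```

Reading the resulting file back (e.g. with parquet-tools `meta`) shows the `parquet.avro.schema` entry in the footer metadata.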

           But when I tried using Spark Streaming, I could not find a way to
store the data with the Avro schema information. The closest I got was
to create a DataFrame from the JSON RDDs and store it as Parquet. Here
the Parquet files had a Spark-specific schema in their footer.
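The streaming approach I tried looks roughly like this (a sketch against the Spark 1.4 API; the socket source and output path are placeholders for illustration):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object JsonStreamToParquet {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("JsonStreamToParquet")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder source: each line is a JSON document
    val stream = ssc.socketTextStream("localhost", 9999)

    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val sqlContext = new SQLContext(rdd.sparkContext)
        // Spark infers its own schema from the JSON here
        val df = sqlContext.read.json(rdd)
        // The footer of these files carries Spark's inferred schema,
        // not the original Avro schema
        df.write.mode("append").parquet("/tmp/parquet-out")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```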

      Is this the right approach, or is there a better one? Please guide me.

We are using Spark 1.4.1.

Thanks In Advance!!
