spark-user mailing list archives

From "Coolbeth, Matthew" <Matthew.Coolb...@espn.com>
Subject Create table from Avro-generated parquet files?
Date Tue, 07 May 2019 20:18:39 GMT
I have a “directory” in S3 containing Parquet files created from Avro using the AvroParquetWriter
in the parquet-mr project.

I can load the contents of these files as a DataFrame using
    val it = spark.read.parquet("s3a://coolbeth/file=testy")

but I have not found a way to define a permanent table based on these parquet files.
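For contrast, the closest thing I have found is `saveAsTable`, but that persists a *managed* copy of the data rather than a table over the existing S3 files, which is not what I want. A sketch, reusing the path above (`testy` is a placeholder table name; this needs a running SparkSession):

```scala
// Sketch only: this writes the DataFrame out again as a managed table,
// copying the data into the warehouse, instead of defining a table that
// points at the existing Parquet files in S3.
val df = spark.read.parquet("s3a://coolbeth/file=testy")
df.write.saveAsTable("testy")
```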

If I define the table with a regular CREATE EXTERNAL TABLE STORED AS PARQUET, then deserialization crashes
at query time, I think because there is no Spark schema stored in the Parquet metadata (there
is an Avro schema there instead).
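Concretely, the DDL I tried was of this shape (illustrative only: the column list is elided and `testy` is a placeholder table name):

```scala
// Illustrative shape of the Hive-style DDL that crashes at query time.
// The real column definitions are elided here.
spark.sql("""
  CREATE EXTERNAL TABLE testy (
    -- columns matching the Avro schema ...
  )
  STORED AS PARQUET
  LOCATION 's3a://coolbeth/file=testy'
""")
```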

Is there a way to create the table I want from these Avro-generated Parquet files?

Thanks,

Matt Coolbeth
Software Engineer
Disney DTCI
