spark-dev mailing list archives

From Andrew Ash <and...@andrewash.com>
Subject Re: Parquet schema migrations
Date Sun, 05 Oct 2014 22:58:54 GMT
Hi Cody,

I wasn't aware there were different versions of the parquet format.  What's
the difference between "raw parquet" and the Hive-written parquet files?

As for your migration question, the approaches I've often seen are
convert-on-read and convert-all-at-once.  Apache Cassandra, for example,
does both: when upgrading between Cassandra versions that change the
on-disk sstable format, it converts sstables on read as you access them,
or you can run the upgradesstables command to convert them all at once
post-upgrade.
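
Concretely, convert-on-read in Spark could look something like the sketch
below.  It uses DataFrame APIs from Spark releases newer than this thread,
and every path and column name is made up for illustration: suppose a v2
schema added a "country" column that old v1 files lack, and we fill it with
a default at read time so callers always see the new schema.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.lit

    // Hypothetical names throughout: v1 files lack the "country" column
    // that the v2 schema added.
    val spark = SparkSession.builder().appName("parquet-convert-on-read").getOrCreate()

    // Convert on read: project the old layout onto the new schema at read
    // time by filling the missing column, instead of rewriting the files.
    val v1 = spark.read.parquet("hdfs:///data/events_v1")
      .withColumn("country", lit(null).cast("string"))

    // Files already written with the new schema need no conversion.
    val v2 = spark.read.parquet("hdfs:///data/events_v2")

    // Callers see one logical dataset with the v2 schema.
    val events = v1.unionByName(v2)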

Andrew

On Fri, Oct 3, 2014 at 4:33 PM, Cody Koeninger <cody@koeninger.org> wrote:

> Wondering if anyone has thoughts on a path forward for parquet schema
> migrations, especially for people (like us) that are using raw parquet
> files rather than Hive.
>
> So far we've gotten away with reading old files, converting, and writing to
> new directories, but that obviously becomes problematic above a certain
> data size.
>
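
The batch rewrite Cody describes above (convert-all-at-once) could be as
small as the following sketch, reusing the hypothetical names and imports
from the earlier one.  The data-size problem he mentions is that the entire
v1 directory must be read and rewritten in a single pass:

    // One-time batch rewrite: read the old layout, apply the same schema
    // change, and write a new directory. Paths and the added column are
    // hypothetical, as above.
    spark.read.parquet("hdfs:///data/events_v1")
      .withColumn("country", lit(null).cast("string"))
      .write.parquet("hdfs:///data/events_v2")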
