So, there might be a shorter path to success; I'd be curious too. What I was able to do is:

1. Create the RDD
2. Apply a schema that is 1 column wider
3. Register it as a table
4. insert new data with 1 extra column

I believe you do need step 2: if you're inserting into an existing schema and your data has extra columns, it's logical that they get dropped. In a scenario where this happens over time, you'd have a step 1a where you register your table; once your schema grows, you'd have to register the table again, this time from a SchemaRDD that has more columns (roughly as in the sketch below).
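
Roughly, that recipe looks like this against the Spark 1.2-era SchemaRDD API. This is only a sketch: the column names ("id", "name", "extra"), the /tmp path, and the exact imports are my own assumptions for illustration (the data type classes moved to org.apache.spark.sql.types in later releases).

  import org.apache.spark.sql._   // Row, SQLContext, StructType, etc. in 1.2

  val sqlContext = new SQLContext(sc)   // sc: an existing SparkContext

  // 1. Create the RDD of existing rows; the new column doesn't exist yet,
  //    so pad it with nulls.
  val oldRows = sc.parallelize(Seq(Row(1, "a", null), Row(2, "b", null)))

  // 2. Apply a schema that is one column wider than the original data.
  val widerSchema = StructType(Seq(
    StructField("id", IntegerType, nullable = true),
    StructField("name", StringType, nullable = true),
    StructField("extra", StringType, nullable = true)))
  val widened = sqlContext.applySchema(oldRows, widerSchema)

  // 3. Persist as Parquet and register the Parquet file as the table, so the
  //    table's schema already contains the extra column.
  widened.saveAsParquetFile("/tmp/events.parquet")
  sqlContext.parquetFile("/tmp/events.parquet").registerTempTable("events")

  // 4. Insert new data that actually carries the extra column.
  val newRows = sc.parallelize(Seq(Row(3, "c", "v1")))
  sqlContext.applySchema(newRows, widerSchema).insertInto("events")

  // The extra column is now visible to Spark SQL queries.
  sqlContext.sql("SELECT id, name, extra FROM events").collect().foreach(println)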


On Mon, Dec 22, 2014 at 12:11 AM, Adam Gilmore <dragoncurve@gmail.com> wrote:
Hi all,

I understand that Parquet allows for automatic schema versioning in the format; however, I'm not sure whether Spark supports this.

I'm saving a SchemaRDD to a Parquet file, registering it as a table, then doing an insertInto with a SchemaRDD that has an extra column.

The second SchemaRDD does in fact get inserted, but the extra column isn't present when I try to query it with Spark SQL.

Is there anything I can do to get this working how I'm hoping?
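
For concreteness, the flow described in the question reduces to something like the sketch below, again against the Spark 1.2-era API and with made-up column names and path; as described above, the "extra" column written by the insert doesn't show up in queries, because the registered table keeps the original two-column schema.

  // Save the first SchemaRDD as Parquet and register it as a table.
  val narrow = sqlContext.applySchema(
    sc.parallelize(Seq(Row(1, "a"))),
    StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true))))
  narrow.saveAsParquetFile("/tmp/events.parquet")
  sqlContext.parquetFile("/tmp/events.parquet").registerTempTable("events")

  // Insert a second SchemaRDD that carries an extra column.
  val wider = sqlContext.applySchema(
    sc.parallelize(Seq(Row(2, "b", "v1"))),
    StructType(Seq(
      StructField("id", IntegerType, nullable = true),
      StructField("name", StringType, nullable = true),
      StructField("extra", StringType, nullable = true))))
  wider.insertInto("events")

  // Only id and name come back; "extra" is not part of the table's schema.
  sqlContext.sql("SELECT * FROM events").collect().foreach(println)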