spark-user mailing list archives

From Yana Kadiyska <yana.kadiy...@gmail.com>
Subject Re: Parquet schema changes
Date Mon, 22 Dec 2014 19:21:36 GMT
There might be a shorter path to success, and I'd be curious to hear it too.
What I was able to do is:

1. Create the RDD
2. Apply a schema that is 1 column wider
3. Register it as a table
4. Insert new data with 1 extra column

I believe step 2 is required: if you're inserting into an existing schema and
the new data has extra columns, it seems logical that they get dropped. In a
scenario where the schema grows over time, I believe you'd have a step 1a
where you register your table, and each time the schema grows you'd have to
register the table again, this time from a SchemaRDD that has more columns.
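
The reasoning above can be sketched with a toy model in plain Python. This is
not Spark code and makes no claim about how Spark implements insertInto; it
only illustrates the assumed behaviour that rows inserted into a table are
projected onto the table's existing schema, so widening the schema first is
what lets the extra column survive. ToyTable, widen, insert_into and the
column names are all hypothetical.

```python
# Toy model only: plain Python standing in for a registered table.
# ToyTable, widen and insert_into are hypothetical names, not Spark APIs.

class ToyTable:
    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [dict(r) for r in rows]

    def insert_into(self, new_rows):
        # Assumed behaviour: incoming rows are projected onto the table's
        # schema, so columns the table does not know about are dropped.
        for r in new_rows:
            self.rows.append({c: r.get(c) for c in self.columns})

    def widen(self, extra_column):
        # Step 2: apply a schema that is one column wider; existing rows
        # get None for the new column.
        cols = self.columns + [extra_column]
        rows = [{**r, extra_column: None} for r in self.rows]
        return ToyTable(cols, rows)


old_rows = [{"id": 1, "name": "a"}]
new_rows = [{"id": 2, "name": "b", "score": 0.5}]

# Without widening: the extra column is lost on insert.
narrow = ToyTable(["id", "name"], old_rows)
narrow.insert_into(new_rows)

# With widening first (then "re-registering" the wider table): it is kept.
wide = ToyTable(["id", "name"], old_rows).widen("score")
wide.insert_into(new_rows)
```

In this model the table has to be rebuilt from the wider schema every time a
new column appears, which corresponds to re-registering the table as the
schema grows.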


On Mon, Dec 22, 2014 at 12:11 AM, Adam Gilmore <dragoncurve@gmail.com>
wrote:

> Hi all,
>
> I understand that parquet allows for schema versioning automatically in
> the format; however, I'm not sure whether Spark supports this.
>
> I'm saving a SchemaRDD to a parquet file, registering it as a table, then
> doing an insertInto with a SchemaRDD with an extra column.
>
> The second SchemaRDD does in fact get inserted, but the extra column isn't
> present when I try to query it with Spark SQL.
>
> Is there anything I can do to get this working how I'm hoping?
>
