spark-user mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: union of SchemaRDDs
Date Sun, 02 Nov 2014 02:48:09 GMT
It does generalize types, but it seems to do so only on the intersection of the columns.
There might be a way to get the union of the columns too using HiveQL. Types generalize
upward, with string being the "most general" type.

Matei
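
For reference, here is a minimal sketch of the unionAll approach suggested in this thread. It assumes the Spark 1.1-era API, where SQLContext.parquetFile returns a SchemaRDD, and it assumes both files have the same columns in the same order. The name `sqx` and the paths "fileA" and "fileB" are the placeholders from the original question; the object name, app name, and local master setting are made up for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SchemaRDD}

// Hypothetical driver object; only parquetFile and unionAll come from the thread.
object UnionParquetSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: local mode, purely for illustration.
    val conf = new SparkConf().setAppName("union-parquet-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqx = new SQLContext(sc)

    // Placeholder paths taken from the original question.
    val a: SchemaRDD = sqx.parquetFile("fileA")
    val b: SchemaRDD = sqx.parquetFile("fileB")

    // sc.union(a, b) treats both inputs as plain RDD[Row], so the schema is lost.
    // unionAll is defined on SchemaRDD and returns a SchemaRDD.
    val combined: SchemaRDD = a.unionAll(b)

    // Inspect the resulting schema to see how column types were resolved.
    combined.printSchema()

    sc.stop()
  }
}
```

If the two schemas differ, it is probably safest to check the printSchema() output rather than rely on any particular widening behavior.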

> On Nov 1, 2014, at 6:22 PM, Daniel Mahler <dmahler@gmail.com> wrote:
> 
> Thanks Matei. What does unionAll do if the input RDD schemas are not 100% compatible?
> Does it take the union of the columns and generalize the types?
> 
> thanks
> Daniel
> 
> On Sat, Nov 1, 2014 at 6:08 PM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
> Try unionAll, which is a special method on SchemaRDDs that keeps the schema on the results.
> 
> Matei
> 
> > On Nov 1, 2014, at 3:57 PM, Daniel Mahler <dmahler@gmail.com> wrote:
> >
> > I would like to combine 2 parquet tables I have created.
> > I tried:
> >
> >       sc.union(sqx.parquetFile("fileA"), sqx.parquetFile("fileB"))
> >
> > but that just returns RDD[Row].
> > How do I combine them to get a SchemaRDD[Row]?
> >
> > thanks
> > Daniel
> 
> 

