spark-user mailing list archives

From Manoj Samel <manojsamelt...@gmail.com>
Subject Re: Shouldn't the UNION of SchemaRDDs produce SchemaRDD ?
Date Mon, 31 Mar 2014 05:11:06 GMT
Hi Aaron,

unionAll is a workaround ...

* In SQL terms, UNION ALL preserves duplicates whereas UNION does not.
* In SQL, UNION and UNION ALL both produce the same kind of result, i.e.
another SQL relation, whereas here union and unionAll return different RDD types.
* I understand the issue with the existing union contract. Maybe this calls for
a class hierarchy discussion for SchemaRDD, UnionRDD, etc.?
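
To make the contract issue concrete, here is a minimal sketch in plain Scala. SimpleRDD, SimpleSchemaRDD, Row and orderByAge are hypothetical toy stand-ins that I made up for illustration; they are not Spark's classes or API. The sketch only assumes what was said in this thread: union is declared on the base class and returns the base type, while unionAll lives on the subclass and returns the subclass.

```scala
// Toy stand-ins for illustration only; these are NOT Spark's classes.
class SimpleRDD[T](val rows: Seq[T]) {
  // Declared on the base class, so the static return type is the base type.
  def union(other: SimpleRDD[T]): SimpleRDD[T] =
    new SimpleRDD(rows ++ other.rows)
}

case class Row(name: String, age: Int)

class SimpleSchemaRDD(data: Seq[Row]) extends SimpleRDD[Row](data) {
  def where(p: Row => Boolean): SimpleSchemaRDD =
    new SimpleSchemaRDD(rows.filter(p))

  // Hypothetical schema-aware operation, standing in for orderBy('age).
  def orderByAge: SimpleSchemaRDD =
    new SimpleSchemaRDD(rows.sortBy(_.age))

  // A separate method with the narrower return type, in the spirit of unionAll.
  def unionAll(other: SimpleSchemaRDD): SimpleSchemaRDD =
    new SimpleSchemaRDD(rows ++ other.rows)
}

object UnionDemo {
  def main(args: Array[String]): Unit = {
    val people = new SimpleSchemaRDD(
      Seq(Row("Ann", 30), Row("Bob", 10), Row("Cy", 15)))
    val older = people.where(_.age > 19)
    val younger = people.where(_.age < 13)

    // The static type is the base class, so schema-aware methods are gone.
    val viaUnion: SimpleRDD[Row] = younger.union(older)
    // viaUnion.orderByAge  // does not compile without a downcast

    // The subclass method keeps the subclass type, so ordering still works.
    val viaUnionAll: SimpleSchemaRDD = younger.unionAll(older)
    println(viaUnionAll.orderByAge.rows.map(_.name)) // prints List(Bob, Ann)
  }
}
```

In this toy version the subclass could only narrow the return type of union if every SimpleRDD[Row] argument were guaranteed to carry a schema, which is how I read the "too general" contract.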

Thanks,




On Sun, Mar 30, 2014 at 11:08 AM, Aaron Davidson <ilikerps@gmail.com> wrote:

> Looks like there is a "unionAll" function on SchemaRDD which will do what
> you want. The contract of RDD#union is unfortunately too general to allow
> it to return a SchemaRDD without downcasting.
>
>
> On Sun, Mar 30, 2014 at 7:56 AM, Manoj Samel <manojsameltech@gmail.com> wrote:
>
>> Hi,
>>
>> I am trying out Spark SQL, based on the example in the docs ...
>>
>> ....
>>
>> val people = sc.textFile("/data/spark/examples/src/main/resources/people.txt")
>>   .map(_.split(","))
>>   .map(p => Person(p(0), p(1).trim.toInt))
>>
>>
>> val olderThanTeans = people.where('age > 19)
>> val youngerThanTeans = people.where('age < 13)
>> val nonTeans = youngerThanTeans.union(olderThanTeans)
>>
>> I can do an orderBy('age) on the first two (which are SchemaRDDs) but not on
>> the third. nonTeans is a UnionRDD, which does not support orderBy. This
>> seems different from the SQL behavior, where the union of two SQL queries is
>> itself a SQL result with the same functionality ...
>>
>> It is not clear why the union of two SchemaRDDs does not produce a SchemaRDD ....
>>
>>
>> Thanks,
>>
>>
>>
>
