spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Can Spark Dataframes preserve order when joining?
Date Wed, 29 Jun 2016 22:38:00 GMT
Hi,

Well, I would not assume anything about row order after a join myself. If
you want the result ordered, do it explicitly.

Let us take a simple case by creating three DataFrames based on existing tables:

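// HiveContext below is assumed to be an instance created earlier, e.g.
// val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
val s = HiveContext.table("sales").select("AMOUNT_SOLD", "TIME_ID", "CHANNEL_ID")
val c = HiveContext.table("channels").select("CHANNEL_ID", "CHANNEL_DESC")
val t = HiveContext.table("times").select("TIME_ID", "CALENDAR_MONTH_DESC")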

Now let us join these tables and aggregate:

import org.apache.spark.sql.functions.sum
val rs = s.join(t, "time_id").join(c, "channel_id").
  groupBy("calendar_month_desc", "channel_desc").
  agg(sum("amount_sold").as("TotalSales"))

And do the ordering explicitly:

val rs1 = rs.orderBy("calendar_month_desc", "channel_desc")
rs1.take(5).foreach(println)
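
For anyone who wants to try this without the Hive tables, here is a minimal self-contained sketch of the same pattern. It assumes Spark 2.x or later (a SparkSession rather than the HiveContext used above) and a local master. The object name ExplicitOrderSketch, the method monthlySales and the sample rows are made up for illustration and do not come from the real schema.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.sum

// Hypothetical stand-in for the sales/times/channels example above.
object ExplicitOrderSketch {

  // Joins three small in-memory DataFrames, aggregates, then orders explicitly.
  def monthlySales(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Made-up rows; the real tables live in Hive.
    val s = Seq((10.0, 1, 3), (2.5, 1, 3), (20.0, 2, 3), (5.0, 1, 4), (7.5, 2, 4))
      .toDF("amount_sold", "time_id", "channel_id")
    val t = Seq((1, "2016-01"), (2, "2016-02"))
      .toDF("time_id", "calendar_month_desc")
    val c = Seq((3, "Direct Sales"), (4, "Internet"))
      .toDF("channel_id", "channel_desc")

    val rs = s.join(t, "time_id").join(c, "channel_id")
      .groupBy("calendar_month_desc", "channel_desc")
      .agg(sum("amount_sold").as("TotalSales"))

    // Do not rely on whatever order the join and groupBy happen to produce;
    // orderBy is what makes the row order well defined.
    rs.orderBy("calendar_month_desc", "channel_desc")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local two-thread master is enough for this toy data.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("explicit-order-sketch")
      .getOrCreate()
    try monthlySales(spark).show()
    finally spark.stop()
  }
}
```

The point of the sketch is only that the sort is the last step before the rows are consumed, so nothing between orderBy and show() or take() can reshuffle them.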


HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 29 June 2016 at 14:32, Jestin Ma <jestinwith.an.e@gmail.com> wrote:

> If it’s not too much trouble, could I get some pointers/help on this? (see
> link)
>
> http://stackoverflow.com/questions/38085801/can-dataframe-joins-in-spark-preserve-order
>
> -also, as a side question, do Dataframes support easy reordering of
> columns?
>
> Thank you!
> Jestin
>
