spark-user mailing list archives

From Takeshi Yamamuro <linguin....@gmail.com>
Subject Re: Can Spark Dataframes preserve order when joining?
Date Thu, 30 Jun 2016 18:00:15 GMT
Hi,

Most join strategies do not preserve the ordering of the input DataFrames
(sort-merge joins only keep the ordering of the left input DataFrame).
So, as said earlier, you need to sort explicitly if you want ordered
output.
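
To illustrate the point, here is a minimal sketch of sorting explicitly after a join. It assumes Spark 2.0+ (SparkSession) running in local mode; the toy tables, the `seq` column that encodes the desired order, and the name `OrderedJoinSketch` are hypothetical and not from this thread.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object OrderedJoinSketch {
  // Joins two hypothetical toy tables and sorts the result explicitly.
  def orderedJoin(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // "seq" is an assumed column that records the order we want back.
    val sales = Seq((1, 10, 5.0), (2, 20, 7.5), (3, 10, 2.5))
      .toDF("seq", "channel_id", "amount_sold")
    val channels = Seq((10, "Direct"), (20, "Internet"))
      .toDF("channel_id", "channel_desc")

    // Do not rely on the join for ordering; sort on the ordering column.
    sales.join(channels, "channel_id").orderBy("seq")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ordered-join-sketch")
      .master("local[*]")
      .getOrCreate()
    orderedJoin(spark).show()
    spark.stop()
  }
}
```

The sort is applied last, so the result does not depend on which join strategy the planner happens to choose.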

// maropu

On Wed, Jun 29, 2016 at 3:38 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> Hi,
>
> Well, I would not assume anything myself. If you want the output ordered,
> do it explicitly.
>
> Let us take a simple case by creating three DFs based on existing tables:
>
> // HiveContext is assumed to be an existing HiveContext instance
> val s = HiveContext.table("sales").select("AMOUNT_SOLD", "TIME_ID", "CHANNEL_ID")
> val c = HiveContext.table("channels").select("CHANNEL_ID", "CHANNEL_DESC")
> val t = HiveContext.table("times").select("TIME_ID", "CALENDAR_MONTH_DESC")
>
> Now let us join these tables:
>
> // needs: import org.apache.spark.sql.functions.sum
> val rs = s.join(t, "time_id").join(c, "channel_id")
>   .groupBy("calendar_month_desc", "channel_desc")
>   .agg(sum("amount_sold").as("TotalSales"))
>
> And order explicitly:
>
> val rs1 = rs.orderBy("calendar_month_desc", "channel_desc")
> rs1.take(5).foreach(println)
>
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 29 June 2016 at 14:32, Jestin Ma <jestinwith.an.e@gmail.com> wrote:
>
>> If it’s not too much trouble, could I get some pointers/help on this?
>> (see link)
>>
>> http://stackoverflow.com/questions/38085801/can-dataframe-joins-in-spark-preserve-order
>>
>> Also, as a side question, do DataFrames support easy reordering of
>> columns?
>>
>> Thank you!
>> Jestin
>>
>
>


-- 
---
Takeshi Yamamuro
