spark-user mailing list archives

From "Lalwani, Jayesh" <jlalw...@amazon.com.INVALID>
Subject Re: Spark Dataset withColumn issue
Date Thu, 12 Nov 2020 18:22:09 GMT
Note that Spark does not guarantee the ordering of columns. Nothing in the Spark documentation
says that columns will be ordered a certain way. The proposed solution relies on
an implementation detail that might change in a future version of Spark.

Ideally, you shouldn’t rely on a DataFrame to maintain the order of its columns. The question is:
why do you care about the ordering of columns? If the order of the data is important, then you
should put it in an array.
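
A minimal sketch of the array suggestion, assuming a local SparkSession and string-typed columns named as in this thread. The object name `OrderedValues`, the helper `pack`, and the output column name `values` are hypothetical, chosen only for illustration. The order of elements inside an array value is part of the data itself, so it does not depend on how the DataFrame lays out its columns.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, lit}

object OrderedValues {
  // Hypothetical helper: packs the named columns into one array column.
  // The elements appear in the order given by `order`.
  // array() expects all inputs to share a type (strings here).
  def pack(df: DataFrame, order: Seq[String]): DataFrame =
    df.select(array(order.map(c => col(c)): _*).as("values"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local run for illustration
      .appName("ordered-values")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the `ds` used in the thread.
    val ds = Seq(("a", "c"), ("d", "f")).toDF("Col1", "Col3")
      .withColumn("Col2", lit("sample"))

    pack(ds, Seq("Col1", "Col2", "Col3")).show(false)
    spark.stop()
  }
}
```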

From: Vikas Garg <sperry.it@gmail.com>
Date: Thursday, November 12, 2020 at 12:40 PM
To: Subash Prabakar <subashprabakar@gmail.com>
Cc: German Schiavon <gschiavonspark@gmail.com>, User <user@spark.apache.org>
Subject: RE: [EXTERNAL] Spark Dataset withColumn issue




Ohhkkkk

Thanks a lot

On Thu, Nov 12, 2020, 21:23 Subash Prabakar <subashprabakar@gmail.com>
wrote:
Hi Vikas,

He suggested using the select() function after your withColumn call.

val ds1 = ds.select("Col1", "Col3").withColumn("Col2", lit("sample")).select("Col1",
"Col2", "Col3")
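
For reference, a self-contained sketch of the same idea, assuming a local SparkSession and a small stand-in for `ds`. The object name `ReorderColumns` and the helper `reorder` are hypothetical and not part of the Spark API; the helper only wraps `select` so the desired order is written down in one place.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

object ReorderColumns {
  // Hypothetical helper: selects the named columns in the requested order.
  def reorder(df: DataFrame, order: Seq[String]): DataFrame =
    df.select(order.map(c => col(c)): _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local run for illustration
      .appName("reorder-columns")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the `ds` used in the thread.
    val ds = Seq(("a", "c"), ("d", "f")).toDF("Col1", "Col3")

    // withColumn appends Col2 at the end; the explicit select then
    // names the columns in the order wanted.
    val ds1 = reorder(ds.withColumn("Col2", lit("sample")), Seq("Col1", "Col2", "Col3"))

    ds1.printSchema()
    spark.stop()
  }
}
```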


Thanks,
Subash

On Thu, Nov 12, 2020 at 9:19 PM Vikas Garg <sperry.it@gmail.com>
wrote:
I am deriving Col2 using withColumn, which is why I can't use it the way you told me.

On Thu, Nov 12, 2020, 20:11 German Schiavon <gschiavonspark@gmail.com>
wrote:
ds.select("Col1", "Col2", "Col3")

On Thu, 12 Nov 2020 at 15:28, Vikas Garg <sperry.it@gmail.com>
wrote:
In a Spark Dataset, if we add an additional column using
withColumn
then the column is added at the end.

e.g.
val ds1 = ds.select("Col1", "Col3").withColumn("Col2", lit("sample"))

then the order of columns is >> Col1  |  Col3  |  Col2

I want the order to be  >> Col1  |  Col2  |  Col3

How can I achieve this?