spark-user mailing list archives

From Chetan Khatri <chetan.opensou...@gmail.com>
Subject Best Practice: Evaluate Expression from Spark DataFrame Column
Date Sat, 28 Mar 2020 02:35:51 GMT
Hi Spark Users,

Each row of my DataFrame has a column that holds an expression, and I want to
evaluate that expression against the other columns of the same row. Please
suggest the best approach for this that does not hurt the job's performance.

Thanks

Sample code:

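import spark.implicits._  // needed for toDF; assumes a SparkSession named `spark`

// num is Option[Int] so the last row can hold a null (a bare null in an Int slot has no encoder)
val sampleDF = Seq(
  (Some(8), 1, "bat", "NUM IS NOT NULL AND FLAG IS NOT 0"),
  (Some(64), 0, "mouse", "NUM IS NOT NULL AND FLAG IS NOT 0"),
  (Some(-27), 1, "horse", "NUM IS NOT NULL AND FLAG IS NOT 0"),
  (None, 0, "miki", "NUM IS NOT NULL AND FLAG IS NOT 1 AND WORD IS 'MIKI'")
).toDF("num", "flag", "word", "expression")

// Placeholder: this only copies the expression string into "status";
// the goal is to have it evaluated per row instead.
val derivedDF = sampleDF.withColumn("status", sampleDF.col("expression"))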
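
One possible approach, sketched under several assumptions rather than offered as a definitive answer: if the number of distinct expression strings is small and they come from a trusted source, they can be collected to the driver and folded into a single when/otherwise chain, with each string parsed by `functions.expr`. Catalyst then evaluates the conditions natively, with no UDF and no per-row parsing. The sketch assumes the expressions are valid Spark SQL, so the sample's `IS NOT 0` and `IS 'MIKI'` are rewritten here as `!= 0` and `= 'miki'` (as far as I know Spark's parser does not accept the original forms, and string comparison is case-sensitive). The local SparkSession and the names `distinctExprs`, `status`, and `derived` are illustrative.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, expr, lit, when}

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("expr-per-row").getOrCreate()
import spark.implicits._

// Same sample data, with the expressions rewritten as valid Spark SQL (an assumption).
val df = Seq(
  (Some(8), 1, "bat", "num IS NOT NULL AND flag != 0"),
  (Some(64), 0, "mouse", "num IS NOT NULL AND flag != 0"),
  (Some(-27), 1, "horse", "num IS NOT NULL AND flag != 0"),
  (None, 0, "miki", "num IS NOT NULL AND flag != 1 AND word = 'miki'")
).toDF("num", "flag", "word", "expression")

// Collect the distinct expression strings to the driver (assumes there are few of them).
val distinctExprs: Array[String] =
  df.select("expression").distinct().as[String].collect()

// Build one nested CASE WHEN: for rows whose "expression" equals e, evaluate e itself.
// Rows matching no known expression fall through to a null boolean.
val status: Column = distinctExprs.foldLeft(lit(null).cast("boolean")) { (acc, e) =>
  when(col("expression") === e, expr(e)).otherwise(acc)
}

val derived = df.withColumn("status", status)
```

With this data, `derived.show()` should give a `status` of true for bat and horse and false for mouse and miki. The chain grows by one branch per distinct expression, so this fits a handful of rule strings better than a unique expression on every row.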
