spark-user mailing list archives

From Richard Xin <>
Subject Re: need help to have a Java version of this scala script
Date Sat, 17 Dec 2016 10:33:49 GMT
Thanks for pointing me in the right direction, I have figured out the way.

    On Saturday, December 17, 2016 5:23 PM, Igor Berman <> wrote:

Do you mind showing what you have in Java? In general, $"bla" is col("bla") as soon as you import the appropriate function:

import static org.apache.spark.sql.functions.callUDF;
import static

udf should be callUDF, e.g.:

ds.withColumn("localMonth", callUDF("toLocalMonth", col("unixTs"),
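Following that advice, a minimal Java sketch of the Scala script might look like the one below. It assumes Spark 2.x's SparkSession (with Spark 1.x you would register the UDFs on a SQLContext/HiveContext instead), and it substitutes a small hand-built DataFrame for the Hive table test.dc from the original script, so the sample rows and schema types here are illustrative assumptions:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

public class JavaUdfExample {

    // Applies the three UDFs from the Scala script the Java way:
    // register each UDF by name, then call it via callUDF(...),
    // replacing Scala's $"colName" with col("colName").
    static Dataset<Row> run(SparkSession spark) {
        // The cast to UDF1 is needed because register() is overloaded.
        spark.udf().register("len",
                (UDF1<String, Integer>) String::length, DataTypes.IntegerType);
        spark.udf().register("twice",
                (UDF1<Integer, Integer>) x -> x * 2, DataTypes.IntegerType);
        spark.udf().register("triple",
                (UDF1<Integer, Integer>) x -> x * 3, DataTypes.IntegerType);

        // Stand-in for: hContext.sql("select x,y,cluster_no from test.dc")
        // (made-up rows; the real code would read the Hive table).
        StructType schema = new StructType()
                .add("x", DataTypes.StringType)
                .add("y", DataTypes.StringType)
                .add("cluster_no", DataTypes.IntegerType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("alpha", "a", 3),
                RowFactory.create("be", "b", 5));
        Dataset<Row> df = spark.createDataFrame(rows, schema);

        // Scala's twice($"cluster_no") becomes callUDF("twice", col("cluster_no")).
        return df
                .withColumn("name-len", callUDF("len", col("x")))
                .withColumn("twice", callUDF("twice", col("cluster_no")))
                .withColumn("triple", callUDF("triple", col("cluster_no")));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("udf-example").master("local[1]").getOrCreate();
        run(spark).show();
        spark.stop();
    }
}
```

Because the transformation lives in an ordinary registered UDF, it can be arbitrarily complicated Java code, not just simple arithmetic.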
On 17 December 2016 at 09:54, Richard Xin <> wrote:

What I am trying to do: I need to add a column (which could be a complicated transformation based on the value of another column) to a given DataFrame.

Scala script:

val hContext = new HiveContext(sc)
import hContext.implicits._
val df = hContext.sql("select x,y,cluster_no from test.dc")
val len = udf((str: String) => str.length)
val twice = udf { (x: Int) => println(s"Computed: twice($x)"); x * 2 }
val triple = udf { (x: Int) => println(s"Computed: triple($x)"); x * 3 }
val df1 = df.withColumn("name-len", len($"x"))
val df2 = df1.withColumn("twice", twice($"cluster_no"))
val df3 = df2.withColumn("triple", triple($"cluster_no"))
The Scala script above seems to work OK, but I am having trouble doing the same in Java (note that the transformation based on the value of a column could be complicated, not limited to simple add/subtract etc.). Is there a way to do this in Java? Thanks.
