spark-user mailing list archives
From Michael Armbrust <mich...@databricks.com>
Subject Re: How to use registered Hive UDF in Spark DataFrame?
Date Fri, 02 Oct 2015 14:53:05 GMT
import static org.apache.spark.sql.functions.*;

callUDF("MyUDF", col("col1"), col("col2"))
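Put together, a minimal sketch of the whole pattern (Spark 1.5 Java API; the `df` variable and column names are placeholders for illustration, and it assumes "MyUDF" was registered on the same HiveContext that produced `df`):

```java
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

import org.apache.spark.sql.DataFrame;

// Apply the registered Hive UDF to col1, keep col2 alongside it,
// then do the multi-column groupBy/count from the original SQL query.
DataFrame result = df
    .select(callUDF("MyUDF", col("col1")).as("myUdfCol1"), col("col2"))
    .groupBy("myUdfCol1", "col2")
    .count();
```

Note that `callUDF(String, Column...)` looks the UDF up by its registered name, so the same registration works from both `hiveContext.sql(...)` and the DataFrame API.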

On Fri, Oct 2, 2015 at 6:25 AM, unk1102 <umesh.kacha@gmail.com> wrote:

> Hi, I have registered my Hive UDF using the following code:
>
> hiveContext.udf().register("MyUDF", new UDF1<String, String>() {
>     public String call(String o) throws Exception {
>         // bla bla
>     }
> }, DataTypes.StringType);
>
> Now I want to use the above MyUDF in a DataFrame. How do I use it? I know how
> to use it in SQL, and it works fine:
>
> hiveContext.sql("select MyUDF('test') from myTable");
>
> My hiveContext.sql() query involves a group by on multiple columns, so for
> scaling purposes I am trying to convert this query to the DataFrame API:
>
>
> dataframe.select("col1","col2","coln").groupBy("col1","col2","coln").count();
>
> Can we do the following: dataframe.select(MyUDF("col1"))? Please guide.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-registered-Hive-UDF-in-Spark-DataFrame-tp24907.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>
