spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Spark SQL - udf with entire row as parameter
Date Fri, 04 Mar 2016 21:04:23 GMT
You have to call it from SQL (the DataFrame API will be able to do this in
Spark 2.0, thanks to a better parser). Construct a struct(*) and pass that to
your function, since a function must take a fixed number of arguments.

Here is an example:
<https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/2457334174245122/2840265927289860/b29d1ad2aa.html>
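
For readers who cannot open the notebook, the approach can be sketched as follows. This is a minimal sketch, not the content of the linked example: it assumes the Spark 1.6 Python API (SQLContext.registerFunction and DataFrame.registerTempTable), the temp table name "people" and the body of function_name are made up for illustration, and the columns address and name are taken from the error message in the question.

```python
def function_name(row):
    """Illustrative logic only: join every field of the row into one string.

    The UDF receives the struct as a Row-like object, so its fields can be
    read with asDict().
    """
    fields = row.asDict()
    return ", ".join("%s=%s" % (k, fields[k]) for k in sorted(fields))


def add_column_via_sql(sql_context, df):
    """Register the UDF and call it from SQL with struct(*) as its one argument.

    Assumes a Spark 1.6-style SQLContext; "people" is a hypothetical
    temp table name.
    """
    # Imported here so function_name can be tried out without Spark installed.
    from pyspark.sql.types import StringType

    sql_context.registerFunction("functionName", function_name, StringType())
    df.registerTempTable("people")
    # struct(*) packs all columns into a single struct argument.
    return sql_context.sql(
        "SELECT *, functionName(struct(*)) AS columnName FROM people")
```

With a DataFrame df holding the address and name columns, add_column_via_sql(sqlContext, df) would return df plus a string column called columnName. Keeping the row logic in a plain function means it can be checked with any object that has an asDict() method before it is wired into Spark.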

On Fri, Mar 4, 2016 at 6:41 AM, Nisrina Luthfiyati <nisrina.luthfiyati@gmail.com> wrote:

> Hi all,
> I'm using spark sql in python and want to write a udf that takes an entire
> Row as the argument.
> I tried something like:
>
> def functionName(row):
>     ...
>     return a_string
>
> udfFunctionName=udf(functionName, StringType())
> df.withColumn('columnName', udfFunctionName('*'))
>
> but this gives an error message:
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/dataframe.py", line 1311, in withColumn
>     return DataFrame(self._jdf.withColumn(colName, col._jc), self.sql_ctx)
>   File "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
>   File "/home/nina/Downloads/spark-1.6.0-bin-hadoop2.6/python/pyspark/sql/utils.py", line 51, in deco
>     raise AnalysisException(s.split(': ', 1)[1], stackTrace)
> pyspark.sql.utils.AnalysisException: u"unresolved operator 'Project [address#0,name#1,PythonUDF#functionName(*) AS columnName#26];"
>
> Does anyone know how this can be done or whether this is possible?
>
> Thank you,
> Nisrina.
>
>
