spark-user mailing list archives

From "Mendelson, Assaf" <>
Subject RE: Nested UDFs
Date Thu, 17 Nov 2016 07:42:15 GMT
regexp_replace is supposed to receive a column; you don't need to write a UDF for it.
Instead try: regexp_replace(test_data['name'], 'a', 'X')

You would only need a UDF if you wanted to do something to the string value of a single row
(e.g. return data + "bla")
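To illustrate the distinction: a Python UDF body runs on plain Python values, one row at a time, so inside it you would use the standard re module rather than pyspark.sql.functions.regexp_replace (which builds a JVM Column expression and needs the driver's SparkContext). A minimal sketch of such a row-level function, outside Spark:

```python
import re

# Row-level logic as a plain Python function. Wrapped with
# pyspark.sql.functions.udf, this is what would run on each row;
# it uses re.sub, not the JVM-backed regexp_replace.
def replace_a_with_x(value):
    return re.sub('a', 'X', value)

print(replace_a_with_x('banana'))  # -> bXnXnX
```

In Spark itself, though, the built-in column function is both simpler and faster, e.g. test_data.select(regexp_replace(test_data['name'], 'a', 'X')), since it avoids Python serialization overhead entirely.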


From: Perttu Ranta-aho []
Sent: Thursday, November 17, 2016 9:15 AM
Subject: Nested UDFs


Shouldn't this work?

from pyspark.sql.functions import regexp_replace, udf

def my_f(data):
    return regexp_replace(data, 'a', 'X')
my_udf = udf(my_f)

test_data = sqlContext.createDataFrame([('a',), ('b',), ('c',)], ('name',))
test_data.select(my_udf(test_data['name'])).show()

But instead of 'a' being replaced with 'X' I get exception:
  File ".../spark-2.0.2-bin-hadoop2.7/python/lib/", line
1471, in regexp_replace
    jc = sc._jvm.functions.regexp_replace(_to_java_column(str), pattern, replacement)
AttributeError: 'NoneType' object has no attribute '_jvm'


