spark-issues mailing list archives

From "Hyukjin Kwon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-18667) input_file_name function does not work with UDF
Date Thu, 01 Dec 2016 07:01:06 GMT
Hyukjin Kwon created SPARK-18667:
------------------------------------

             Summary: input_file_name function does not work with UDF
                 Key: SPARK-18667
                 URL: https://issues.apache.org/jira/browse/SPARK-18667
             Project: Spark
          Issue Type: Bug
          Components: PySpark
            Reporter: Hyukjin Kwon


{{input_file_name()}} returns an empty string instead of the file name when it is used as the input to a UDF in PySpark:

given the following data in {{tmp.json}}:

{code}
{"a": 1}
{code}

and the following code:

{code}
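from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()  # already defined in the pyspark shell

def filename(path):
    # Identity function: returns the path it receives unchanged
    return path

sourceFile = udf(filename, StringType())
spark.read.json("tmp.json").select(sourceFile(input_file_name())).show()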
{code}

the output is an empty value:

{code}
+---------------------------+
|filename(input_file_name())|
+---------------------------+
|                           |
+---------------------------+
{code}

whereas the following code, which calls {{input_file_name()}} without a UDF:

{code}
spark.read.json("tmp.json").select(input_file_name()).show()
{code}

prints the file name correctly:

{code}
+--------------------+
|   input_file_name()|
+--------------------+
|file:///Users/hyu...|
+--------------------+
{code}

This appears to be a PySpark-specific issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
