spark-issues mailing list archives

From "Vinod KC (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-25301) When a view uses an UDF from a non default database, Spark analyser throws AnalysisException
Date Tue, 04 Sep 2018 09:27:00 GMT

     [ https://issues.apache.org/jira/browse/SPARK-25301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod KC updated SPARK-25301:
-----------------------------
    Description: 
When a Hive view uses a UDF from a non-default database, the Spark analyzer throws an AnalysisException.

Steps to reproduce this issue
 -----------------------------
 Step 1: Run the following statements in Hive
 --------
 ```
 CREATE TABLE emp AS SELECT 'user' AS name, 'address' as address;
 CREATE DATABASE d100;
 CREATE FUNCTION d100.udf100 AS 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFUpper'; -- Note: udf100 is created in d100
 CREATE VIEW d100.v100 AS SELECT d100.udf100(name) FROM default.emp;
 SELECT * FROM d100.v100; -- the query on view d100.v100 returns the correct result
 ```
 Step 2: Run the following statement in Spark
 -------------
 1) spark.sql("select * from d100.v100").show
 throws
 ```
 org.apache.spark.sql.AnalysisException: Undefined function: 'd100.udf100'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'
 ```

This happens because the stored SQL text of the view is 'select `d100.udf100`(`emp`.`name`) from `default`.`emp`'. The database name and the UDF name sit inside a single pair of backticks, so the Spark parser does not split them, and the Spark function registry tries to load a UDF named 'd100.udf100' from the 'default' database.
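
The effect of the quoting can be illustrated with a small standalone sketch. This is not Spark's parser. The helper names `split_identifier` and `resolve_function` are hypothetical, and the sketch only assumes that a dot inside backticks is part of the name while a dot outside backticks separates the database from the function. It also ignores escaped backticks.

```python
def split_identifier(text):
    """Split a SQL identifier on dots that are outside backtick quotes."""
    parts, current, quoted = [], [], False
    for ch in text:
        if ch == "`":
            quoted = not quoted          # toggle quoting and drop the backtick
        elif ch == "." and not quoted:
            parts.append("".join(current))  # an unquoted dot ends a name part
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def resolve_function(text, current_db="default"):
    """Return the (database, function) pair a lookup would use (illustrative only)."""
    parts = split_identifier(text)
    if len(parts) == 1:
        # No database qualifier was recognised, so fall back to the current database.
        return (current_db, parts[0])
    if len(parts) == 2:
        return (parts[0], parts[1])
    raise ValueError("unsupported function identifier: " + text)


# The form stored in the view text: one quoted name that contains a dot.
print(resolve_function("`d100.udf100`"))    # ('default', 'd100.udf100')
# The form with separately quoted parts: database and function are split.
print(resolve_function("`d100`.`udf100`"))  # ('d100', 'udf100')
```

Under these assumptions, the first call matches the lookup described in the error message (a function 'd100.udf100' in database 'default'), while the second shows the lookup the view was meant to produce.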



> When a view uses an UDF from a non default database, Spark analyser throws AnalysisException
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25301
>                 URL: https://issues.apache.org/jira/browse/SPARK-25301
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Vinod KC
>            Priority: Minor



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


