hive-issues mailing list archives

From "Hardik Trivedi (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-20276) Hive UDF class getting Instantiated for each call of function
Date Wed, 01 Aug 2018 05:42:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-20276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564763#comment-16564763 ]

Hardik Trivedi edited comment on HIVE-20276 at 8/1/18 5:41 AM:
---------------------------------------------------------------

[~xuefuz] many thanks for your quick response.

*Here is an overview of what I implemented.*

*We created a class that extends the UDF class, as below:*

{color:#59afe1}import org.apache.hadoop.hive.ql.exec.UDF;

public class Strip extends UDF {
    // Shared across all instances; used to count constructor calls.
    private static int count = 0;

    public Strip() {
        System.out.println(count);
        count++;
    }

    public String evaluate(String str) {
        ...............
        .........
        return "...";
    }
}{color}

*We also create and register a temporary function, as below:*
 {color:#59afe1}sparkSession.sql("CREATE TEMPORARY FUNCTION STRIP AS 'org.packageName.Strip'");
  {color}

*And we use this function in a query while creating a table:*
  
 {color:#59afe1}create external table XYZ location ABC 
 AS select STRIP(firstName) AS FirstName,STRIP(lastName) AS LastName,age AS AGE,STRIP(address)
AS Address from PQRTable;{color}
  
 When I run my code it works fine and creates the new table XYZ at the given location. Note,
however, that I have added a constructor to the Strip class that prints a counter and then
increments it.
 So if PQRTable has 10 rows (40 values across the four selected columns), my temporary function
is called 30 times, which is fine, but each call creates a new instance of the Strip class, and
the counter is printed 30 times on the console.
 If PQRTable had 40,000,000 such values, 30,000,000 instances would be created, which is a
serious problem.
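
To make the arithmetic above explicit, here is a minimal sketch. It assumes that the "records" counted are the individual column values (four selected columns per row, three of them wrapped in STRIP), so 40,000,000 values correspond to 10,000,000 rows. The class and method names are hypothetical and used only for illustration.

```java
// Hypothetical helper, not part of Hive or Spark.
class CallCountSketch {

    // Number of UDF calls for a query that scans `rows` rows and wraps
    // `wrappedColumns` of the selected columns in the function.
    static long udfCalls(long rows, int wrappedColumns) {
        // multiplyExact throws ArithmeticException on overflow instead of wrapping.
        return Math.multiplyExact(rows, (long) wrappedColumns);
    }

    public static void main(String[] args) {
        // 10 rows, STRIP applied to firstName, lastName and address.
        System.out.println(udfCalls(10L, 3));
        // Assumed: 40,000,000 values over four columns = 10,000,000 rows.
        System.out.println(udfCalls(10_000_000L, 3));
    }
}
```

If one instance is created per call, as observed, the number of instances equals the number of calls.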

*{color:#654982}I want this Strip class to behave as a singleton. Because the class
extends the UDF class, I cannot make it a singleton from outside: doing so throws an accessRestricted
exception, and I can no longer use the UDF functionality.{color}*
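
One possible way around this, sketched under assumptions rather than confirmed for Hive or Spark: keep the public no-argument constructor cheap, and move anything expensive into state that is shared through a static holder, so that it is built once per JVM no matter how many instances are created. The class below is a hypothetical stand-in that does not extend UDF (so it compiles without Hive on the classpath), and `trim()` is only a placeholder for the elided body of `evaluate`.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for Strip; a real UDF would extend Hive's UDF class.
class StripSketch {

    // Counts constructor calls, like the counter in the original class.
    private static final AtomicInteger INSTANCES = new AtomicInteger();

    // Counts how often the shared state is built.
    private static final AtomicInteger SHARED_INITS = new AtomicInteger();

    // Hypothetical expensive state that should not be rebuilt per instance.
    static final class SharedState {
        SharedState() {
            SHARED_INITS.incrementAndGet();
        }

        // Placeholder logic; the real evaluate body is not shown in the thread.
        String strip(String s) {
            return s == null ? null : s.trim();
        }
    }

    // Initialization-on-demand holder: the JVM initializes Holder lazily and
    // exactly once, the first time Holder.INSTANCE is read.
    private static final class Holder {
        static final SharedState INSTANCE = new SharedState();
    }

    // Stays public and cheap, so a framework that instantiates the class
    // itself can keep doing so.
    public StripSketch() {
        INSTANCES.incrementAndGet();
    }

    public String evaluate(String str) {
        return Holder.INSTANCE.strip(str);
    }

    static int instances() {
        return INSTANCES.get();
    }

    static int sharedInits() {
        return SHARED_INITS.get();
    }

    public static void main(String[] args) {
        // Simulate one new instance per call, as observed in the report.
        for (int i = 0; i < 30; i++) {
            new StripSketch().evaluate("  name  ");
        }
        System.out.println("instances=" + instances()
                + ", shared state built=" + sharedInits());
    }
}
```

With this layout the instances themselves stay lightweight, and only the shared state behaves like a singleton. In a distributed job each executor JVM would hold its own copy of that state.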
  



> Hive UDF class getting Instantiated for each call of function
> -------------------------------------------------------------
>
>                 Key: HIVE-20276
>                 URL: https://issues.apache.org/jira/browse/HIVE-20276
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.1, 2.1.1
>            Reporter: Hardik Trivedi
>            Priority: Blocker
>
> * I have created a Hive UDF class and registered its function in Spark.
>  * I call this function in a Hive query run through the Spark session object.
>  * When I run my code, I observe that each time the function is called, a new instance of the UDF class is created.
>  * Is this normal behavior? Should each call create a new instance?
>  * Is this a version-specific issue?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
