hive-issues mailing list archives

From "Alexander Pivovarov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-9988) Evaluating UDF before query is run
Date Wed, 08 Feb 2017 05:29:41 GMT

    [ https://issues.apache.org/jira/browse/HIVE-9988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15857435#comment-15857435 ]

Alexander Pivovarov commented on HIVE-9988:
-------------------------------------------

You can assign the expression to a variable before the query is evaluated and then use the
variable in the WHERE clause:
{code}
set dt=from_unixtime(unix_timestamp(),'yyyyMMdd');

select * from A where dt=${hiveconf:dt};
{code}
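
Hive variable substitution is textual, so the variable's text is pasted into the query as written. A related approach, sketched here under assumptions, computes the date in the shell and passes it with --hiveconf so that the query text contains a literal such as 20150316. It assumes a POSIX shell with `date`, the `hive` CLI on the PATH, and the table A and partition column datestamp taken from the examples in this thread. The command is only printed, not executed.

```shell
#!/bin/sh
# Compute today's date as yyyyMMdd in the shell, outside Hive.
dt="$(date +%Y%m%d)"

# Single quotes keep ${hiveconf:dt} away from the shell so that Hive,
# not the shell, performs the substitution.
# Table A and column datestamp are assumptions taken from the thread.
query='select * from A where datestamp=${hiveconf:dt};'

# Print the command for inspection; run the printed line to execute it
# (assumes the hive CLI is installed and on the PATH).
echo "hive --hiveconf dt=$dt -e '$query'"
```

With this form the WHERE clause compares the partition column against a constant, which is the case the issue description reports as touching a single partition.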

> Evaluating UDF before query is run
> ----------------------------------
>
>                 Key: HIVE-9988
>                 URL: https://issues.apache.org/jira/browse/HIVE-9988
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ådne Brunborg
>
> When using UDFs on a partition column in Hive, all partitions are scanned before the UDF is resolved.
> If the UDF could be evaluated before the query is run, this would greatly improve performance in cases like this.
> Example: the table is partitioned by datestamp (bigint).
> The following WHERE clause touches all 82 partitions:
> {{WHERE datestamp=cast(from_unixtime(unix_timestamp(),'yyyyMMdd') as bigint)}}
> {{15/03/16 09:21:53 INFO mapred.FileInputFormat: Total input paths to process : 82}}
> …whereas the following only touches the one partition:
> {{WHERE datestamp=20150316}}
> {{15/03/16 09:23:06 INFO input.FileInputFormat: Total input paths to process : 1}}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)