spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: How to filter based on a constant value
Date Sun, 31 Jul 2016 08:25:12 GMT
Thanks Ayan.

This is the one I used:

scala> sqltext = """
     |  select *
     | from (select transactiondate, transactiondescription, debitamount
     | , rank() over (order by transactiondate desc) r
     | from ll_18740868 where transactiondescription like '%XYZ%'
     |       ) inner
     |  where r=1
     |    """

scala> HiveContext.sql(sqltext).show
+---------------+----------------------+-----------+---+
|transactiondate|transactiondescription|debitamount|  r|
+---------------+----------------------+-----------+---+
|     2015-12-15|      XYZ LTD CD 4636 |      10.95|  1|
+---------------+----------------------+-----------+---+

The issue I see is that here I cannot pass HASHTAG as a variable into the
SQL text. For example, in an RDBMS I can do this:

1> declare @pattern varchar(50)
2> set @pattern = 'Direct'
3> select CHANNEL_DESC from CHANNELS where CHANNEL_DESC like
'%'||@pattern||'%'
4> go
(1 row affected)
 CHANNEL_DESC
 --------------------
 Direct Sales

but not in Hive or Spark SQL,

whereas with the FP (DataFrame) API it is implicit:

col("CHANNEL_DESC").contains(HASHTAG)

Unless there is a way of doing it?
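
One workaround (a sketch, untested here; it assumes the same HiveContext
and registered table ll_18740868 as above) is to build the SQL text with
Scala string interpolation: sqltext is just a String, so the variable can
be spliced in before sql() ever parses it:

```scala
// Build the query text with an s-interpolated string so the pattern
// variable HASHTAG is substituted before the SQL is parsed.
val HASHTAG = "XYZ"

val sqltext = s"""
  select *
  from (select transactiondate, transactiondescription, debitamount,
               rank() over (order by transactiondate desc) r
        from ll_18740868
        where transactiondescription like '%$HASHTAG%') t
  where r = 1
"""

// HiveContext.sql(sqltext).show   // same call as before, now parameterised
println(sqltext)
```

Note this is plain client-side text substitution, not a bound parameter,
so it is only suitable for values you control yourself.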

Thanks
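
PS On the earlier lit(maxdate) error: a possible fix (sketched, not
tested) is to take the String out of the Row before comparing, rather
than passing the Row itself to lit():

val maxdateStr = maxdate.getString(0)   // or maxdate.getAs[String](0)
ll_18740868.filter(col("transactiondescription").contains(HASHTAG)
  && col("transactiondate") === maxdateStr).select("transactiondate",
  "transactiondescription", "debitamount").show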

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 31 July 2016 at 01:20, ayan guha <guha.ayan@gmail.com> wrote:

> select *
> from (select *,
>              rank() over (order by transactiondate) r
>        from ll_18740868 where transactiondescription='XYZ'
>       ) inner
> where r=1
>
> Hi Mitch,
>
> If using SQL is fine, you can try the code above. You need to register
> ll_18740868 as a temp table.
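>
> (A one-line sketch of that registration step, assuming the Spark 1.x
> API in use in this thread and that the DataFrame already exists under
> the same name:
>
> ll_18740868.registerTempTable("ll_18740868")
>
> After that the table name is visible to HiveContext.sql.)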
>
> On Sun, Jul 31, 2016 at 6:49 AM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>>
>> Hi,
>>
>> I would like to find out when the last time was that I paid a company
>> with a debit card.
>>
>>
>> This is the way I do it.
>>
>> 1) Find the date when I paid last
>> 2) Find the rest of details from the row(s)
>>
>> So
>>
>> var HASHTAG = "XYZ"
>> scala> var maxdate =
>> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)).agg(max("transactiondate")).collect.apply(0)
>> maxdate: org.apache.spark.sql.Row = [2015-12-15]
>>
>> OK so it was 2015-12-15
>>
>>
>> Now I want to get the rest of the columns. This one works when I hard
>> code the maxdate!
>>
>>
>> scala> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)
>> && col("transactiondate") === "2015-12-15").select("transactiondate",
>> "transactiondescription", "debitamount").show
>> +---------------+----------------------+-----------+
>> |transactiondate|transactiondescription|debitamount|
>> +---------------+----------------------+-----------+
>> |     2015-12-15|      XYZ LTD CD 4636 |      10.95|
>> +---------------+----------------------+-----------+
>>
>> Now if I want to use the var maxdate in place of "2015-12-15", how would
>> I do that?
>>
>> I tried lit(maxdate) etc. but they all give me an error:
>>
>> java.lang.RuntimeException: Unsupported literal type class
>> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
>> [2015-12-15]
>>
>>
>> Thanks
>>
>
>
>
> --
> Best Regards,
> Ayan Guha
>
