spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ayan guha <guha.a...@gmail.com>
Subject Re: How to filter based on a constant value
Date Sun, 31 Jul 2016 09:21:44 GMT
Hi

This is because Spark does  not provide a way to "bind" variables like
Oracle does.

So you can build the sql string, like below (in python)

val = 'XYZ'
sqlbase = "select ..... where col = '<val>'".replace('<val>,val)



On Sun, Jul 31, 2016 at 6:25 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com>
wrote:

> Thanks Ayan.
>
> This is the one I used
>
> scala> sqltext = """
>      |  select *
>      | from (select transactiondate, transactiondescription, debitamount
>      | , rank() over (order by transactiondate desc) r
>      | from ll_18740868 where transactiondescription like '%XYZ%'
>      |       ) inner
>      |  where r=1
>      |    """
>
> scala> HiveContext.sql(sqltext).show
> +---------------+----------------------+-----------+---+
> |transactiondate|transactiondescription|debitamount|  r|
> +---------------+----------------------+-----------+---+
> |     2015-12-15|  XYZ LTD CD 4636 |      10.95|  1|
> +---------------+----------------------+-----------+---+
>
> The issue I see is that in SQL here I cannot pass HASHTAG as a variable to
> SQL. For example in RDBMS I can do this
>
> 1> declare @pattern varchar(50)
> 2> set @pattern = 'Direct'
> 3> select CHANNEL_DESC from CHANNELS where CHANNEL_DESC like
> '%'||@pattern||'%'
> 4> go
> (1 row affected)
>  CHANNEL_DESC
>  --------------------
>  Direct Sales
>
> but not in Hive or Spark SQL
>
> whereas with FP it does it implicitly.
>
> col("CHANNELS").contains(HASHTAG))
>
> Unless there is a way of doing it?
>
> Thanks
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 31 July 2016 at 01:20, ayan guha <guha.ayan@gmail.com> wrote:
>
>> select *
>> from (select *,
>>              rank() over (order by transactiondate) r
>>        from ll_18740868 where transactiondescription='XYZ'
>>       ) inner
>> where r=1
>>
>> Hi Mitch,
>>
>> If using SQL is fine, you can try the code above. You need to register
>> ll_18740868  as temp table.
>>
>> On Sun, Jul 31, 2016 at 6:49 AM, Mich Talebzadeh <
>> mich.talebzadeh@gmail.com> wrote:
>>
>>>
>>> Hi,
>>>
>>> I would like to find out when it was the last time I paid a company with
>>> Debit Card
>>>
>>>
>>> This is the way I do it.
>>>
>>> 1) Find the date when I paid last
>>> 2) Find the rest of details from the row(s)
>>>
>>> So
>>>
>>> var HASHTAG = "XYZ"
>>> scala> var maxdate =
>>> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)).agg(max("transactiondate")).collect.apply(0)
>>> maxdate: org.apache.spark.sql.Row = [2015-12-15]
>>>
>>> OK so it was 2015-12-15
>>>
>>>
>>> Now I want to get the rest of the columns. This one works when I hard
>>> code the maxdate!
>>>
>>>
>>> scala> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)
>>> && col("transactiondate") === "2015-12-15").select("transactiondate",
>>> "transactiondescription", "debitamount").show
>>> +---------------+----------------------+-----------+
>>> |transactiondate|transactiondescription|debitamount|
>>> +---------------+----------------------+-----------+
>>> |     2015-12-15|  XYZ LTD CD 4636 |      10.95|
>>> +---------------+----------------------+-----------+
>>>
>>> Now if I want to use the var maxdate in place of "2015-12-15", how would
>>> I do that?
>>>
>>> I tried lit(maxdate) etc but they are all giving me error?
>>>
>>> java.lang.RuntimeException: Unsupported literal type class
>>> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
>>> [2015-12-15]
>>>
>>>
>>> Thanks
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>


-- 
Best Regards,
Ayan Guha

Mime
View raw message