spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: How to filter based on a constant value
Date Sun, 31 Jul 2016 07:27:54 GMT
Thanks, Nicholas. Got it.



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 31 July 2016 at 07:49, Nicholas Hakobian <
nicholas.hakobian@rallyhealth.com> wrote:

> From the online docs:
> https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/Row.html#apply(int)
>
> java.lang.Object apply(int i)
> Returns the value at position i. If the value is null, null is returned.
> The following is a mapping between Spark SQL types and return types:
>
> So it's returning the content of the first element in the row; in this case,
> the Array (of length 1) of Date types.
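
As a minimal sketch of the distinction (the hand-built Row below is an assumption for illustration; in the thread it comes from collect()):

```scala
import org.apache.spark.sql.Row

// Hand-built Row mirroring the single-row result of
// agg(max("transactiondate")).collect()(0) -- for illustration only.
val row = Row(java.sql.Date.valueOf("2015-12-15"))

val untyped: Any = row.apply(0)            // generic accessor, returns Any
val typed: java.sql.Date = row.getDate(0)  // typed accessor at the same position
```
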
>
>
> Nicholas Szandor Hakobian
> Data Scientist
> Rally Health
> nicholas.hakobian@rallyhealth.com
> M: 510-295-7113
>
>
> On Sat, Jul 30, 2016 at 11:41 PM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> Thanks, gents.
>>
>> I am trying to understand this better.
>>
>> As I understand it, a DataFrame is basically the equivalent of a table in
>> relational terms.
>>
>> so
>>
>> scala> var maxdate =
>> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)).agg(max("transactiondate"))
>> maxdate: org.apache.spark.sql.DataFrame = [max(transactiondate): date]
>>
>> So I find the max(transactiondate) for the filter I have applied. In SQL
>> terms: select max(transactiondate) from ll_18740868 where
>> transactiondescription like '%HASHTAG%'
>>
>> Now I want to store it in a single variable and work with it.
>>
>> scala> var maxdate =
>> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)).agg(max("transactiondate")).collect
>> maxdate: Array[org.apache.spark.sql.Row] = Array([2015-12-15])
>>
>> Now I have the value stored in a row. I get it as follows: it is the
>> first (and only) column of the row, in date format.
>>
>> scala> var maxdate =
>> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)).agg(max("transactiondate")).collect.
>> apply(0).getDate(0)
>> maxdate: java.sql.Date = 2015-12-15
>>
>> what is the role of apply(0) here?
>>
>> Thanks
>>
>>
>>
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>>
>>
>>
>> On 31 July 2016 at 03:28, Xinh Huynh <xinh.huynh@gmail.com> wrote:
>>
>>> Hi Mich,
>>>
>>> I think you were missing a step:
>>> [your result] maxdate: org.apache.spark.sql.Row = [2015-12-15]
>>> Since maxdate is of type Row, you would want to extract the first column
>>> of the Row with:
>>>
>>> >> val maxdateStr = maxdate.getString(0)
>>>
>>> assuming the column type is String.
>>> API doc is here:
>>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Row
>>>
>>> Then you can do the query:
>>>
>>> >> col("transactiondate") === maxdateStr
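
Putting the two steps together as a sketch (here the column is a date, so getDate(0) fits better than getString(0); the names follow the thread, and ll_18740868 and HASHTAG are assumed to exist as shown there):

```scala
import org.apache.spark.sql.functions.{col, max}

// Step 1: extract the value from the single aggregated Row.
val maxdate = ll_18740868
  .filter(col("transactiondescription").contains(HASHTAG))
  .agg(max("transactiondate"))
  .collect()(0)
  .getDate(0)                        // java.sql.Date, not a Row

// Step 2: filter on the extracted value; lit() is applied implicitly.
val lastPayment = ll_18740868
  .filter(col("transactiondescription").contains(HASHTAG) &&
          col("transactiondate") === maxdate)
  .select("transactiondate", "transactiondescription", "debitamount")
lastPayment.show()
```
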
>>>
>>> Xinh
>>>
>>> On Sat, Jul 30, 2016 at 5:20 PM, ayan guha <guha.ayan@gmail.com> wrote:
>>>
>>>> select *
>>>> from (select *,
>>>>              rank() over (order by transactiondate desc) r
>>>>        from ll_18740868 where transactiondescription like '%XYZ%'
>>>>       ) t
>>>> where r=1
>>>>
>>>> Hi Mich,
>>>>
>>>> If using SQL is fine, you can try the code above. You need to register
>>>> ll_18740868 as a temp table first.
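
A sketch of the registration step on the Spark 1.6 API the thread targets (registerTempTable was later deprecated in favour of createOrReplaceTempView; note that window functions such as rank() in Spark 1.6 SQL require a HiveContext, which is assumed here):

```scala
// Assumes ll_18740868 and sqlContext (a HiveContext) exist as in the thread.
ll_18740868.registerTempTable("ll_18740868")

// rank() descending so r = 1 picks the most recent transaction date.
val lastPayment = sqlContext.sql("""
  select *
  from (select *,
               rank() over (order by transactiondate desc) r
        from ll_18740868
        where transactiondescription like '%XYZ%') t
  where r = 1
""")
```
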
>>>>
>>>> On Sun, Jul 31, 2016 at 6:49 AM, Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I would like to find out the last time I paid a company by debit
>>>>> card.
>>>>>
>>>>>
>>>>> This is the way I do it.
>>>>>
>>>>> 1) Find the date when I paid last
>>>>> 2) Find the rest of details from the row(s)
>>>>>
>>>>> So
>>>>>
>>>>> var HASHTAG = "XYZ"
>>>>> scala> var maxdate =
>>>>> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)).agg(max("transactiondate")).collect.apply(0)
>>>>> maxdate: org.apache.spark.sql.Row = [2015-12-15]
>>>>>
>>>>> OK so it was 2015-12-15
>>>>>
>>>>>
>>>>> Now I want to get the rest of the columns. This one works when I
>>>>> hard-code the maxdate:
>>>>>
>>>>>
>>>>> scala> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)
>>>>> && col("transactiondate") === "2015-12-15").select("transactiondate",
>>>>> "transactiondescription", "debitamount").show
>>>>> +---------------+----------------------+-----------+
>>>>> |transactiondate|transactiondescription|debitamount|
>>>>> +---------------+----------------------+-----------+
>>>>> |     2015-12-15|  XYZ LTD CD 4636 |      10.95|
>>>>> +---------------+----------------------+-----------+
>>>>>
>>>>> Now if I want to use the var maxdate in place of "2015-12-15", how
>>>>> would I do that?
>>>>>
>>>>> I tried lit(maxdate) etc., but they all give me an error:
>>>>>
>>>>> java.lang.RuntimeException: Unsupported literal type class
>>>>> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
>>>>> [2015-12-15]
>>>>>
>>>>>
>>>>> Thanks
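
The error above says lit() was handed a whole Row (GenericRowWithSchema) rather than a literal value; extracting the date from the Row first avoids it. A sketch, reusing the thread's names (maxdate at this point is the Row [2015-12-15]):

```scala
// Pull the java.sql.Date out of the Row before using it in a comparison.
val maxdateValue: java.sql.Date = maxdate.getDate(0)

ll_18740868
  .filter(col("transactiondescription").contains(HASHTAG) &&
          col("transactiondate") === maxdateValue)  // lit() applied implicitly
  .select("transactiondate", "transactiondescription", "debitamount")
  .show()
```
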
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Ayan Guha
>>>>
>>>
>>>
>>
>
