spark-user mailing list archives

From Nicholas Hakobian <nicholas.hakob...@rallyhealth.com>
Subject Re: How to filter based on a constant value
Date Sun, 31 Jul 2016 06:49:27 GMT
From the online docs:
https://spark.apache.org/docs/1.6.2/api/java/org/apache/spark/sql/Row.html#apply(int)

java.lang.Object apply(int i)
Returns the value at position i. If the value is null, null is returned.
The following is a mapping between Spark SQL types and return types:

So it's returning the value at the given position. In your case, collect
gives an Array of Rows (of length 1), and apply(0) picks out that single
Row, which holds the Date.
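To make the chain of calls explicit, here is a minimal sketch, assuming a spark-shell session where ll_18740868 and HASHTAG are already defined as in the thread (untested here, since it needs a live SparkContext):

```scala
import org.apache.spark.sql.functions.{col, max}

// collect() materializes the aggregated DataFrame as Array[Row];
// for a single aggregate with no grouping, the Array has exactly one element.
val rows = ll_18740868
  .filter(col("transactiondescription").contains(HASHTAG))
  .agg(max("transactiondate"))
  .collect()

// Array.apply(0) -- also writable as rows(0) -- picks that single Row;
// Row.getDate(0) then extracts its first (and only) column as a java.sql.Date.
val maxdate: java.sql.Date = rows.apply(0).getDate(0)

// Row.apply(0), i.e. rows(0)(0), would return the same value but typed
// as Any, which is why the typed accessor getDate(0) is more convenient.
```

So in `collect.apply(0)` the apply belongs to the Array returned by collect, and the subsequent getDate(0) is the Row accessor.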


Nicholas Szandor Hakobian
Data Scientist
Rally Health
nicholas.hakobian@rallyhealth.com
M: 510-295-7113


On Sat, Jul 30, 2016 at 11:41 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com
> wrote:

> thanks gents.
>
> I am trying to understand this better.
>
> As I understand it, a DataFrame is basically the equivalent of a table in
> relational terms.
>
> so
>
> scala> var maxdate =
> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)).agg(max("transactiondate"))
> maxdate: org.apache.spark.sql.DataFrame = [max(transactiondate): date]
>
> So I find the max(transactiondate) for the filter I have applied.  In sql
> term --> select max(transactiondate) from ll_18740868 where
> transactiondescription like "%HASHTAG%"
>
> Now I want to store it in a single variable so I can reuse it.
>
> scala> var maxdate =
> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)).agg(max("transactiondate")).collect
> maxdate: Array[org.apache.spark.sql.Row] = Array([2015-12-15])
>
> Now I have the value stored in a Row. I extract it as follows. It is the
> first (and only) column of the row, in date format
>
> scala> var maxdate =
> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)).agg(max("transactiondate")).collect.
> apply(0).getDate(0)
> maxdate: java.sql.Date = 2015-12-15
>
> what is the role of apply(0) here?
>
> Thanks
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 31 July 2016 at 03:28, Xinh Huynh <xinh.huynh@gmail.com> wrote:
>
>> Hi Mitch,
>>
>> I think you were missing a step:
>> [your result] maxdate: org.apache.spark.sql.Row = [2015-12-15]
>> Since maxdate is of type Row, you would want to extract the first column
>> of the Row with:
>>
>> >> val maxdateStr = maxdate.getString(0)
>>
>> assuming the column type is String.
>> API doc is here:
>> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Row
>>
>> Then you can do the query:
>>
>> >> col("transactiondate") === maxdateStr
>>
>> Xinh
>>
>> On Sat, Jul 30, 2016 at 5:20 PM, ayan guha <guha.ayan@gmail.com> wrote:
>>
>>> select *
>>> from (select *,
>>>              rank() over (order by transactiondate desc) r
>>>        from ll_18740868 where transactiondescription like '%XYZ%'
>>>       ) t
>>> where r = 1
>>>
>>> Hi Mitch,
>>>
>>> If using SQL is fine, you can try the query above. You need to register
>>> ll_18740868 as a temp table first.
>>>
>>> On Sun, Jul 31, 2016 at 6:49 AM, Mich Talebzadeh <
>>> mich.talebzadeh@gmail.com> wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I would like to find out the last time I paid a company with my
>>>> debit card.
>>>>
>>>>
>>>> This is the way I do it.
>>>>
>>>> 1) Find the date when I paid last
>>>> 2) Find the rest of details from the row(s)
>>>>
>>>> So
>>>>
>>>> var HASHTAG = "XYZ"
>>>> scala> var maxdate =
>>>> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)).agg(max("transactiondate")).collect.apply(0)
>>>> maxdate: org.apache.spark.sql.Row = [2015-12-15]
>>>>
>>>> OK so it was 2015-12-15
>>>>
>>>>
>>>> Now I want to get the rest of the columns. This one works when I hard
>>>> code the maxdate!
>>>>
>>>>
>>>> scala> ll_18740868.filter(col("transactiondescription").contains(HASHTAG)
>>>> && col("transactiondate") === "2015-12-15").select("transactiondate",
>>>> "transactiondescription", "debitamount").show
>>>> +---------------+----------------------+-----------+
>>>> |transactiondate|transactiondescription|debitamount|
>>>> +---------------+----------------------+-----------+
>>>> |     2015-12-15|  XYZ LTD CD 4636 |      10.95|
>>>> +---------------+----------------------+-----------+
>>>>
>>>> Now if I want to use the var maxdate in place of "2015-12-15", how
>>>> would I do that?
>>>>
>>>> I tried lit(maxdate) etc. but they all give me an error:
>>>>
>>>> java.lang.RuntimeException: Unsupported literal type class
>>>> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
>>>> [2015-12-15]
>>>>
>>>>
>>>> Thanks
>>>>
>>>
>>>
>>>
>>> --
>>> Best Regards,
>>> Ayan Guha
>>>
>>
>>
>
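Putting the thread's answers together: the lit() error comes from passing a whole Row to lit(), so the fix is to extract the date value first and then compare against it. A sketch, assuming a running spark-shell with ll_18740868 already defined (not verified against a live cluster):

```scala
import org.apache.spark.sql.functions.{col, max}

val HASHTAG = "XYZ"

// Step 1: pull the max date out of the single-row, single-column result
// as a java.sql.Date rather than keeping it wrapped in a Row.
val maxdate: java.sql.Date = ll_18740868
  .filter(col("transactiondescription").contains(HASHTAG))
  .agg(max("transactiondate"))
  .collect()(0)
  .getDate(0)

// Step 2: compare the column against the extracted value. Passing the
// Row itself to lit() is what raised
// "Unsupported literal type class ... GenericRowWithSchema".
ll_18740868
  .filter(col("transactiondescription").contains(HASHTAG) &&
          col("transactiondate") === maxdate)
  .select("transactiondate", "transactiondescription", "debitamount")
  .show()
```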
