spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: This simple UDF is not working!
Date Fri, 25 Mar 2016 23:04:48 GMT
Hi Ted,

I decided to take a shortcut here. I created the map, leaving the date as it
is (p(1)), below:

def CleanupCurrency (word : String) : Double = {
return word.toString.substring(1).replace(",", "").toDouble
}
sqlContext.udf.register("CleanupCurrency", CleanupCurrency(_:String))
val a = df.filter(col("Total") > "").map(p => Invoices(p(0).toString,
p(1).toString, CleanupCurrency(p(2).toString),
CleanupCurrency(p(3).toString), CleanupCurrency(p(4).toString)))

//
// convert this RDD to DF and create a Spark temporary table
//
a.toDF.registerTempTable("tmp")
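
As a side note, CleanupCurrency above throws on an empty or malformed string, which is presumably why the filter on "Total" is needed. Here is a minimal sketch of a more defensive variant in plain Scala. The object name, the function name and the sample values (including the "$" symbol) are hypothetical and only for illustration; it assumes every value carries exactly one leading currency symbol.

```scala
import scala.util.Try

object CleanupCurrencySketch {
  // Hypothetical helper: drop the first character (assumed to be the currency
  // symbol), remove thousands separators, and parse the rest as a Double.
  // Returns None instead of throwing when the input is empty or malformed.
  def cleanupCurrencyOpt(word: String): Option[Double] =
    Try(word.trim.substring(1).replace(",", "").toDouble).toOption
}
```

With this, CleanupCurrencySketch.cleanupCurrencyOpt("$1,234.56") gives Some(1234.56), and an empty string gives None rather than an exception.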

val sqltext = """
INSERT INTO TABLE <HIVE_TABLE>
SELECT
          INVOICENUMBER
        , TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(paymentdate,'dd/MM/yyyy'),'yyyy-MM-dd')) AS paymentdate
        , NET
        , VAT
        , TOTAL
FROM tmp
"""
sql(sqltext)

That works OK.

If I want to find invoices with a paymentdate more than 6 months old, I do:

sql("""SELECT invoicenumber, paymentdate FROM test.t14
WHERE months_between(FROM_unixtime(unix_timestamp(), 'yyyy-MM-dd'), paymentdate) > 6
ORDER BY invoicenumber, paymentdate""").collect.foreach(println)
[360,2014-02-10]
[361,2014-02-17]
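
For comparison, the same "older than 6 months" check can be sketched in plain Scala with java.time (this assumes Java 8). It is not the same computation as Spark's months_between, which returns a fractional number of months; this version uses a calendar-month cutoff, so results can differ for dates near the boundary. The object and function names are hypothetical.

```scala
import java.time.LocalDate

object InvoiceAgeSketch {
  // Hypothetical helper: true when paymentDate falls strictly before the
  // date that lies `months` calendar months before `today`.
  def olderThanMonths(paymentDate: LocalDate, today: LocalDate, months: Int): Boolean =
    paymentDate.isBefore(today.minusMonths(months.toLong))
}
```

Passing today in explicitly, rather than calling LocalDate.now() inside, keeps the check repeatable.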

I am still interested in whether I could do it using a UDF :)
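
One possible shape for such a UDF, as a rough sketch only: TO_DATE, FROM_UNIXTIME and UNIX_TIMESTAMP are SQL functions, so they cannot be called on a plain String inside a Scala def. The body would have to parse the string with ordinary JVM date classes instead. The version below assumes Java 8 (java.time) and input in dd/MM/yyyy form; the object and function names are hypothetical.

```scala
import java.sql.Date
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object ChangeDateSketch {
  // Input pattern assumed from the SQL above: dd/MM/yyyy
  private val inputFormat = DateTimeFormatter.ofPattern("dd/MM/yyyy")

  // Parse the string and return a java.sql.Date, which prints as yyyy-MM-dd.
  // Throws DateTimeParseException if the input does not match the pattern.
  def changeDate(word: String): Date =
    Date.valueOf(LocalDate.parse(word, inputFormat))
}

// Assumed registration, in the same style as CleanupCurrency above (untested here):
// sqlContext.udf.register("ChangeDate", ChangeDateSketch.changeDate _)
```

For example, ChangeDateSketch.changeDate("10/02/2014").toString yields "2014-02-10". The import of java.sql.Date is what the original "not found: type Date" error was asking for.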





Dr Mich Talebzadeh



LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 25 March 2016 at 17:44, Ted Yu <yuzhihong@gmail.com> wrote:

> Mich:
> Please take a look at:
> sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala
>
> test("function to_date") {
>
> Remember to:
> import org.apache.spark.sql.functions._
>
> On Fri, Mar 25, 2016 at 7:59 AM, Mich Talebzadeh <
> mich.talebzadeh@gmail.com> wrote:
>
>> This works with sql
>>
>> sqltext = """
>> INSERT INTO TABLE t14
>> SELECT
>>           INVOICENUMBER
>>         ,
>> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(paymentdate,'dd/MM/yyyy'),'yyyy-MM-dd'))
>> AS paymentdate
>>         , NET
>>         , VAT
>>         , TOTAL
>> FROM tmp
>> """
>> sql(sqltext)
>>
>>
>> but not in a UDF. I want to convert it to the correct date format before
>> writing it to the table
>>
>> Thanks
>>
>> On 25 March 2016 at 14:54, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>>> Do you mind showing body of TO_DATE() ?
>>>
>>> Thanks
>>>
>>> On Fri, Mar 25, 2016 at 7:38 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>>>
>>>> Looks like you forgot an import for Date.
>>>>
>>>> FYI
>>>>
>>>> On Fri, Mar 25, 2016 at 7:36 AM, Mich Talebzadeh <
>>>> mich.talebzadeh@gmail.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am writing a UDF to convert a string into a Date:
>>>>>
>>>>> def ChangeDate(word : String) : Date = {
>>>>>      | return
>>>>> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(word),"dd/MM/yyyy"),"yyyy-MM-dd")
>>>>>      | }
>>>>> <console>:19: error: not found: type Date
>>>>>
>>>>> That code to_date.. works OK in sql but not here. It is complaining
>>>>> about to_date?
>>>>>
>>>>> Any ideas will be appreciated.
>>>>>
>>>>> Thanks,
>>>>>
>>>>
>>>>
>>>
>>
>
