spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Fwd: This simple UDF is not working!
Date Sat, 26 Mar 2016 17:23:10 GMT
Thanks great Dhaval.

scala> import java.text.SimpleDateFormat
import java.text.SimpleDateFormat
scala>
scala> import java.sql.Date
import java.sql.Date
scala>
scala> import scala.util.{Try, Success, Failure}
import scala.util.{Try, Success, Failure}
scala> val toDate = udf{(out:String, form: String) => {
     | val format = new SimpleDateFormat(s"$form");
     |  Try(new Date(format.parse(out.toString()).getTime))match {
     |  case Success(t) => Some(t)
     |   case Failure(_) => None
     |   }}};
toDate: org.apache.spark.sql.UserDefinedFunction =
UserDefinedFunction(<function2>,DateType,List(StringType, StringType))

Register the UDF but comes back with an error. I guess I am not passing
parameters properly?

scala> sqlContext.udf.register("toDate", toDate(_:String, _:String))
<console>:31: error: type mismatch;
 found   : String
 required: org.apache.spark.sql.Column
              sqlContext.udf.register("toDate", toDate(_:String, _:String))
                                                       ^

Now I want to use it here

val a = df.filter(col("Total") > "").map(p => Invoices(p(0).toString,
*p(1).toString,* CleanupCurrency(p(2).toString),
CleanupCurrency(p(3).toString), CleanupCurrency(p(4).toString)))

for p(1)

Does p(1) which is type any need to be converted to String first. like
toDate(p(1).toString) etc?

Thanks again

Dr Mich Talebzadeh



LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
<https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*



http://talebzadehmich.wordpress.com



On 26 March 2016 at 07:00, Dhaval Modi <dhavalmodi24@gmail.com> wrote:

> Import below libs:
>
> import java.text.SimpleDateFormat
> import java.sql.Date
> import scala.util.{Try, Success, Failure}
>
> Regards,
> Dhaval Modi
> dhavalmodi24@gmail.com
>
> On 26 March 2016 at 12:28, Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>
>> I am getting this
>>
>> scala> import org.apache.spark.sql.functions._
>> import org.apache.spark.sql.functions._
>> scala> import java.text.SimpleDateFormat;
>> import java.text.SimpleDateFormat
>> scala> import java.text.DateFormat;
>> import java.text.DateFormat
>> scala> import java.text.SimpleDateFormat;
>> import java.text.SimpleDateFormat
>> scala> import java.text.ParseException;
>> import java.text.ParseException
>> scala> import java.util.Date;
>> import java.util.Date
>> scala> val toDate = udf{(out:String, form: String) => {
>>      | val format = new SimpleDateFormat(s"$form");
>>      |  Try(new Date(format.parse(out.toString()).getTime))match {
>>      |  case Success(t) => Some(t)
>>      |   case Failure(_) => None
>>      |   }}};
>> <console>:34: error: not found: value Try
>>         Try(new Date(format.parse(out.toString()).getTime))match {
>>         ^
>>
>>
>> Dr Mich Talebzadeh
>>
>>
>>
>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>
>>
>>
>> http://talebzadehmich.wordpress.com
>>
>>
>>
>> On 26 March 2016 at 06:44, Dhaval Modi <dhavalmodi24@gmail.com> wrote:
>>
>>> Yes, it is for Scala UDF.
>>>
>>> Scala does use JAVA functions/classes, so this works fine.
>>>
>>> Regards,
>>> Dhaval Modi
>>> dhavalmodi24@gmail.com
>>>
>>> On 26 March 2016 at 12:12, Mich Talebzadeh <mich.talebzadeh@gmail.com>
>>> wrote:
>>>
>>>> Thanks very much Dhaval.
>>>>
>>>> Is the code meant to be for Scala UDF?
>>>>
>>>> What I had before was
>>>>
>>>> def ChangeToDate (word : String) : Date = {
>>>> return
>>>> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(word,"dd/MM/yyyy"),"yyyy-MM-dd"))
>>>> }
>>>>
>>>> Which did not work
>>>>
>>>> Regards,
>>>>
>>>>
>>>> Dr Mich Talebzadeh
>>>>
>>>>
>>>>
>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>
>>>>
>>>>
>>>> http://talebzadehmich.wordpress.com
>>>>
>>>>
>>>>
>>>> On 26 March 2016 at 06:07, Dhaval Modi <dhavalmodi24@gmail.com> wrote:
>>>>
>>>>> Hi Mich,
>>>>>
>>>>> You can try this:
>>>>>
>>>>> val toDate = udf{(out:String, form: String) => {
>>>>> val format = new SimpleDateFormat(s"$form");
>>>>> Try(new Date(format.parse(out.toString()).getTime))match {
>>>>> case Success(t) => Some(t)
>>>>> case Failure(_) => None
>>>>> }}};
>>>>>
>>>>> Usage: src = src.withColumn(s"$columnName",
>>>>> toDate(src(s"$columnName"), lit(s"dd/MM/yyyy")))
>>>>>
>>>>>
>>>>> Regards,
>>>>> Dhaval Modi
>>>>> dhavalmodi24@gmail.com
>>>>>
>>>>> On 26 March 2016 at 04:34, Mich Talebzadeh <mich.talebzadeh@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Ted,
>>>>>>
>>>>>> I decided to take a short cut here. I created the map leaving date
as
>>>>>> it is (p(1)) below
>>>>>>
>>>>>> def CleanupCurrency (word : String) : Double = {
>>>>>> return word.toString.substring(1).replace(",", "").toDouble
>>>>>> }
>>>>>> sqlContext.udf.register("CleanupCurrency", CleanupCurrency(_:String))
>>>>>> val a = df.filter(col("Total") > "").map(p => Invoices(p(0).toString,
>>>>>> p(1).toString, CleanupCurrency(p(2).toString),
>>>>>> CleanupCurrency(p(3).toString), CleanupCurrency(p(4).toString)))
>>>>>>
>>>>>> //
>>>>>> // convert this RDD to DF and create a Spark temporary table
>>>>>> //
>>>>>> a.toDF.registerTempTable("tmp")
>>>>>>
>>>>>> INSERT INTO TABLE <HIVE_TABLE>
>>>>>> SELECT
>>>>>>           INVOICENUMBER
>>>>>>         ,
>>>>>> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(paymentdate,'dd/MM/yyyy'),'yyyy-MM-dd'))
>>>>>> AS paymentdate
>>>>>>         , NET
>>>>>>         , VAT
>>>>>>         , TOTAL
>>>>>> FROM tmp
>>>>>> """
>>>>>> sql(sqltext)
>>>>>>
>>>>>> That works OK.
>>>>>>
>>>>>> If I want to find invoices with paymentdate > 6 months old I do
>>>>>>
>>>>>> sql("SELECT invoicenumber, paymentdate FROM test.t14 *WHERE
>>>>>> months_between(FROM_unixtime(unix_timestamp(), 'yyyy-MM-dd'), paymentdate)*
>>>>>> > 6 ORDER BY invoicenumber, paymentdate").collect.foreach(println)
>>>>>> [360,2014-02-10]
>>>>>> [361,2014-02-17]
>>>>>>
>>>>>> I still interested if I could do it using a UDF :)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Dr Mich Talebzadeh
>>>>>>
>>>>>>
>>>>>>
>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://talebzadehmich.wordpress.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 25 March 2016 at 17:44, Ted Yu <yuzhihong@gmail.com> wrote:
>>>>>>
>>>>>>> Mich:
>>>>>>> Please take a look at:
>>>>>>> sql/core/src/test/scala/org/apache/spark/sql/DateFunctionsSuite.scala
>>>>>>>
>>>>>>> test("function to_date") {
>>>>>>>
>>>>>>> Remember to:
>>>>>>> import org.apache.spark.sql.functions._
>>>>>>>
>>>>>>> On Fri, Mar 25, 2016 at 7:59 AM, Mich Talebzadeh <
>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>
>>>>>>>> This works with sql
>>>>>>>>
>>>>>>>> sqltext = """
>>>>>>>> INSERT INTO TABLE t14
>>>>>>>> SELECT
>>>>>>>>           INVOICENUMBER
>>>>>>>>         ,
>>>>>>>> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(paymentdate,'dd/MM/yyyy'),'yyyy-MM-dd'))
>>>>>>>> AS paymentdate
>>>>>>>>         , NET
>>>>>>>>         , VAT
>>>>>>>>         , TOTAL
>>>>>>>> FROM tmp
>>>>>>>> """
>>>>>>>> sql(sqltext)
>>>>>>>>
>>>>>>>>
>>>>>>>> but not in UDF.  I want to convert it to correct date format
>>>>>>>> before writing it to table
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 25 March 2016 at 14:54, Ted Yu <yuzhihong@gmail.com>
wrote:
>>>>>>>>
>>>>>>>>> Do you mind showing body of TO_DATE() ?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Fri, Mar 25, 2016 at 7:38 AM, Ted Yu <yuzhihong@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Looks like you forgot an import for Date.
>>>>>>>>>>
>>>>>>>>>> FYI
>>>>>>>>>>
>>>>>>>>>> On Fri, Mar 25, 2016 at 7:36 AM, Mich Talebzadeh
<
>>>>>>>>>> mich.talebzadeh@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> writing a UDF to convert  a string into Date
>>>>>>>>>>>
>>>>>>>>>>> def ChangeDate(word : String) : Date = {
>>>>>>>>>>>      | return
>>>>>>>>>>> TO_DATE(FROM_UNIXTIME(UNIX_TIMESTAMP(word),"dd/MM/yyyy"),"yyyy-MM-dd")
>>>>>>>>>>>      | }
>>>>>>>>>>> <console>:19: error: not found: type Date
>>>>>>>>>>>
>>>>>>>>>>> That code to_date.. works OK in sql but not here.
It is
>>>>>>>>>>> complaining about to_date?
>>>>>>>>>>>
>>>>>>>>>>> Any ideas will be appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Dr Mich Talebzadeh
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> LinkedIn * https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>>>>>> <https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw>*
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> http://talebzadehmich.wordpress.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Mime
View raw message