spark-user mailing list archives

From Jacek Laskowski <ja...@japila.pl>
Subject Re: The equivalent for INSTR in Spark FP
Date Tue, 02 Aug 2016 09:57:55 GMT
Congrats! You made it. A serious Spark dev badge unlocked :)

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Tue, Aug 2, 2016 at 9:58 AM, Mich Talebzadeh
<mich.talebzadeh@gmail.com> wrote:
> it should be lit(0) :)
>
> rs.select(mySubstr($"transactiondescription", lit(0),
> instr($"transactiondescription", "CD"))).show(1)
> +--------------------------------------------------------------+
> |UDF(transactiondescription,0,instr(transactiondescription,CD))|
> +--------------------------------------------------------------+
> |                                          OVERSEAS TRANSACTI C|
> +--------------------------------------------------------------+
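> The lit(0) makes sense once you remember the UDF delegates to Scala's
> String.substring, which is 0-based and end-exclusive, while instr() is
> 1-based like SQL INSTR. A pure-Scala sketch of that arithmetic (the sample
> string below is a made-up stand-in for the real data; no Spark needed):

```scala
object SubstrDemo extends App {
  // Hypothetical transaction description, similar in shape to the data above.
  val s = "OVERSEAS TRANSACTION CD 1234"

  // SQL-style INSTR / Spark instr(): 1-based position of the first match.
  // Scala's indexOf is 0-based, hence the + 1.
  val instrCD = s.indexOf("CD") + 1            // 22

  // What the UDF computes: String.substring(start, end) with 0-based,
  // end-exclusive indices. Starting at 0 keeps the whole prefix.
  val withZero = s.substring(0, instrCD)       // "OVERSEAS TRANSACTION C"
  val withOne  = s.substring(1, instrCD)       // "VERSEAS TRANSACTION C" -- first char dropped

  println(withZero)
  println(withOne)
}
```

With lit(1) the leading character is lost, which is exactly the "VERSEAS
TRANSACTI C" output seen further down the thread.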
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed. The
> author will in no case be liable for any monetary damages arising from such
> loss, damage or destruction.
>
>
>
>
> On 2 August 2016 at 08:52, Mich Talebzadeh <mich.talebzadeh@gmail.com>
> wrote:
>>
>> No thinking on my part!!!
>>
>> rs.select(mySubstr($"transactiondescription", lit(1),
>> instr($"transactiondescription", "CD"))).show(2)
>> +--------------------------------------------------------------+
>> |UDF(transactiondescription,1,instr(transactiondescription,CD))|
>> +--------------------------------------------------------------+
>> |                                           VERSEAS TRANSACTI C|
>> |                                           XYZ.COM 80...|
>> +--------------------------------------------------------------+
>> only showing top 2 rows
>>
>> Let me test it.
>>
>> Cheers
>>
>>
>>
>> On 1 August 2016 at 23:43, Mich Talebzadeh <mich.talebzadeh@gmail.com>
>> wrote:
>>>
>>> Thanks Jacek.
>>>
>>> It sounds like the issue is the position of the second argument in
>>> substring().
>>>
>>> This works
>>>
>>> scala> val wSpec2 =
>>> Window.partitionBy(substring($"transactiondescription",1,20))
>>> wSpec2: org.apache.spark.sql.expressions.WindowSpec =
>>> org.apache.spark.sql.expressions.WindowSpec@1a4eae2
>>>
>>> Using a udf as suggested:
>>>
>>> scala> val mySubstr = udf { (s: String, start: Int, end: Int) =>
>>>      |  s.substring(start, end) }
>>> mySubstr: org.apache.spark.sql.UserDefinedFunction =
>>> UserDefinedFunction(<function3>,StringType,List(StringType, IntegerType,
>>> IntegerType))
>>>
>>>
>>> This was throwing an error:
>>>
>>> val wSpec2 =
>>> Window.partitionBy(substring("transactiondescription",1,indexOf("transactiondescription",'CD')-2))
>>>
>>>
>>> So I tried using the udf:
>>>
>>> scala> val wSpec2 =
>>> Window.partitionBy($"transactiondescription".select(mySubstr('s, lit(1),
>>> instr('s, "CD")))
>>>      | )
>>> <console>:28: error: value select is not a member of
>>> org.apache.spark.sql.ColumnName
>>>          val wSpec2 =
>>> Window.partitionBy($"transactiondescription".select(mySubstr('s, lit(1),
>>> instr('s, "CD")))
>>>
>>> Obviously I am not doing it correctly :(
>>>
>>> cheers
>>>
>>>
>>>
>>> On 1 August 2016 at 23:02, Jacek Laskowski <jacek@japila.pl> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Interesting...
>>>>
>>>> I'm tempted to think that the substring function should accept
>>>> columns holding the start and end positions. I'd love to hear
>>>> people's thoughts on this.
>>>>
>>>> For now, I'd say you need to define a udf to do the substring as follows:
>>>>
>>>> scala> val mySubstr = udf { (s: String, start: Int, end: Int) =>
>>>> s.substring(start, end) }
>>>> mySubstr: org.apache.spark.sql.expressions.UserDefinedFunction =
>>>> UserDefinedFunction(<function3>,StringType,Some(List(StringType,
>>>> IntegerType, IntegerType)))
>>>>
>>>> scala> df.show
>>>> +-----------+
>>>> |          s|
>>>> +-----------+
>>>> |hello world|
>>>> +-----------+
>>>>
>>>> scala> df.select(mySubstr('s, lit(1), instr('s, "ll"))).show
>>>> +-----------------------+
>>>> |UDF(s, 1, instr(s, ll))|
>>>> +-----------------------+
>>>> |                     el|
>>>> +-----------------------+
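>>>> The "el" result falls out of the same index mismatch: instr('s, "ll")
>>>> returns the 1-based position 3, and the udf then takes the 0-based,
>>>> end-exclusive slice substring(1, 3). A pure-Scala check of that
>>>> arithmetic (no Spark needed):

```scala
object InstrDemo extends App {
  val s = "hello world"

  // SQL-style INSTR / Spark instr(): 1-based position of first occurrence.
  val instrLl = s.indexOf("ll") + 1   // 3

  // 0-based, end-exclusive substring(1, 3) picks characters at indices 1 and 2.
  println(instrLl)
  println(s.substring(1, instrLl))
}
```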
>>>>
>>>> Pozdrawiam,
>>>> Jacek Laskowski
>>>> ----
>>>> https://medium.com/@jaceklaskowski/
>>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>>> Follow me at https://twitter.com/jaceklaskowski
>>>>
>>>>
>>>> On Mon, Aug 1, 2016 at 11:18 PM, Mich Talebzadeh
>>>> <mich.talebzadeh@gmail.com> wrote:
>>>> > Thanks Jacek,
>>>> >
>>>> > Do I have any other way of writing this with functional programming?
>>>> >
>>>> > select
>>>> >
>>>> > substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2),
>>>> >
>>>> >
>>>> > Cheers,
>>>> >
>>>> >
>>>> >
>>>> > On 1 August 2016 at 22:13, Jacek Laskowski <jacek@japila.pl> wrote:
>>>> >>
>>>> >> Hi Mich,
>>>> >>
>>>> >> There's no indexOf function -
>>>> >>
>>>> >>
>>>> >> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
>>>> >>
>>>> >>
>>>> >> Pozdrawiam,
>>>> >> Jacek Laskowski
>>>> >> ----
>>>> >> https://medium.com/@jaceklaskowski/
>>>> >> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>>> >> Follow me at https://twitter.com/jaceklaskowski
>>>> >>
>>>> >>
>>>> >> On Mon, Aug 1, 2016 at 7:24 PM, Mich Talebzadeh
>>>> >> <mich.talebzadeh@gmail.com> wrote:
>>>> >> > Hi,
>>>> >> >
>>>> >> > What is the FP equivalent of the following window/analytic
>>>> >> > query that works OK in Spark SQL?
>>>> >> >
>>>> >> > This one using INSTR
>>>> >> >
>>>> >> > select
>>>> >> >
>>>> >> >
>>>> >> > substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2),
>>>> >> >
>>>> >> >
>>>> >> > select distinct *
>>>> >> > from (
>>>> >> >       select
>>>> >> >
>>>> >> >
>>>> >> > substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2),
>>>> >> >       SUM(debitamount) OVER (PARTITION BY
>>>> >> >
>>>> >> >
>>>> >> > substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2))
>>>> >> > AS spent
>>>> >> >       from accounts.ll_18740868 where transactiontype = 'DEB'
>>>> >> >      ) tmp
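>>>> >> > For intuition, the same partition-and-sum can be emulated on plain
>>>> >> > Scala collections: group by the description up to two characters
>>>> >> > before 'CD', then sum the debits per group. The rows below are
>>>> >> > hypothetical stand-ins for the real table; the real job would of
>>>> >> > course stay in Spark:

```scala
object SpentByMerchant extends App {
  // Hypothetical (transactiondescription, debitamount) rows.
  val rows = Seq(
    ("XYZ.COM CD 1234", 10.0),
    ("XYZ.COM CD 5678", 5.0),
    ("OVERSEAS TRANSACTION CD 99", 7.5)
  )

  // SQL:   substring(desc, 1, INSTR(desc, 'CD') - 2)    (both 1-based)
  // Scala: desc.substring(0, desc.indexOf("CD") - 1)    (0-based, end-exclusive)
  def merchant(desc: String): String =
    desc.substring(0, desc.indexOf("CD") - 1)

  // PARTITION BY merchant prefix, SUM(debitamount) per partition.
  val spent: Map[String, Double] =
    rows.groupBy { case (desc, _) => merchant(desc) }
        .map { case (m, rs) => m -> rs.map(_._2).sum }

  spent.foreach { case (m, total) => println(s"$m -> $total") }
}
```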
>>>> >> >
>>>> >> >
>>>> >> > I tried indexOf but it does not work!
>>>> >> >
>>>> >> > val wSpec2 =
>>>> >> >
>>>> >> >
>>>> >> > Window.partitionBy(substring(col("transactiondescription"),1,indexOf(col("transactiondescription"),"CD")))
>>>> >> > <console>:26: error: not found: value indexOf
>>>> >> >          val wSpec2 =
>>>> >> >
>>>> >> >
>>>> >> > Window.partitionBy(substring(col("transactiondescription"),1,indexOf(col("transactiondescription"),"CD")))
>>>> >> >
>>>> >> >
>>>> >> > Thanks
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >
>>>> >
>>>
>>>
>>
>

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

