From: Jacek Laskowski
Date: Tue, 2 Aug 2016 11:57:55 +0200
Subject: Re: The equivalent for INSTR in Spark FP
To: Mich Talebzadeh
Cc: "user @spark" <user@spark.apache.org>

Congrats! You made it.
A serious Spark dev badge unlocked :)

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Tue, Aug 2, 2016 at 9:58 AM, Mich Talebzadeh wrote:
> it should be lit(0) :)
>
> rs.select(mySubstr($"transactiondescription", lit(0),
> instr($"transactiondescription", "CD"))).show(1)
> +--------------------------------------------------------------+
> |UDF(transactiondescription,0,instr(transactiondescription,CD))|
> +--------------------------------------------------------------+
> |                                          OVERSEAS TRANSACTI C|
> +--------------------------------------------------------------+
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed. The
> author will in no case be liable for any monetary damages arising from such
> loss, damage or destruction.
>
>
> On 2 August 2016 at 08:52, Mich Talebzadeh wrote:
>>
>> No thinking on my part!!!
>>
>> rs.select(mySubstr($"transactiondescription", lit(1),
>> instr($"transactiondescription", "CD"))).show(2)
>> +--------------------------------------------------------------+
>> |UDF(transactiondescription,1,instr(transactiondescription,CD))|
>> +--------------------------------------------------------------+
>> |                                           VERSEAS TRANSACTI C|
>> |                                                XYZ.COM 80...|
>> +--------------------------------------------------------------+
>> only showing top 2 rows
>>
>> Let me test it.
>>
>> Cheers
>>
>> Dr Mich Talebzadeh
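The lit(0) vs lit(1) difference above is an indexing-convention mismatch:
Scala's String.substring(start, end) is zero-based and end-exclusive,
whereas SQL's SUBSTRING and INSTR are one-based. A minimal, self-contained
sketch of the working approach (assuming Spark 2.x in spark-shell; the
sample data is made up for illustration):

import org.apache.spark.sql.functions.{udf, lit, instr}
import spark.implicits._

// Scala's substring(start, end) is zero-based and end-exclusive,
// so the start argument must be lit(0) to begin at the first character.
val mySubstr = udf { (s: String, start: Int, end: Int) => s.substring(start, end) }

val df = Seq("OVERSEAS TRANSACTIONS CD 1234").toDF("transactiondescription")

// instr returns the 1-based position of "CD"; used as a zero-based,
// end-exclusive bound it keeps the text up to and including the "C" of "CD".
df.select(mySubstr($"transactiondescription", lit(0),
  instr($"transactiondescription", "CD"))).show(false)

To reproduce the SQL INSTR(transactiondescription,'CD')-2 semantics exactly
(also dropping the "C" and the space before it), the end bound would be
instr($"transactiondescription", "CD") - 2.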
>> On 1 August 2016 at 23:43, Mich Talebzadeh wrote:
>>>
>>> Thanks Jacek.
>>>
>>> It sounds like the issue is the position of the second argument in
>>> substring().
>>>
>>> This works:
>>>
>>> scala> val wSpec2 =
>>> Window.partitionBy(substring($"transactiondescription",1,20))
>>> wSpec2: org.apache.spark.sql.expressions.WindowSpec =
>>> org.apache.spark.sql.expressions.WindowSpec@1a4eae2
>>>
>>> Using the udf as suggested:
>>>
>>> scala> val mySubstr = udf { (s: String, start: Int, end: Int) =>
>>>      |   s.substring(start, end) }
>>> mySubstr: org.apache.spark.sql.UserDefinedFunction =
>>> UserDefinedFunction(<function3>,StringType,List(StringType, IntegerType,
>>> IntegerType))
>>>
>>> This was throwing an error:
>>>
>>> val wSpec2 =
>>> Window.partitionBy(substring("transactiondescription",1,indexOf("transactiondescription",'CD')-2))
>>>
>>> So I tried using the udf:
>>>
>>> scala> val wSpec2 =
>>> Window.partitionBy($"transactiondescription".select(mySubstr('s, lit(1),
>>> instr('s, "CD")))
>>>      | )
>>> <console>:28: error: value select is not a member of
>>> org.apache.spark.sql.ColumnName
>>>        val wSpec2 =
>>> Window.partitionBy($"transactiondescription".select(mySubstr('s, lit(1),
>>> instr('s, "CD")))
>>>
>>> Obviously I am not doing it correctly :(
>>>
>>> Cheers
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>> On 1 August 2016 at 23:02, Jacek Laskowski wrote:
>>>>
>>>> Hi,
>>>>
>>>> Interesting...
>>>>
>>>> I'm tempted to think that the substring function should accept columns
>>>> that hold the numbers for start and end. I'd love to hear people's
>>>> thoughts on this.
>>>>
>>>> For now, I'd say you need to define a udf to do the substring, as follows:
>>>>
>>>> scala> val mySubstr = udf { (s: String, start: Int, end: Int) =>
>>>> s.substring(start, end) }
>>>> mySubstr: org.apache.spark.sql.expressions.UserDefinedFunction =
>>>> UserDefinedFunction(<function3>,StringType,Some(List(StringType,
>>>> IntegerType, IntegerType)))
>>>>
>>>> scala> df.show
>>>> +-----------+
>>>> |          s|
>>>> +-----------+
>>>> |hello world|
>>>> +-----------+
>>>>
>>>> scala> df.select(mySubstr('s, lit(1), instr('s, "ll"))).show
>>>> +-----------------------+
>>>> |UDF(s, 1, instr(s, ll))|
>>>> +-----------------------+
>>>> |                     el|
>>>> +-----------------------+
>>>>
>>>> Pozdrawiam,
>>>> Jacek Laskowski
>>>> ----
>>>> https://medium.com/@jaceklaskowski/
>>>> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>>> Follow me at https://twitter.com/jaceklaskowski
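On the "value select is not a member of ColumnName" error quoted above:
select is a Dataset method, not a Column method, and Window.partitionBy
already accepts Column expressions, so the udf call can be passed to it
directly. A sketch (untested against the author's data, assuming the
mySubstr udf defined above):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, instr}

// The udf call itself yields a Column, which partitionBy accepts directly;
// no .select on the ColumnName is needed (or possible).
val wSpec2 = Window.partitionBy(
  mySubstr($"transactiondescription", lit(0),
    instr($"transactiondescription", "CD")))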
>>>> On Mon, Aug 1, 2016 at 11:18 PM, Mich Talebzadeh wrote:
>>>> >
>>>> > Thanks Jacek,
>>>> >
>>>> > Do I have any other way of writing this with functional programming?
>>>> >
>>>> > select
>>>> > substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2),
>>>> >
>>>> > Cheers,
>>>> >
>>>> > Dr Mich Talebzadeh
>>>> >
>>>> >
>>>> > On 1 August 2016 at 22:13, Jacek Laskowski wrote:
>>>> >>
>>>> >> Hi Mich,
>>>> >>
>>>> >> There's no indexOf function -- see
>>>> >> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
>>>> >>
>>>> >> Pozdrawiam,
>>>> >> Jacek Laskowski
>>>> >> ----
>>>> >> https://medium.com/@jaceklaskowski/
>>>> >> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
>>>> >> Follow me at https://twitter.com/jaceklaskowski
>>>> >>
>>>> >>
>>>> >> On Mon, Aug 1, 2016 at 7:24 PM, Mich Talebzadeh wrote:
>>>> >> >
>>>> >> > Hi,
>>>> >> >
>>>> >> > What is the FP equivalent of the following window/analytic query,
>>>> >> > which works OK in Spark SQL? This one uses INSTR:
>>>> >> >
>>>> >> > select distinct *
>>>> >> > from (
>>>> >> >   select
>>>> >> >   substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2),
>>>> >> >   SUM(debitamount) OVER (PARTITION BY
>>>> >> >   substring(transactiondescription,1,INSTR(transactiondescription,'CD')-2)) AS
>>>> >> >   spent
>>>> >> >   from accounts.ll_18740868 where transactiontype = 'DEB'
>>>> >> > ) tmp
>>>> >> >
>>>> >> > I tried indexOf but it does not work!
>>>> >> >
>>>> >> > val wSpec2 =
>>>> >> > Window.partitionBy(substring(col("transactiondescription"),1,indexOf(col("transactiondescription"),"CD")))
>>>> >> > <console>:26: error: not found: value indexOf
>>>> >> >
>>>> >> > Thanks
>>>> >> >
>>>> >> > Dr Mich Talebzadeh
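Putting the thread together, a sketch of the FP equivalent of the original
windowed query (untested; it assumes the mySubstr udf defined earlier, that
rs is the DataFrame over accounts.ll_18740868 as in the posts above, and the
"merchant" column name is just an illustrative label):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, instr, sum}

// Mirrors substring(transactiondescription,1,INSTR(...,'CD')-2): the -2
// also drops the " C" before "CD". Assumes "CD" is present in every row;
// instr returns 0 when it is not, which would make substring throw.
val merchant = mySubstr($"transactiondescription", lit(0),
  instr($"transactiondescription", "CD") - 2)

val byMerchant = Window.partitionBy(merchant)

rs.filter($"transactiontype" === "DEB")
  .select(merchant.as("merchant"),
    sum($"debitamount").over(byMerchant).as("spent"))
  .distinct
  .show(false)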