spark-user mailing list archives

From Edgardo Szrajber <szraj...@yahoo.com.INVALID>
Subject Re: Filtering on multiple columns in spark
Date Wed, 29 Apr 2020 15:45:54 GMT
Maybe create a column with the "lit" function for the variables you are comparing against (see the sketch below).

Bentzi

Sent from Yahoo Mail on Android 
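For illustration, a minimal sketch of that suggestion (not from the original message; it reuses newDF and broadcastStagingConfig from Mich's code and assumes mobileNoLength is an Int and ukMobileNoStart is a String):

import org.apache.spark.sql.functions.{col, length, lit, substring}
import org.apache.spark.sql.types.StringType

// Lift the Scala values into Columns with lit() and compare them using the
// Column operators =!= (not equal) and || (or).
val rejectedDF = newDF
  .withColumn("target_mobile_no", col("target_mobile_no").cast(StringType))
  .filter(
    length(col("target_mobile_no")) =!= lit(broadcastStagingConfig.mobileNoLength) ||
      substring(col("target_mobile_no"), 1, 1) =!= lit(broadcastStagingConfig.ukMobileNoStart)
  )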
 
On Wed, Apr 29, 2020 at 18:40, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
 
The line below works:

val c = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).
        filter("length(target_mobile_no) != 10 OR substring(target_mobile_no,1,1) != '7'")

 

 

But the following does not work when the values are passed as parameters:

val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).
        filter("length(target_mobile_no) != broadcastStagingConfig.mobileNoLength OR substring(target_mobile_no,1,1) != broadcastStagingConfig.ukMobileNoStart")

I think it cannot interpret them.
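The filter string is handed straight to Spark's SQL parser, which knows nothing about the Scala object broadcastStagingConfig, so those identifiers are never resolved. One possible fix (a sketch, not from the original thread) is to splice the values in with Scala string interpolation before the string reaches Spark, quoting the string value:

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StringType

// The s prefix enables ${...} interpolation; the start character is wrapped
// in single quotes so that it becomes a SQL string literal.
val rejectedDF = newDF
  .withColumn("target_mobile_no", col("target_mobile_no").cast(StringType))
  .filter(
    s"length(target_mobile_no) != ${broadcastStagingConfig.mobileNoLength} " +
      s"OR substring(target_mobile_no,1,1) != '${broadcastStagingConfig.ukMobileNoStart}'"
  )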




Dr Mich Talebzadeh

 

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com




Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or
destruction of data or any other property which may arise from relying on this email's technical content
is explicitly disclaimed. The author will in no case be liable for any monetary damages arising
from such loss, damage or destruction.

 


On Wed, 29 Apr 2020 at 13:25, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

OK, how do you pass variables for 10 and '7'?

 val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).
                   filter("length(target_mobile_no) != 10 OR substring(target_mobile_no,1,1) != '7'")

in the above, in Scala. Neither the $ value below nor lit() is working!

   val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).
                    filter("length(target_mobile_no) != ${broadcastStagingConfig.mobileNoLength} OR substring(target_mobile_no,1,1) != ${broadcastStagingConfig.ukMobileNoStart}")




Thanks











Dr Mich Talebzadeh

 

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com




Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or
destruction of data or any other property which may arise from relying on this email's technical content
is explicitly disclaimed. The author will in no case be liable for any monetary damages arising
from such loss, damage or destruction.

 


On Wed, 29 Apr 2020 at 10:15, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

Hi Zhang,
Yes, the SQL way worked fine:

  val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).
                  filter("length(target_mobile_no) != 10 OR substring(target_mobile_no,1,1) != '7'")

Many thanks,

Dr Mich Talebzadeh

 

LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

 

http://talebzadehmich.wordpress.com




Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or
destruction of data or any other property which may arise from relying on this email's technical content
is explicitly disclaimed. The author will in no case be liable for any monetary damages arising
from such loss, damage or destruction.

 


On Wed, 29 Apr 2020 at 09:51, ZHANG Wei <wezhang@outlook.com> wrote:

AFAICT, maybe Spark SQL built-in functions[1] can help as below:

scala> df.show()
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+


scala> df.filter("length(name) == 4 or substring(name, 1, 1) == 'J'").show()
+---+------+
|age|  name|
+---+------+
| 30|  Andy|
| 19|Justin|
+---+------+
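As a side note (not part of the original reply), the same kind of predicate can also be written with Column expressions. The !== operator from the first post was deprecated in favour of =!= (with === for equality), and || does work between Columns, so the negated filter from the original question would look roughly like this against the same df:

scala> import org.apache.spark.sql.functions.{col, length, substring}
import org.apache.spark.sql.functions.{col, length, substring}

scala> df.filter(length(col("name")) =!= 4 || substring(col("name"), 1, 1) =!= "J").show()
+----+-------+
| age|   name|
+----+-------+
|null|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

Every sample row fails at least one of the two equality checks, so all three rows come back.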


-- 
Cheers,
-z
[1] https://spark.apache.org/docs/latest/api/sql/index.html

On Wed, 29 Apr 2020 08:45:26 +0100
Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:

> Hi,
> 
> 
> 
> Trying to filter a dataframe with multiple conditions using OR "||" as below
> 
> 
> 
>   val rejectedDF = newDF.withColumn("target_mobile_no",
> col("target_mobile_no").cast(StringType)).
> 
>                    filter(length(col("target_mobile_no")) !== 10 ||
> substring(col("target_mobile_no"),1,1) !== "7")
> 
> 
> 
> This throws this error
> 
> 
> 
> res12: org.apache.spark.sql.DataFrame = []
> 
> <console>:49: error: value || is not a member of Int
> 
>                           filter(length(col("target_mobile_no")) !== 10 ||
> substring(col("target_mobile_no"),1,1) !== "7")
> 
> 
> 
> Try another way
> 
> 
> 
> val rejectedDF = newDF.withColumn("target_mobile_no",
> col("target_mobile_no").cast(StringType)).
> 
>                    filter(length(col("target_mobile_no")) !=== 10 ||
> substring(col("target_mobile_no"),1,1) !=== "7")
> 
>   rejectedDF.createOrReplaceTempView("tmp")
> 
> 
> 
> Tried few options but I am still getting this error
> 
> 
> 
> <console>:49: error: value !=== is not a member of
> org.apache.spark.sql.Column
> 
>                           filter(length(col("target_mobile_no")) !=== 10 ||
> substring(col("target_mobile_no"),1,1) !=== "7")
>
>                                                                  ^
> 
> <console>:49: error: value || is not a member of Int
> 
>                           filter(length(col("target_mobile_no")) !=== 10 ||
> substring(col("target_mobile_no"),1,1) !=== "7")
> 
> 
> 
> I can create a dataframe for each filter but that does not look efficient
> to me?
> 
> 
> 
> Thanks
> 
> 
> 
> Dr Mich Talebzadeh
> 
> 
> 
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> 
> 
> 
> http://talebzadehmich.wordpress.com
> 
> 
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.



  
