spark-user mailing list archives

From Som Lima <somplastic...@gmail.com>
Subject Re: Filtering on multiple columns in spark
Date Wed, 29 Apr 2020 08:30:06 GMT
From your email, the obvious cause seems to be that
10 is an Int because it is not surrounded by quotes ("").
10 should be "10".

That said, I can't imagine a telephone number consisting of only 10, if that
is what you are trying to program.
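
Another possible reading of the compiler message, offered as a sketch rather
than a confirmed diagnosis: Scala gives assignment-level (lowest) precedence to
any operator that ends in = unless it also starts with = or is one of <=, >=,
!=. The !== operator falls into that group, so 10 || substring(...) is grouped
first, which would explain "value || is not a member of Int". Spark's Column
has provided =!= since 2.0 and deprecated !== for this reason. The Expr class
below is a hypothetical stand-in for Column (it only builds a string), so the
example runs without Spark.

```scala
object PrecedenceDemo {
  // Hypothetical stand-in for org.apache.spark.sql.Column; it only builds a string.
  final case class Expr(sql: String) {
    def =!=(other: Any): Expr = Expr(s"($sql <> $other)")
    def !==(other: Any): Expr = Expr(s"($sql <> $other)")
    def ||(other: Expr): Expr = Expr(s"($sql OR ${other.sql})")
  }

  val len   = Expr("length(target_mobile_no)")
  val first = Expr("substring(target_mobile_no, 1, 1)")

  // =!= starts with '=', so it is an ordinary operator and binds tighter than ||.
  val withNewOp: Expr = len =!= 10 || first =!= "7"

  // !== ends with '=' and gets assignment-level precedence,
  // so each comparison needs explicit parentheses.
  val withParens: Expr = (len !== 10) || (first !== "7")

  // Without parentheses, 10 || first is grouped first and does not compile:
  // val broken = len !== 10 || first !== "7"
}
```

With the real Column type, the same idea would mean writing the filter with
=!= in place of !==, or wrapping each comparison in parentheses.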


In Scala, you can check if two operands are equal (==) or not (!=), and it
returns true if the condition is met, false if not (else). By itself, ! is
called the logical NOT operator.

On Wed, 29 Apr 2020, 08:45 Mich Talebzadeh, <mich.talebzadeh@gmail.com>
wrote:

> Hi,
>
>
>
> Trying to filter a dataframe with multiple conditions using OR "||" as
> below
>
>
>
>   val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).
>                    filter(length(col("target_mobile_no")) !== 10 || substring(col("target_mobile_no"),1,1) !== "7")
>
>
>
> This throws this error
>
>
>
> res12: org.apache.spark.sql.DataFrame = []
>
> <console>:49: error: value || is not a member of Int
>
>                           filter(length(col("target_mobile_no")) !== 10 || substring(col("target_mobile_no"),1,1) !== "7")
>
>
>
> I tried another way
>
>
>
> val rejectedDF = newDF.withColumn("target_mobile_no", col("target_mobile_no").cast(StringType)).
>                    filter(length(col("target_mobile_no")) !=== 10 || substring(col("target_mobile_no"),1,1) !=== "7")
>
>   rejectedDF.createOrReplaceTempView("tmp")
>
>
>
> I tried a few options but I am still getting this error
>
>
>
> <console>:49: error: value !=== is not a member of
> org.apache.spark.sql.Column
>
>                           filter(length(col("target_mobile_no")) !=== 10 || substring(col("target_mobile_no"),1,1) !=== "7")
>
>                                                                  ^
>
> <console>:49: error: value || is not a member of Int
>
>                           filter(length(col("target_mobile_no")) !=== 10 || substring(col("target_mobile_no"),1,1) !=== "7")
>
>
>
> I can create a dataframe for each filter but that does not look efficient
> to me?
>
>
>
> Thanks
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
