spark-user mailing list archives

From Shawn Wan <shawn...@gmail.com>
Subject filter rows by all columns
Date Mon, 16 Jan 2017 22:27:03 GMT
I need to filter outliers out of a DataFrame based on all of its columns. I can
list the columns manually, like:

df.filter(x => math.abs(x.get(0).toString().toDouble - means(0)) <= 3 * stddevs(0))
  .filter(x => math.abs(x.get(1).toString().toDouble - means(1)) <= 3 * stddevs(1))
  ...

But I want to turn this into a general function that can handle a variable
number of columns. How could I do that? Thanks in advance!


Regards,

Shawn



