spark-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject This works to filter transactions older than certain months
Date Sun, 27 Mar 2016 22:50:37 GMT
Hi,

A while back I was looking for a functional way to filter out
transactions older than n months.

This turned out to be pretty easy.

I get today's date as follows:

val today = sqlContext.sql("SELECT FROM_unixtime(unix_timestamp(), 'yyyy-MM-dd')").first.getString(0)
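Because `today` ends up as a plain `yyyy-MM-dd` string on the driver, the same value could also be produced without running a query at all. A minimal sketch using only the JDK's `java.time` API; this is an alternative, not the method used above, and the `TodayString` object and its `Clock` parameter are hypothetical names added for illustration:

```scala
import java.time.{Clock, LocalDate}
import java.time.format.DateTimeFormatter

object TodayString {
  // Formats the current date as yyyy-MM-dd, the same layout that
  // FROM_unixtime(unix_timestamp(), 'yyyy-MM-dd') returns.
  // The Clock parameter (hypothetical) lets a caller pin the date, e.g. in tests;
  // by default it uses the JVM's default time zone.
  def todayString(clock: Clock = Clock.systemDefaultZone()): String =
    LocalDate.now(clock).format(DateTimeFormatter.ISO_LOCAL_DATE)
}
```

Like the `from_unixtime` query, the default call depends on the JVM's time zone, so both can roll over to the next day at different moments if the machines involved use different zones.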


The CSV data is stored in an underlying Hive table (actually created and
populated as an ORC table by Spark):

HiveContext.sql("use accounts")
val n = HiveContext.table("nw_10124772")

scala> n.printSchema
root
 |-- transactiondate: date (nullable = true)
 |-- transactiontype: string (nullable = true)
 |-- description: string (nullable = true)
 |-- value: double (nullable = true)
 |-- balance: double (nullable = true)
 |-- accountname: string (nullable = true)
 |-- accountnumber: integer (nullable = true)

//
// Check for historical transactions older than 60 months
//
val old: Int = 60

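import org.apache.spark.sql.functions.{add_months, col, lit}

// Keep rows whose transaction date plus `old` months is still before today
val rs = n.filter(add_months(col("transactiondate"), old) < lit(today)).
  select(lit(today), col("transactiondate"), add_months(col("transactiondate"), old))
rs.collect.foreach(println)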

[2016-03-27,2011-03-22,2016-03-22]
[2016-03-27,2011-03-22,2016-03-22]
[2016-03-27,2011-03-22,2016-03-22]
[2016-03-27,2011-03-22,2016-03-22]
[2016-03-27,2011-03-23,2016-03-23]
[2016-03-27,2011-03-23,2016-03-23]
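The filter keeps a row when `transactiondate` plus `old` months is still before `today`, which is the same as saying the transaction is more than `old` months old. A Spark-free sketch of that rule over an in-memory collection may help when unit-testing it; the `Txn` case class (a subset of the schema above) and the `isOld`/`filterOld` helpers are hypothetical:

```scala
import java.time.LocalDate

object OldTransactions {
  // Hypothetical subset of the nw_10124772 schema, for illustration only.
  final case class Txn(transactionDate: LocalDate, description: String, value: Double)

  // Same rule as add_months(transactiondate, months) < today.
  // plusMonths clamps to the last valid day (2016-01-31 + 1 month = 2016-02-29).
  // As far as I know, Spark 2.4 and earlier also move a month-end start date to
  // the last day of the target month, so results can differ by a few days there.
  def isOld(t: Txn, months: Int, today: LocalDate): Boolean =
    t.transactionDate.plusMonths(months.toLong).isBefore(today)

  // Returns only the transactions older than `months` months as of `today`.
  def filterOld(txns: Seq[Txn], months: Int, today: LocalDate): Seq[Txn] =
    txns.filter(isOld(_, months, today))
}
```

The check is strict, like `<` in the original, so a transaction exactly `old` months old on `today` (for example 2011-03-27 against 2016-03-27 with 60 months) is not returned.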


This seems to work. Any other suggestions would be appreciated.

Thanks



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
