spark-user mailing list archives

From anu <>
Subject Iterate over contents of schemaRDD loaded from parquet file to extract timestamp
Date Tue, 17 Mar 2015 04:43:09 GMT
Spark Version - 1.1.0
Scala - 2.10.4

I have loaded data of the following form from a Parquet file into a SchemaRDD:

[7654321,2015-01-01 00:00:00.007,0.49,THU]

Since the Parquet format in Spark 1.1.0 doesn't support saving timestamp
values, I have saved the timestamp data as strings. Can you please tell me
how to iterate over the data in this SchemaRDD to retrieve the timestamp
values, register the mapped RDD as a table, and then run queries like
"Select * from table where time >= '2015-01-01 00:00:00.000'"?
I wrote the following code:

val sdf = new SimpleDateFormat("yyyy-mm-dd hh:mm:ss.SSS"); val calendar =
val iddRDD ={ r => 

val end_time = sdf.parse(r(1).toString); 
val r1 = new java.sql.Timestamp(end_time.getTime); 

val hour: Long = calendar.get(Calendar.HOUR_OF_DAY); 

Row(r(0).toString.toInt, r1, hour, r(2).toString.toInt, r(3).toString)


This gives me org.apache.spark.SparkException: Task not serializable

Please help !!!
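
For reference, here is a minimal sketch of the per-record parsing step on its own, without Spark. The object name TimestampParsing and the helper parseTimestamp are hypothetical, not part of any library. The sketch assumes the strings look like the sample row above, and it uses the pattern "yyyy-MM-dd HH:mm:ss.SSS", because in SimpleDateFormat lowercase "mm" means minutes and "hh" means a 12-hour clock. The formatter and calendar are created inside the function, so a closure calling it would not need to capture them from the enclosing scope. Whether that alone removes the Task not serializable error depends on what else the closure references.

```scala
import java.sql.Timestamp
import java.text.SimpleDateFormat
import java.util.Calendar

object TimestampParsing {
  // Hypothetical helper: parses one timestamp string and returns the
  // java.sql.Timestamp together with its hour of day (0-23).
  def parseTimestamp(s: String): (Timestamp, Int) = {
    // MM = month of year, HH = hour of day; both objects are local to this
    // call, so nothing from an outer scope is needed. They use the JVM's
    // default time zone (an assumption; set one explicitly if that matters).
    val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS")
    val date = sdf.parse(s)

    // Point the calendar at the parsed instant before reading fields from it.
    val calendar = Calendar.getInstance()
    calendar.setTime(date)

    (new Timestamp(date.getTime), calendar.get(Calendar.HOUR_OF_DAY))
  }
}
```

Inside a map over the SchemaRDD, the call would be something like TimestampParsing.parseTimestamp(r(1).toString), with the two results placed into the new Row. Creating a formatter per record is simple but not the cheapest option; building it once per partition would be a possible refinement.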
