spark-user mailing list archives

From Shailesh Birari <sbir...@wynyardgroup.com>
Subject Re: Spark SQL takes unexpected time
Date Tue, 04 Nov 2014 01:52:50 GMT
Yes, I am using Spark 1.1.0 and have used rdd.registerTempTable().
I tried adding sqlContext.cacheTable(), but the query then took 59 seconds
(more than before).

I also tried changing the schema to use the Long data type for some fields,
but the conversion seems to take more time.
Is there any way to specify an index? I checked and didn't find one, but
just want to confirm.

For your reference, here is the code snippet.

-----------------------------------------------------------------------------------------------------------------
// Schema for one event record in the CSV file.
case class EventDataTbl(
    EventUID: Long,
    ONum: Long,
    RNum: Long,
    Timestamp: java.sql.Timestamp,
    Duration: String,
    Type: String,
    Source: String,
    OName: String,
    RName: String)

// Assumes sc (SparkContext) and sqlCntxt (SQLContext) already exist and that
// `import sqlCntxt._` is in scope; that import provides sql() and the implicit
// RDD-to-SchemaRDD conversion needed by registerTempTable().

// "HH" is the 24-hour field; the original "hh" is the 12-hour field and
// would misread times in the 12:xx hour.
val format = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
val cedFileName = "hdfs://hadoophost:8020/demo/poc/JoinCsv/output_2"

// Split each line on commas (keeping trailing empty fields) and build a row.
val cedRdd = sc.textFile(cedFileName).map(_.split(",", -1)).map(p =>
  EventDataTbl(p(0).toLong, p(1).toLong, p(2).toLong,
    new java.sql.Timestamp(format.parse(p(3)).getTime()),
    p(4), p(5), p(6), p(7), p(8)))

cedRdd.registerTempTable("EventDataTbl")
sqlCntxt.cacheTable("EventDataTbl")

val t1 = System.nanoTime()
println("\n\n10 Most frequent conversations between the Originators and Recipients\n")
sql("SELECT COUNT(*) AS Frequency,ONum,OName,RNum,RName FROM EventDataTbl " +
    "GROUP BY ONum,OName,RNum,RName ORDER BY Frequency DESC LIMIT 10").collect().foreach(println)
val t2 = System.nanoTime()
println("Time taken " + (t2-t1)/1000000000.0 + " Seconds")

-----------------------------------------------------------------------------------------------------------------
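
A note on the timing in the snippet: as far as I know, cacheTable() in this
Spark version only marks the table for caching, and the in-memory copy is
built the first time a query scans it. If so, the 59 seconds would include
reading and parsing the CSV plus building the cache, not just the query. A
minimal sketch of a timing helper that makes it easy to compare a first run
with a second run follows; the object name TimingSketch and the method timed
are hypothetical, and the Spark calls are left as comments because they need
a live context.

```scala
object TimingSketch {
  // Runs `body` once, prints the elapsed wall-clock time under `label`,
  // and returns the result together with the elapsed seconds.
  def timed[A](label: String)(body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    val seconds = (System.nanoTime() - start) / 1e9
    println(label + ": " + seconds + " seconds")
    (result, seconds)
  }
}

// Hypothetical usage inside the Spark shell, with `query` holding the SQL text:
//   TimingSketch.timed("first run (may also build the cache)") { sql(query).collect() }
//   TimingSketch.timed("second run (should read the cache)") { sql(query).collect() }
```

If the second run is much faster than the first, the extra time is going into
loading and caching rather than into the aggregation itself.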

Thanks,
  Shailesh


