spark-user mailing list archives

From patcharee <>
Subject spark 1.5 sort slow
Date Tue, 01 Sep 2015 08:06:53 GMT

I found that sorting in Spark 1.5 is much slower than in Spark 1.4. Below is 
my code snippet:

     val sqlRDD = sql("select date, u, v, z from fino3_hr3 where zone == 2 and z >= 2 and z <= order by date, z")
     println("sqlRDD " + sqlRDD.count())

The fino3_hr3 table (in the SQL command) is a Hive table in ORC format, 
partitioned by zone and z.

Spark 1.5 takes 4.5 minutes to execute this query, while Spark 1.4 takes 
1.5 minutes. I noticed that, unlike in Spark 1.4, when Spark 1.5 sorts, the 
data is shuffled into only a few tasks rather than spread across all tasks. 
Do I need to set any configuration explicitly? Any suggestions?
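
For reference, here is a minimal sketch of how the uneven task distribution could be checked. It assumes a spark-shell session in which `sqlContext` is the HiveContext and `sqlRDD` is the DataFrame from the snippet above. The "200" passed to getConf is only a fallback value for the lookup, not a recommended setting, and nothing here is a confirmed fix.

```scala
// Assumption: run in spark-shell, where `sqlContext` is the HiveContext
// and `sqlRDD` is the DataFrame returned by the query above.

// Number of post-shuffle partitions Spark SQL is configured to use.
// "200" is only the fallback returned if the key is not set.
println("shuffle partitions: " +
  sqlContext.getConf("spark.sql.shuffle.partitions", "200"))

// Print the logical and physical plans to see how the sort is planned.
sqlRDD.explain(true)

// Count rows in each partition of the sorted result, to see whether
// the data ends up concentrated in a few tasks.
val rowsPerPartition = sqlRDD.rdd
  .mapPartitionsWithIndex((idx, rows) => Iterator((idx, rows.size)))
  .collect()

rowsPerPartition.foreach { case (idx, n) =>
  println(s"partition $idx: $n rows")
}
```

Comparing this output between Spark 1.4 and Spark 1.5 should show whether the two versions distribute the sorted rows differently.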

