spark-user mailing list archives

From Yong Zhang <>
Subject Great performance improvement of Spark 1.6.2 on our production cluster
Date Mon, 29 Aug 2016 17:16:32 GMT
Today I deployed Spark 1.6.2 on our production cluster.

We have one huge job that runs every day using Spark SQL, and it is the biggest daily Spark
job on our cluster. I was impressed by the speed improvement.

Here are the historical run times of this daily job:

1) 11 to 12 hours on Hive 0.12 using MR
2) 6 hours on Spark 1.3.1
3) 4.5 hours on Spark 1.5.2
4) 1.6 hours on Spark 1.6.2, with the same resource allocation (we are using Standalone mode)

That is very hard to believe.

I am looking forward to the upcoming Spark 2.x release. Can you really make it 10x faster? For
this job, even 2x would already blow my mind.

Great job, Spark development team! Thank you for such a great product.

