spark-user mailing list archives

From Paolo Platter <paolo.plat...@agilelab.it>
Subject Re: Spark is much slower than direct access MySQL
Date Sun, 26 Jul 2015 09:21:29 GMT
If you want a performance boost, you need to load the full table into memory using caching and
then execute your query directly on the cached DataFrame. Otherwise you use Spark only as a bridge
and you don't leverage Spark's distributed in-memory engine.

Paolo

Sent from my Windows Phone
________________________________
From: Louis Hust <louis.hust@gmail.com>
Sent: 26/07/2015 10:28
To: Shixiong Zhu <zsxwing@gmail.com>
Cc: Jerrick Hoang <jerrickhoang@gmail.com>; user@spark.apache.org
Subject: Re: Spark is much slower than direct access MySQL

Thanks for your explanation.

2015-07-26 16:22 GMT+08:00 Shixiong Zhu <zsxwing@gmail.com>:
Oh, I see. That's the total time of executing a query in Spark. Then the difference is reasonable,
considering Spark has much more work to do, e.g., launching tasks in executors.


Best Regards,

Shixiong Zhu

2015-07-26 16:16 GMT+08:00 Louis Hust <louis.hust@gmail.com>:
Please look at the code at this URL:

https://github.com/louishust/sparkDemo/blob/master/src/main/java/DirectQueryTest.java

2015-07-26 16:14 GMT+08:00 Shixiong Zhu <zsxwing@gmail.com>:
Could you clarify how you measure the Spark time cost? Is it the total time of running the
query? If so, the gap is plausible, because Spark's overhead dominates for small queries.


Best Regards,

Shixiong Zhu
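
Editorial sketch, not part of the original messages: the measurement question above can be made concrete by timing the first (cold) run separately from repeated runs, so that fixed startup overhead is not mistaken for query cost. The class QueryTiming, its methods, and the simulated query with its 200 ms and 5 ms delays are hypothetical stand-ins. In a real test the Supplier would wrap the actual Spark or JDBC call from DirectQueryTest.java.

```java
import java.util.function.Supplier;

// Hypothetical helper: separates cold-run time from warm-run time.
final class QueryTiming {

    // Runs the query once and returns the elapsed wall-clock time in milliseconds.
    static <T> long timeOnceMillis(Supplier<T> query) {
        long start = System.nanoTime();
        query.get();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Times `runs` consecutive executions; index 0 is the cold run.
    static <T> long[] timeRuns(Supplier<T> query, int runs) {
        long[] elapsed = new long[runs];
        for (int i = 0; i < runs; i++) {
            elapsed[i] = timeOnceMillis(query);
        }
        return elapsed;
    }

    // Assumed stand-in for a real query: a large one-time setup cost on the
    // first call, then a small cost on each later call.
    static Supplier<Integer> simulatedQuery() {
        boolean[] warmedUp = {false};
        return () -> {
            try {
                Thread.sleep(warmedUp[0] ? 5 : 200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            warmedUp[0] = true;
            return 1;
        };
    }

    public static void main(String[] args) {
        long[] elapsed = timeRuns(simulatedQuery(), 5);
        System.out.println("cold run: " + elapsed[0] + " ms");
        for (int i = 1; i < elapsed.length; i++) {
            System.out.println("warm run " + i + ": " + elapsed[i] + " ms");
        }
    }
}
```

If the warm runs are much faster than the cold run, most of the reported total is one-time overhead rather than the cost of the query itself.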

2015-07-26 15:56 GMT+08:00 Jerrick Hoang <jerrickhoang@gmail.com>:
How big is the dataset? How complicated is the query?

On Sun, Jul 26, 2015 at 12:47 AM Louis Hust <louis.hust@gmail.com> wrote:
Hi, all,

I am using a Spark DataFrame to fetch a small table from MySQL,
and I found it takes much longer than accessing MySQL directly using JDBC.

The time cost for Spark is about 2033 ms, versus about 16 ms for direct access.

Code can be found at:

https://github.com/louishust/sparkDemo/blob/master/src/main/java/DirectQueryTest.java

Is my Spark configuration wrong? How can I optimise Spark to achieve performance similar
to direct access?

Any idea will be appreciated!





