Thanks, Shmuel, for trying out Sparklens!

A couple of things that I noticed:
1) 250 executors is probably overkill for this job. It would run in about the same time with around 100.
2) Many of the stages that take a long time have only 200 tasks, whereas we have 750 cores available for the job. 200 is the default value for spark.sql.shuffle.partitions. Alternatively, you could try increasing the value of spark.sql.shuffle.partitions to at least 750.
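
To make the arithmetic behind point 2 concrete, here is a rough sketch in Python. It is not Sparklens output: the helper name `stage_core_usage` is hypothetical, and it assumes one task per core and roughly equal task durations, which real stages rarely satisfy.

```python
import math


def stage_core_usage(num_tasks, total_cores):
    """Estimate how well one stage can use the available cores.

    Simplifying assumptions (not from Sparklens): one task per core,
    and all tasks take about the same time.
    Returns (waves, busy_fraction), where waves is the number of
    scheduling rounds needed and busy_fraction is the share of
    core-slots across those rounds that actually run a task.
    """
    waves = math.ceil(num_tasks / total_cores)
    busy_fraction = num_tasks / (waves * total_cores)
    return waves, busy_fraction


cores = 750  # cores available to the job, as stated above

# 200 is the default spark.sql.shuffle.partitions; 750 is the suggested minimum.
for partitions in (200, 750):
    waves, busy = stage_core_usage(partitions, cores)
    print(f"{partitions} tasks on {cores} cores: "
          f"{waves} wave(s), {busy:.0%} of core-slots busy")
```

Under these assumptions, a 200-task stage leaves most of the 750 cores idle, which is why either fewer executors or more shuffle partitions should help. The setting can be passed at submit time with `--conf spark.sql.shuffle.partitions=750`.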

thanks,
rohitk

On Sun, Mar 25, 2018 at 1:25 PM, Shmuel Blitz <shmuel.blitz@similarweb.com> wrote:
I ran it on a single job.
SparkLens has an overhead on the job duration. I'm not ready to enable it by default on all our jobs.

Attached is the output.

Still trying to understand what exactly it means.

On Sun, Mar 25, 2018 at 10:40 AM, Fawze Abujaber <fawzeaj@gmail.com> wrote:
Nice!

Shmuel, were you able to run it at the cluster level or for a specific job?

Did you configure it in spark-defaults.conf?

On Sun, 25 Mar 2018 at 10:34 Shmuel Blitz <shmuel.blitz@similarweb.com> wrote:
Just to let you know, I have managed to run SparkLens on our cluster.

I switched to the spark_1.6 branch, and also compiled against the specific image of Spark we are using (cdh5.7.6).

Now I need to figure out what the output means... :P

Shmuel

On Fri, Mar 23, 2018 at 7:24 PM, Fawze Abujaber <fawzeaj@gmail.com> wrote:
Quick question:

How do I add --jars /path/to/sparklens_2.11-0.1.0.jar to spark-defaults.conf? Should I use:

spark.driver.extraClassPath /path/to/sparklens_2.11-0.1.0.jar, or should I use the spark.jars option? Could anyone give an example of how it should look, and say whether the path to the jar should be an HDFS path, since I'm running in cluster mode?




On Fri, Mar 23, 2018 at 6:33 AM, Fawze Abujaber <fawzeaj@gmail.com> wrote:
Hi Shmuel,

Did you compile the code against the right branch for Spark 1.6?

I tested it and it seems to work, and I'm now testing the branch more widely. Please use the branch for Spark 1.6.

On Fri, Mar 23, 2018 at 12:43 AM, Shmuel Blitz <shmuel.blitz@similarweb.com> wrote:
Hi Rohit,

Thanks for sharing this great tool.
I tried running a Spark job with the tool, but it failed with an IncompatibleClassChangeError.

I have opened an issue on GitHub (https://github.com/qubole/sparklens/issues/1).

Shmuel

On Thu, Mar 22, 2018 at 5:05 PM, Shmuel Blitz <shmuel.blitz@similarweb.com> wrote:
Thanks.

We will give this a try and report back.

Shmuel

On Thu, Mar 22, 2018 at 4:22 PM, Rohit Karlupia <rohitk@qubole.com> wrote:
Thanks everyone!
Please share how it works and how it doesn't. Both help.

Fawze, I just made a few changes to make this work with Spark 1.6. Can you please try building from the spark_1.6 branch?

thanks,
rohitk



On Thu, Mar 22, 2018 at 10:18 AM, Fawze Abujaber <fawzeaj@gmail.com> wrote:
It's super amazing! I see it was tested on Spark 2.0.0 and above. What about Spark 1.6, which is still one of Cloudera's main versions?

We have a large number of Spark applications on version 1.6.0.

On Thu, Mar 22, 2018 at 6:38 AM, Holden Karau <holden@pigscanfly.ca> wrote:
Super exciting! I look forward to digging through it this weekend.

On Wed, Mar 21, 2018 at 9:33 PM ☼ R Nair (रविशंकर नायर) <ravishankar.nair@gmail.com> wrote:
Excellent. You filled a missing link.

Best,
Passion

On Wed, Mar 21, 2018 at 11:36 PM, Rohit Karlupia <rohitk@qubole.com> wrote:
Hi, 

Happy to announce the availability of Sparklens as an open source project. It helps in understanding the scalability limits of Spark applications and can be a useful guide on the path towards tuning applications for lower runtime or cost.

Please clone from here: https://github.com/qubole/sparklens

thanks,
rohitk

PS: Thanks for the patience. It took a couple of months to get back on this.





--
Shmuel Blitz
Big Data Developer
Email: shmuel.blitz@similarweb.com
www.similarweb.com


