spark-user mailing list archives

From Yann Luppo
Subject performance
Date Wed, 08 Jan 2014 21:49:21 GMT

I have what I hope is a simple question. What's a typical approach to diagnosing performance
issues on a Spark cluster?
We've already followed all the pertinent parts of the following document:
But we still seem to have issues. More specifically, we have a leftOuterJoin followed by a
flatMap and then a collect, and the job runs a bit long.

How would I go about determining the bottleneck operation(s)?
Is our leftOuterJoin taking a long time?
Is the function we pass to flatMap not optimized?

