spark-user mailing list archives

From mathias <math...@socialsignificance.co.uk>
Subject Re: Performance problems on SQL JOIN
Date Fri, 20 Jun 2014 17:04:37 GMT
Thanks for your suggestions.

file.count() takes 7s, so reading the file doesn't seem to be the problem.
Moreover, a UNION using the same code and CSV takes about 15s (SELECT * FROM
rooms2 UNION SELECT * FROM rooms3).

The web status page shows that the two stages 'count at joins.scala:216' and
'reduce at joins.scala:219' take up the majority of the time.
Is this due to bad partitioning or caching, or is there a problem with the
JOIN operator itself?



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Performance-problems-on-SQL-JOIN-tp8001p8016.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.