spark-user mailing list archives

From Warren Kim <w...@diablo-technologies.com>
Subject Cached Tables SQL Performance Worse than Uncached
Date Thu, 15 Dec 2016 22:14:45 GMT
I've been running TPC-H and comparing performance between cached tables (serialized, in memory) and
uncached ones (DataFrames read from Parquet). Several SQL queries take much longer against the cached tables.


Some of the physical plans have an extra shuffle/sort/merge stage in the cached scenario.


I could filter by key to optimize, but I'm curious why the out-of-the-box
plan is more complex and slower when the tables are cached in memory.


Thanks!
