spark-user mailing list archives

From sequoiadb <mailing-list-r...@sequoiadb.com>
Subject question about sparksql caching
Date Fri, 15 May 2015 03:02:22 GMT
Hi all,

We are planning to use SparkSQL in a DW system, and we have a question about its caching
mechanism.

For example, suppose I run a query like sqlContext.sql("select c1, sum(c2) from T1, T2 where T1.key=T2.key
group by c1").cache()

Does this cache the final result, or the raw data of each table that is used in the query?

Users may run many different queries against those tables. If only the final result is
cached, a brand-new query may still take a very long time because it has to scan the entire
table again.

If that is the case, is there a better way to cache the base tables instead of the final
result?

Thanks
