spark-user mailing list archives

From Gurvinder Singh <>
Subject Re: SQLCtx cacheTable
Date Fri, 01 Aug 2014 06:26:46 GMT
Thanks Michael for the explanation. I actually tried caching the RDD and
making a table on it, but the performance of cacheTable was 3x better
than caching the RDD. Now I know why it is better. But would it be
possible to add support for a persistence level to cacheTable itself,
as with RDDs?

Maybe it is unrelated, but on the same dataset, when I use cacheTable I
have to specify a larger executor memory than when caching the RDD,
although in the storage tab of the status web UI the memory footprint
is almost the same: 58.3 GB for cacheTable and 59.7 GB for the cached
RDD. Is it possible that there is a memory leak, or does cacheTable
work differently and thus require more memory? The difference is
5 GB per executor for a dataset of 122 GB.

On 08/01/2014 04:42 AM, Michael Armbrust wrote:
> cacheTable uses a special columnar caching technique that is
> optimized for SchemaRDDs.  It is similar to MEMORY_ONLY_SER,
> but not quite the same. You can specify the persistence level on the
> SchemaRDD itself and register that as a temporary table; however, it
> is likely you will not get as good performance.
> On Thu, Jul 31, 2014 at 6:16 AM, Gurvinder Singh 
> <>
> wrote:
> Hi,
> I am wondering how I can specify the persistence level in
> cacheTable, as it takes only the table name as a parameter. It
> should be possible to specify the persistence level.
> - Gurvinder
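
For reference, a minimal sketch of the two approaches being compared, assuming the Spark 1.0-era SQLContext API (the case class Record and the table names are hypothetical, only for illustration):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.storage.StorageLevel

// Hypothetical schema for the example data set
case class Record(key: Int, value: String)

val sc = new SparkContext("local", "cache-demo")
val sqlContext = new SQLContext(sc)
// Implicitly converts RDD[Record] to SchemaRDD
import sqlContext.createSchemaRDD

val rdd = sc.parallelize(1 to 100).map(i => Record(i, "v" + i))

// Approach 1: columnar in-memory caching via cacheTable.
// The storage level is fixed; it cannot be passed as a parameter.
rdd.registerAsTable("records")
sqlContext.cacheTable("records")

// Approach 2: persist the SchemaRDD with a chosen storage level and
// register that as a table. The persistence level is configurable,
// but caching is row-based rather than columnar, so scans are
// typically slower than with cacheTable.
val persisted = rdd.persist(StorageLevel.MEMORY_ONLY_SER)
persisted.registerAsTable("records_persisted")
```

This contrast is what the thread is about: cacheTable gives the faster columnar format but no storage-level knob, while persist + register gives the knob at the cost of the columnar layout.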
