spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: SQLCtx cacheTable
Date Sun, 03 Aug 2014 00:33:45 GMT
I am not a Mesos expert, but it sounds like there is a mismatch between
the amount of memory that Mesos is giving the container and the maximum
heap size of the executors (-Xmx).
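
As a rough illustration of that kind of mismatch, here is a minimal Python sketch. It assumes only that a JVM process uses some memory outside its heap, so a container limit sized close to -Xmx can be exceeded while the heap still has free space. The function names and all numbers are hypothetical and are not taken from this thread or from any Spark or Mesos API.

```python
# Toy model of a container memory limit versus JVM memory use.
# All names and numbers are hypothetical; nothing here is a Spark or Mesos API.

def jvm_footprint_mib(heap_max_mib, non_heap_mib):
    """Approximate process footprint: max heap (-Xmx) plus memory outside the heap."""
    return heap_max_mib + non_heap_mib


def headroom_mib(container_limit_mib, heap_max_mib, non_heap_mib):
    """Memory left under the container limit. A negative value means the
    process can exceed the limit even though the heap itself is not full."""
    return container_limit_mib - jvm_footprint_mib(heap_max_mib, non_heap_mib)


# Hypothetical case: the container limit equals -Xmx, so any non-heap usage overshoots it.
tight = headroom_mib(container_limit_mib=20480, heap_max_mib=20480, non_heap_mib=1024)

# Hypothetical case: the limit leaves room for non-heap memory.
roomy = headroom_mib(container_limit_mib=22528, heap_max_mib=20480, non_heap_mib=1024)

print(tight)
print(roomy)
```

If a calculation like this comes out negative for the real settings, the container can be killed by the cluster manager without the JVM ever raising an out-of-memory error, which would be consistent with the symptoms described below.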


On Fri, Aug 1, 2014 at 12:07 AM, Gurvinder Singh
<gurvinder.singh@uninett.no> wrote:

> I am not getting an out-of-memory exception. I am using Mesos as the cluster
> manager, and when I use cacheTable it reports that the container has used
> all of its allocated memory and kills it. I can see this in the logs
> on the mesos-slave where the executor runs. But the web UI of the Spark
> application shows that it still has 4-5 GB of space left for
> caching/storing. So I am wondering how memory is handled in the
> cacheTable case. Does it reserve the storage memory so that other parts run
> out of memory? I also tried changing
> "spark.storage.memoryFraction", but that did not help.
>
> - Gurvinder
> On 08/01/2014 08:42 AM, Michael Armbrust wrote:
> > Are you getting OutOfMemoryExceptions with cacheTable? or what do you
> > mean when you say you have to specify larger executor memory?  You might
> > be running into SPARK-2650
> > <https://issues.apache.org/jira/browse/SPARK-2650>.
> >
> > Is there something else you are trying to accomplish by setting the
> > persistence level?  If you are looking for something like DISK_ONLY you
> > can simulate that now using saveAsParquetFile and parquetFile.
> >
> > In the long term, it is possible that we will automatically map the standard
> > RDD persistence levels to these more efficient implementations.
> >
> >
> > On Thu, Jul 31, 2014 at 11:26 PM, Gurvinder Singh
> > <gurvinder.singh@uninett.no> wrote:
> >
> >     Thanks, Michael, for the explanation. Actually, I tried caching the RDD
> >     and creating a table on it, but the performance of cacheTable was 3x
> >     better than caching the RDD. Now I know why it is better. But is it
> >     possible to add support for persistence levels to cacheTable itself,
> >     as with RDDs?
> >     Maybe it is not related, but on a data set of the same size, when I use
> >     cacheTable I have to specify larger executor memory than I need when
> >     caching the RDD, although in the storage tab of the status web UI the
> >     memory footprint is almost the same: 58.3 GB for cacheTable and 59.7 GB
> >     for the cached RDD. Is it possible that there is a memory leak, or that
> >     cacheTable works differently and thus requires more memory? The
> >     difference is 5 GB per executor for a data set of 122 GB.
> >
> >     Thanks,
> >     Gurvinder
> >     On 08/01/2014 04:42 AM, Michael Armbrust wrote:
> >     > cacheTable uses a special columnar caching technique that is
> >     > optimized for SchemaRDDs.  It is something similar to MEMORY_ONLY_SER
> >     > but not quite. You can specify the persistence level on the
> >     > SchemaRDD itself and register that as a temporary table, however it
> >     > is likely you will not get as good performance.
> >     >
> >     >
> >     > On Thu, Jul 31, 2014 at 6:16 AM, Gurvinder Singh
> >     > <gurvinder.singh@uninett.no> wrote:
> >     >
> >     > Hi,
> >     >
> >     > I am wondering how I can specify the persistence level in
> >     > cacheTable. As it is, it takes only the table name as a parameter.
> >     > It should be possible to specify the persistence level.
> >     >
> >     > - Gurvinder
> >     >
> >     >
> >
> >
>
>
