spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Cheng Lian (JIRA)" <>
Subject [jira] [Commented] (SPARK-2650) Wrong initial sizes for in-memory column buffers
Date Mon, 04 Aug 2014 19:04:15 GMT


Cheng Lian commented on SPARK-2650:

Some additional comments after more experiments and some improvements:

# How exactly the OOMs occur when caching a large table (assume N cores and M memory are available
within a single executor):
#- Say the table is so large that the underlying RDD is divided into X partitions (usually
X >> N, let's assume this here)
#- When caching the table, N tasks are executed in parallel, building column buffers, each
of them is memory consuming. Say, each task consumes Y memory in average.
#- At some point, memory consumptions of all N parallel tasks altogether, namely N * Y exceeds
available memory of the executor, an OOM is thrown
#- All tasks fail and retry, but fail again, until the driver stops retrying, and the job
#- I guess the reason that this issue hasn't been reported is that, usually M > N * Y holds
in production.
# Initial buffer sizes do affect the OOMs in a subtle way:
#- Too large an initial buffer size implies, apparently, larger memory consumption
#- Too small an initial buffer size causes the {{ColumnBuilder}} keeps allocating larger buffers
to ensure enough free space to hold more elements (12.5% larger at a time). Thus 212.5% larger
space is required to finish growing a buffer (112.5% for the new buffer + 100% for the original
#- A well estimated initial buffer size should be 1) large enough to avoid buffer growing,
and 2) small enough to avoid memory waste. For example, by hand tuning, 5MB can be a good
initial size for an executor with 512M memory and 1 core.
# An apparent approach to help fixing this issue is to try reducing the memory consumption
during the column building process.
#- [PR #1769|] is submitted to reduce memory consumption
of the column building proces.
# Another approach is to estimate the initial buffer size. To do this, Shark uses an estimated
table partition size by leveraging HDFS block size and column element default size. We can
use similar approach in Spark SQL for Hive tables, and some configurable initial size for
non-Hive tables.
#- Currently {{InMemoryRelation}} resides in package {{org.apache.spark.sql.columnar}} and
doesn't know anything about Hive tables. We can add an {{estimatedPartitionSize}} method,
and override it in a new {{InMemoryMetastoreRelation}} to estimate RDD partition sizes of
a Hive table. This will be done in another separate PR.

> Wrong initial sizes for in-memory column buffers
> ------------------------------------------------
>                 Key: SPARK-2650
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.0.0, 1.0.1
>            Reporter: Michael Armbrust
>            Assignee: Cheng Lian
>            Priority: Critical
> The logic for setting up the initial column buffers is different for Spark SQL compared
to Shark and I'm seeing OOMs when caching tables that are larger than available memory (where
shark was okay).
> Two suspicious things: the intialSize is always set to 0 so we always go with the default.
 The default looks like it was copied from code like 10 * 1024 * 1024... but in Spark SQL
its 10 * 102 * 1024.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message