spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: Help understanding - Not enough space to cache rdd
Date Wed, 03 Dec 2014 08:51:42 GMT
Set the spark.storage.memoryFraction flag to 1 while creating the SparkContext
to utilize up to 73 GB of your memory; the default is 0.6, which is why you are
getting 33.6 GB. Also set spark.rdd.compress to true and use StorageLevel
MEMORY_ONLY_SER if your data is larger than your available memory (you could
also try MEMORY_AND_DISK_SER).
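
For example, a minimal sketch against the Spark 1.x Scala API (the app name,
input paths and storage-level choices here are just placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setAppName("cache-tuning")                    // placeholder app name
  // Raise the fraction of the heap the block manager may use for storage
  // (default 0.6). Note that 1.0 leaves no headroom for shuffle/scratch
  // space, so a slightly lower value may be safer in practice.
  .set("spark.storage.memoryFraction", "1")
  // Compress serialized cached partitions to shrink their footprint.
  .set("spark.rdd.compress", "true")

val sc = new SparkContext(conf)

// Placeholder paths standing in for your two HDFS inputs.
val left  = sc.textFile("hdfs:///data/left")
val right = sc.textFile("hdfs:///data/right")

// Cache serialized instead of the default deserialized MEMORY_ONLY;
// MEMORY_AND_DISK_SER additionally spills partitions that don't fit.
left.persist(StorageLevel.MEMORY_ONLY_SER)
right.persist(StorageLevel.MEMORY_AND_DISK_SER)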




Thanks
Best Regards

On Wed, Dec 3, 2014 at 12:23 AM, akhandeshi <ami.khandeshi@gmail.com> wrote:

> I am running in local mode, on a Google n1-highmem-16 (16 vCPU, 104 GB
> memory) machine.
>
> I have allocated the SPARK_DRIVER_MEMORY=95g
>
> I see Memory: 33.6 GB Used (73.7 GB Total) for the executor.
>
> In the log output below, I see 33.6 GB of blocks used by the 2 RDDs that I
> have cached, so I should still have 40.2 GB left.
>
> However, I see  messages like:
>
> 14/12/02 18:15:04 WARN storage.MemoryStore: Not enough space to cache
> rdd_15_9 in memory! (computed 8.1 GB so far)
> 14/12/02 18:15:04 INFO storage.MemoryStore: Memory use = 33.6 GB (blocks) +
> 40.1 GB (scratch space shared across 14 thread(s)) = 73.7 GB. Storage limit
> = 73.7 GB.
> 14/12/02 18:15:04 WARN spark.CacheManager: Persisting partition rdd_15_9 to
> disk instead.
> .
> .
> .
> .
> further down I see:
> 14/12/02 18:30:08 INFO storage.BlockManagerInfo: Added rdd_15_9 on disk on
> localhost:41889 (size: 6.9 GB)
> 14/12/02 18:30:08 INFO storage.BlockManagerMaster: Updated info of block
> rdd_15_9
> 14/12/02 18:30:08 ERROR executor.Executor: Exception in task 9.0 in stage
> 2.0 (TID 348)
> java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
>
> I don't understand a couple of things:
> 1) In this case, I am joining 2 RDDs (16.3 GB and 17.2 GB); both RDDs are
> created by reading HDFS files. The size of each .part file is 24.87 MB and I
> am reading these files into 250 partitions, so no individual partition should
> be over 25 MB. How could rdd_15_9 be 8.1 GB?
>
> 2) Even if the partition is 8.1 GB, Spark should have enough memory to write
> it, although I would expect to hit the Integer.MAX_VALUE (2 GB) block size
> limitation. However, I don't get that error message there; instead a partial
> dataset (6.9 GB) is written to disk. I don't understand how and why only a
> partial dataset is written.
>
> 3) Why do I get "java.lang.IllegalArgumentException: Size exceeds
> Integer.MAX_VALUE" after the partial dataset has been written?
>
> I would love to hear from anyone that can shed some light into this...
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Help-understanding-Not-enough-space-to-cache-rdd-tp20186.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>
