spark-user mailing list archives

From Matei Zaharia <matei.zaha...@gmail.com>
Subject Re: work around Size exceeds Integer.MAX_VALUE
Date Thu, 09 Jul 2015 23:43:53 GMT
This means that one of your cached RDD partitions is bigger than 2 GB of data. You can fix
it by using more partitions. If you read the data from a file system like HDFS or S3, set the
number of partitions higher in the sc.textFile, hadoopFile, etc. methods (it's an optional
second parameter to those methods). If you create the RDD through parallelize, or if this
particular RDD comes from a shuffle, use more tasks in the parallelize or shuffle.
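
A minimal Scala sketch of the three options above (the input path and the partition count of 2000 are just placeholders; pick values that keep each partition well under 2 GB):

    import org.apache.spark.{SparkConf, SparkContext}

    object MorePartitionsExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("more-partitions"))

        // 1) Ask for more input partitions up front; minPartitions is the optional
        //    second parameter mentioned above.
        val fromFile = sc.textFile("hdfs:///data/big-input.txt", 2000)

        // 2) For parallelize, pass an explicit number of slices.
        val fromCollection = sc.parallelize(1 to 1000000, numSlices = 2000)

        // 3) If the oversized partition comes out of a shuffle, raise the number of
        //    reduce tasks on the shuffle operation itself...
        val shuffled = fromFile.map(line => (line.length, line)).reduceByKey(_ + _, 2000)

        // ...or repartition an existing RDD before caching it.
        val cached = fromFile.repartition(2000).cache()
        cached.count()

        sc.stop()
      }
    }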

Matei

> On Jul 9, 2015, at 3:35 PM, Michal Čizmazia <micizma@gmail.com> wrote:
> 
> Spark version 1.4.0 in the Standalone mode
> 
> 2015-07-09 20:12:02 INFO  (sparkDriver-akka.actor.default-dispatcher-3) BlockManagerInfo:59 - Added rdd_0_0 on disk on localhost:51132 (size: 29.8 GB)
> 2015-07-09 20:12:02 ERROR (Executor task launch worker-0) Executor:96 - Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)
>         at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)
>         at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)
>         at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
>         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)
>         at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)
>         at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:509)
>         at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:427)
>         at org.apache.spark.storage.BlockManager.get(BlockManager.scala:615)
>         at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:154)
>         at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         at org.apache.spark.scheduler.Task.run(Task.scala:70)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> 
> 
> On 9 July 2015 at 18:11, Ted Yu <yuzhihong@gmail.com> wrote:
> Which release of Spark are you using ?
> 
> Can you show the complete stack trace ?
> 
> getBytes() could be called from:
>     getBytes(file, 0, file.length)
> or:
>     getBytes(segment.file, segment.offset, segment.length)
> 
> Cheers
> 
> On Thu, Jul 9, 2015 at 2:50 PM, Michal Čizmazia <micizma@gmail.com> wrote:
> Could anyone please give me pointers for an appropriate SparkConf to work around "Size exceeds Integer.MAX_VALUE"?
> 
> Stacktrace:
> 
> 2015-07-09 20:12:02 INFO  (sparkDriver-akka.actor.default-dispatcher-3) BlockManagerInfo:59 - Added rdd_0_0 on disk on localhost:51132 (size: 29.8 GB)
> 2015-07-09 20:12:02 ERROR (Executor task launch worker-0) Executor:96 - Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
>         at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)
>         at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)
> ...
> 
> 
> 

