You may be running into this issue:
https://issues.apache.org/jira/browse/SPARK-4019

You could check by running with 2000 or fewer reduce partitions and seeing whether the error goes away.
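One way to try this without touching job code is through the cluster defaults. A minimal sketch, assuming your shuffle operations pick up `spark.default.parallelism` (the 2000 figure is just the threshold discussed in the JIRA, not a tuning recommendation):

```
# spark-defaults.conf sketch: cap default shuffle parallelism at 2000
# to test whether SPARK-4019 is the culprit.
spark.default.parallelism 2000
```

Alternatively, shuffle operations such as `reduceByKey` accept an explicit `numPartitions` argument, so you can cap a single job instead of the whole cluster.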

On Wed, Oct 22, 2014 at 1:48 PM, DB Tsai <dbtsai@dbtsai.com> wrote:
PS, sorry for spamming the mailing list. Based on my knowledge, both
spark.shuffle.spill.compress and spark.shuffle.compress default to
true, so in theory we should not run into this issue if we don't
change any settings. Is there some other bug we're running into?

Thanks.

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Wed, Oct 22, 2014 at 1:37 PM, DB Tsai <dbtsai@dbtsai.com> wrote:
> Or can it be solved by setting both of the following settings to true for now?
>
> spark.shuffle.spill.compress true
> spark.shuffle.compress true
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Wed, Oct 22, 2014 at 1:34 PM, DB Tsai <dbtsai@dbtsai.com> wrote:
>> It seems this issue should be addressed by
>> https://github.com/apache/spark/pull/2890 ? Am I right?
>>
>> Sincerely,
>>
>> DB Tsai
>> -------------------------------------------------------
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>
>>
>> On Wed, Oct 22, 2014 at 11:54 AM, DB Tsai <dbtsai@dbtsai.com> wrote:
>>> Hi all,
>>>
>>> With SPARK-3948, the Snappy PARSING_ERROR exception is gone, but
>>> now I'm hitting another exception. I have no clue what's going on;
>>> has anyone run into a similar issue? Thanks.
>>>
>>> This is the configuration I use.
>>> spark.rdd.compress true
>>> spark.shuffle.consolidateFiles true
>>> spark.shuffle.manager SORT
>>> spark.akka.frameSize 128
>>> spark.akka.timeout  600
>>> spark.core.connection.ack.wait.timeout  600
>>> spark.core.connection.auth.wait.timeout 300
>>>
>>> java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
>>>         java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
>>>         java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
>>>         java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
>>>         org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:57)
>>>         org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:57)
>>>         org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:95)
>>>         org.apache.spark.storage.BlockManager.getLocalShuffleFromDisk(BlockManager.scala:351)
>>>         org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$fetchLocalBlocks$1$$anonfun$apply$4.apply(ShuffleBlockFetcherIterator.scala:196)
>>>         org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$fetchLocalBlocks$1$$anonfun$apply$4.apply(ShuffleBlockFetcherIterator.scala:196)
>>>         org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
>>>         org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
>>>         scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>         org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
>>>         org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>>         org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:89)
>>>         org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
>>>         org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
>>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>         org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
>>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         java.lang.Thread.run(Thread.java:744)
>>>
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> -------------------------------------------------------
>>> My Blog: https://www.dbtsai.com
>>> LinkedIn: https://www.linkedin.com/in/dbtsai

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org