spark-user mailing list archives

From DB Tsai <dbt...@dbtsai.com>
Subject Re: Shuffle issues in the current master
Date Wed, 22 Oct 2014 20:48:18 GMT
PS, sorry for spamming the mailing list. Based on my knowledge, both
spark.shuffle.spill.compress and spark.shuffle.compress default to
true, so in theory we should not run into this issue if we don't
change any settings. Is there some other bug we are running into?
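
To double-check which values actually apply, here is a minimal sketch in plain Python (not Spark's own config loader). It parses spark-defaults.conf-style text and resolves the two flags, falling back to a default of true when a key is absent. The DEFAULTS table encodes my assumption about the defaults, and parse_conf / effective_flags are hypothetical helper names made up for illustration.

```python
# Sketch: resolve the effective shuffle compression flags from
# spark-defaults.conf-style text. DEFAULTS is an assumption (both flags
# default to true); parse_conf and effective_flags are hypothetical helpers.
DEFAULTS = {
    "spark.shuffle.compress": "true",
    "spark.shuffle.spill.compress": "true",
}


def parse_conf(text):
    """Parse whitespace-separated 'key value' lines, skipping blanks and '#' comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def effective_flags(conf):
    """Return {flag: bool}; raise ValueError if a value is not 'true' or 'false'."""
    flags = {}
    for key, default in DEFAULTS.items():
        raw = conf.get(key, default).lower()
        if raw not in ("true", "false"):
            raise ValueError(f"{key} has non-boolean value {raw!r}")
        flags[key] = raw == "true"
    return flags


# A configuration that sets neither flag, so the assumed defaults apply.
posted = """
spark.rdd.compress true
spark.shuffle.consolidateFiles true
spark.shuffle.manager SORT
spark.akka.frameSize 128
"""
flags = effective_flags(parse_conf(posted))
print(flags)
```

Because the check rejects anything other than true or false, a misspelled value is reported instead of being silently treated as unset.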

Thanks.

Sincerely,

DB Tsai
-------------------------------------------------------
My Blog: https://www.dbtsai.com
LinkedIn: https://www.linkedin.com/in/dbtsai


On Wed, Oct 22, 2014 at 1:37 PM, DB Tsai <dbtsai@dbtsai.com> wrote:
> Or can it be solved by setting both of the following settings to true for now?
>
> spark.shuffle.spill.compress true
> spark.shuffle.compress true
>
> Sincerely,
>
> DB Tsai
> -------------------------------------------------------
> My Blog: https://www.dbtsai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Wed, Oct 22, 2014 at 1:34 PM, DB Tsai <dbtsai@dbtsai.com> wrote:
>> It seems that this issue should be addressed by
>> https://github.com/apache/spark/pull/2890 ? Am I right?
>>
>> Sincerely,
>>
>> DB Tsai
>> -------------------------------------------------------
>> My Blog: https://www.dbtsai.com
>> LinkedIn: https://www.linkedin.com/in/dbtsai
>>
>>
>> On Wed, Oct 22, 2014 at 11:54 AM, DB Tsai <dbtsai@dbtsai.com> wrote:
>>> Hi all,
>>>
>>> With SPARK-3948, the Snappy PARSING_ERROR exception is gone, but
>>> I'm hitting another exception now. I have no clue what's going on; has
>>> anyone run into a similar issue? Thanks.
>>>
>>> This is the configuration I use.
>>> spark.rdd.compress true
>>> spark.shuffle.consolidateFiles true
>>> spark.shuffle.manager SORT
>>> spark.akka.frameSize 128
>>> spark.akka.timeout 600
>>> spark.core.connection.ack.wait.timeout 600
>>> spark.core.connection.auth.wait.timeout 300
>>>
>>> java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
>>>         java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
>>>         java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
>>>         java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
>>>         org.apache.spark.serializer.JavaDeserializationStream$$anon$1.<init>(JavaSerializer.scala:57)
>>>         org.apache.spark.serializer.JavaDeserializationStream.<init>(JavaSerializer.scala:57)
>>>         org.apache.spark.serializer.JavaSerializerInstance.deserializeStream(JavaSerializer.scala:95)
>>>         org.apache.spark.storage.BlockManager.getLocalShuffleFromDisk(BlockManager.scala:351)
>>>         org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$fetchLocalBlocks$1$$anonfun$apply$4.apply(ShuffleBlockFetcherIterator.scala:196)
>>>         org.apache.spark.storage.ShuffleBlockFetcherIterator$$anonfun$fetchLocalBlocks$1$$anonfun$apply$4.apply(ShuffleBlockFetcherIterator.scala:196)
>>>         org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
>>>         org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
>>>         scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>>         org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
>>>         org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>>         org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:89)
>>>         org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
>>>         org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
>>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>>>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>>>         org.apache.spark.scheduler.Task.run(Task.scala:56)
>>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
>>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         java.lang.Thread.run(Thread.java:744)
>>>
>>>
>>> Sincerely,
>>>
>>> DB Tsai
>>> -------------------------------------------------------
>>> My Blog: https://www.dbtsai.com
>>> LinkedIn: https://www.linkedin.com/in/dbtsai


