spark-user mailing list archives

From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: spark streaming failing to replicate blocks
Date Fri, 23 Oct 2015 13:59:08 GMT
If you can reproduce it, then I think you can open a JIRA for this.

Thanks
Best Regards

On Fri, Oct 23, 2015 at 1:37 PM, Eugen Cepoi <cepoi.eugen@gmail.com> wrote:

> Fixing the port to the same values that appear in the stack trace works
> too. The network configuration of the slaves seems correct.
>
> Thanks,
> Eugen
>
> 2015-10-23 8:30 GMT+02:00 Akhil Das <akhil@sigmoidanalytics.com>:
>
>> Most likely a network issue. You need to check your network configuration in
>> the AWS console and make sure the ports are accessible within the cluster.
>>
>> Thanks
>> Best Regards
>>
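One way to test the "ports are accessible within the cluster" suggestion from another node is a plain TCP connect. This is a minimal sketch, not part of Spark: the helper name `port_is_reachable` and the default timeout are assumptions made for illustration.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts, and name-resolution failures.
        return False
```

Run from one executor host against another, for example `port_is_reachable("ip-10-63-160-18.ec2.internal", 50929)`, a `False` result only shows that nothing accepted the connection. It does not distinguish a blocked port from a block manager that is listening on a different port.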
>> On Thu, Oct 22, 2015 at 8:53 PM, Eugen Cepoi <cepoi.eugen@gmail.com>
>> wrote:
>>
>>> Huh, indeed this worked, thanks. Do you know why this happens? Is it a
>>> known issue?
>>>
>>> Thanks,
>>> Eugen
>>>
>>> 2015-10-22 19:08 GMT+07:00 Akhil Das <akhil@sigmoidanalytics.com>:
>>>
>>>> Can you try fixing spark.blockManager.port to a specific port and see if
>>>> the issue persists?
>>>>
>>>> Thanks
>>>> Best Regards
>>>>
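Pinning `spark.blockManager.port` can be done by passing the property to `spark-submit` with `--conf`. The sketch below only builds the argument list. The helper name `spark_submit_conf_args`, the port value (one of the ports seen in the stack trace) and the jar name are assumptions for illustration, not values confirmed in this thread.

```python
def spark_submit_conf_args(settings: dict) -> list:
    """Turn a mapping of Spark properties into repeated --conf key=value arguments."""
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Assumed value: 50929 is simply one of the ports that appears in the stack trace.
conf_args = spark_submit_conf_args({"spark.blockManager.port": 50929})

# Hypothetical application jar; replace with the real streaming job.
command = ["spark-submit", *conf_args, "my-streaming-job.jar"]
print(" ".join(command))
```

The same property could instead be set in `spark-defaults.conf` or on the `SparkConf` in the job itself. The command-line form is shown here only because it needs no code change.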
>>>> On Mon, Oct 19, 2015 at 6:21 PM, Eugen Cepoi <cepoi.eugen@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am running Spark Streaming 1.4.1 on EMR (AMI 3.9) over YARN.
>>>>> The job reads data from Kinesis with a batch interval of 30s (I
>>>>> used the same value for the Kinesis checkpointing).
>>>>> In the executor logs I can see, every 5 seconds, a sequence of
>>>>> stack traces indicating that block replication failed. I am using the
>>>>> default storage level MEMORY_AND_DISK_SER_2.
>>>>> Neither the WAL nor checkpointing is enabled (the checkpoint dir is configured
>>>>> for the Spark context but not for the streaming context).
>>>>>
>>>>> Here is an example of those logs for ip-10-63-160-18. They occur in
>>>>> every executor while trying to replicate to any other executor.
>>>>>
>>>>>
>>>>> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to [ip-10-63-160-18.ec2.internal/10.63.160.18:50929]
>>>>> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection to ip-10-63-160-18.ec2.internal/10.63.160.18:50929
>>>>> java.net.ConnectException: Connection refused
>>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>> 	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>>>> 	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>>> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending message.
>>>>> java.net.ConnectException: Connection refused
>>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>> 	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>>>> 	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
>>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection error on connection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
>>>>> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate input-1-1445242310000 to BlockManagerId(3, ip-10-159-151-22.ec2.internal, 50929), failure #0
>>>>> java.net.ConnectException: Connection refused
>>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>> 	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>>>> 	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
>>>>> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to [ip-10-63-160-18.ec2.internal/10.63.160.18:39506]
>>>>> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection to ip-10-63-160-18.ec2.internal/10.63.160.18:39506
>>>>> java.net.ConnectException: Connection refused
>>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>> 	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>>>> 	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>>> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending message.
>>>>> java.net.ConnectException: Connection refused
>>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>> 	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>>>> 	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506)
>>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection error on connection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506)
>>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Removing SendingConnection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506)
>>>>> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate input-1-1445242310000 to BlockManagerId(2, ip-10-141-12-91.ec2.internal, 39506), failure #1
>>>>> java.net.ConnectException: Connection refused
>>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>>> 	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>>>> 	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>>> 15/10/19 03:11:55 WARN storage.BlockManager: Block input-1-1445242310000 replicated to only 0 peer(s) instead of 1 peers
>>>>> 15/10/19 03:11:55 INFO receiver.BlockGenerator: Pushed block input-1-1445242310000
>>>>>
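To see which peers the replication attempts are failing against, the "Failed to replicate" warnings in a log like the one above can be tallied per target. This is a rough sketch that assumes the message layout shown in this thread. The function name `failed_replications` and the regular expression are illustrative, not part of Spark.

```python
import re
from collections import Counter

# Matches the BlockManager warning as it appears in the log above.
FAILED = re.compile(
    r"Failed to replicate (\S+)\s+to BlockManagerId\((\w+), ([^,]+), (\d+)\)"
)


def failed_replications(log_text: str) -> Counter:
    """Count failed replication attempts per (host, port) target."""
    # Drop mail quote markers and join lines so wrapped messages still match.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    return Counter((m.group(3), int(m.group(4))) for m in FAILED.finditer(flat))


# Two warnings copied from the log in this thread, including the mail wrapping.
sample = """\
>>>>> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate input-1-1445242310000
to BlockManagerId(3, ip-10-159-151-22.ec2.internal, 50929), failure #0
>>>>> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate input-1-1445242310000
to BlockManagerId(2, ip-10-141-12-91.ec2.internal, 39506), failure #1
"""

for (host, port), count in failed_replications(sample).items():
    print(f"{host}:{port} failed {count} time(s)")
```

The resulting host and port pairs are the ones to compare against the ports each block manager is actually listening on.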
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Eugen
>>>>>
>>>>
>>>>
>>>
>>
>
