spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Eugen Cepoi <cepoi.eu...@gmail.com>
Subject Re: spark streaming failing to replicate blocks
Date Fri, 23 Oct 2015 08:07:12 GMT
When fixing the port to the same values as in the stack trace it works too.
The network config of the slaves seems correct.

Thanks,
Eugen

2015-10-23 8:30 GMT+02:00 Akhil Das <akhil@sigmoidanalytics.com>:

> Mostly a network issue, you need to check your network configuration from
> the aws console and make sure the ports are accessible within the cluster.
>
> Thanks
> Best Regards
>
> On Thu, Oct 22, 2015 at 8:53 PM, Eugen Cepoi <cepoi.eugen@gmail.com>
> wrote:
>
>> Huh indeed this worked, thanks. Do you know why this happens, is that
>> some known issue?
>>
>> Thanks,
>> Eugen
>>
>> 2015-10-22 19:08 GMT+07:00 Akhil Das <akhil@sigmoidanalytics.com>:
>>
>>> Can you try fixing spark.blockManager.port to specific port and see if
>>> the issue exists?
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Oct 19, 2015 at 6:21 PM, Eugen Cepoi <cepoi.eugen@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am running spark streaming 1.4.1 on EMR (AMI 3.9) over YARN.
>>>> The job is reading data from Kinesis and the batch size is of 30s (I
>>>> used the same value for the kinesis checkpointing).
>>>> In the executor logs I can see every 5 seconds a sequence of
>>>> stacktraces indicating that the block replication failed. I am using the
>>>> default storage level MEMORY_AND_DISK_SER_2.
>>>> WAL is not enabled nor checkpointing (the checkpoint dir is configured
>>>> for the spark context but not for the streaming context).
>>>>
>>>> Here is an example of those logs for ip-10-63-160-18. They occur in
>>>> every executor while trying to replicate to any other executor.
>>>>
>>>>
>>>> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to [ip-10-63-160-18.ec2.internal/10.63.160.18:50929]
>>>> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection
to ip-10-63-160-18.ec2.internal/10.63.160.18:50929
>>>> java.net.ConnectException: Connection refused
>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>> 	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>>> 	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending message.
>>>> java.net.ConnectException: Connection refused
>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>> 	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>>> 	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection error on
connection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
>>>> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate input-1-1445242310000
to BlockManagerId(3, ip-10-159-151-22.ec2.internal, 50929), failure #0
>>>> java.net.ConnectException: Connection refused
>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>> 	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>>> 	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Removing SendingConnection
to ConnectionManagerId(ip-10-63-160-18.ec2.internal,50929)
>>>> 15/10/19 03:11:55 INFO nio.SendingConnection: Initiating connection to [ip-10-63-160-18.ec2.internal/10.63.160.18:39506]
>>>> 15/10/19 03:11:55 WARN nio.SendingConnection: Error finishing connection
to ip-10-63-160-18.ec2.internal/10.63.160.18:39506
>>>> java.net.ConnectException: Connection refused
>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>> 	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>>> 	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>> 15/10/19 03:11:55 ERROR nio.ConnectionManager: Exception while sending message.
>>>> java.net.ConnectException: Connection refused
>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>> 	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>>> 	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Notifying ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506)
>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Handling connection error on
connection to ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506)
>>>> 15/10/19 03:11:55 INFO nio.ConnectionManager: Removing SendingConnection
to ConnectionManagerId(ip-10-63-160-18.ec2.internal,39506)
>>>> 15/10/19 03:11:55 WARN storage.BlockManager: Failed to replicate input-1-1445242310000
to BlockManagerId(2, ip-10-141-12-91.ec2.internal, 39506), failure #1
>>>> java.net.ConnectException: Connection refused
>>>> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>>>> 	at org.apache.spark.network.nio.SendingConnection.finishConnect(Connection.scala:344)
>>>> 	at org.apache.spark.network.nio.ConnectionManager$$anon$10.run(ConnectionManager.scala:292)
>>>> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>> 	at java.lang.Thread.run(Thread.java:745)
>>>> 15/10/19 03:11:55 WARN storage.BlockManager: Block input-1-1445242310000
replicated to only 0 peer(s) instead of 1 peers
>>>> 15/10/19 03:11:55 INFO receiver.BlockGenerator: Pushed block input-1-1445242310000
>>>>
>>>>
>>>>
>>>> Thanks,
>>>> Eugen
>>>>
>>>
>>>
>>
>

Mime
View raw message