spark-user mailing list archives

From: Ping Tang <pt...@aerohive.com>
Subject: Re: How to use FlumeInputDStream in spark cluster?
Date: Mon, 01 Dec 2014 20:02:05 GMT
Thank you very much for your reply.

I have a cluster of 8 nodes: m1, m2, m3, ..., m8. m1 is configured as the Spark master node, and the rest of the nodes are all workers. I also configured m3 as the History Server, but the History Server fails to start. I ran FlumeEventCount on m1 using the right hostname and a port that is not used by any other application. Here is the script I used to run FlumeEventCount:


#!/bin/bash

spark-submit --verbose \
  --class org.apache.spark.examples.streaming.FlumeEventCount \
  --deploy-mode client \
  --master yarn-client \
  --jars lib/spark-streaming-flume_2.10-1.1.0-cdh5.2.2-20141112.193826-1.jar \
  /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark/examples/lib/spark-examples-1.1.0-cdh5.2.0-hadoop2.5.0-cdh5.2.0.jar \
  m1.ptang.aerohive.com 1234
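
For reference, FlumeEventCount boils down to roughly the following (a paraphrase of the Spark 1.1 streaming example, not its exact source; the two command-line arguments are the hostname and port that the Flume receiver binds to):

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Milliseconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumeEventCountSketch {
  def main(args: Array[String]): Unit = {
    val Array(host, port) = args  // e.g. m1.ptang.aerohive.com 1234
    val sparkConf = new SparkConf().setAppName("FlumeEventCount")
    val ssc = new StreamingContext(sparkConf, Milliseconds(2000))
    // The receiver starts an Avro/Netty server bound to host:port and
    // waits for a Flume avro sink to push events to it.
    val stream = FlumeUtils.createStream(ssc, host, port.toInt,
      StorageLevel.MEMORY_ONLY_SER_2)
    stream.count().map(cnt => "Received " + cnt + " flume events.").print()
    ssc.start()
    ssc.awaitTermination()
  }
}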


The same issue is observed after adding "spark.ui.port=4321" to /etc/spark/conf/spark-defaults.conf. The following exceptions are from the job run:


14/12/01 11:51:45 INFO JobScheduler: Finished job streaming job 1417463504000 ms.0 from job set of time 1417463504000 ms
14/12/01 11:51:45 INFO JobScheduler: Total delay: 1.465 s for time 1417463504000 ms (execution: 1.415 s)
14/12/01 11:51:45 ERROR ReceiverTracker: Deregistered receiver for stream 0: Error starting receiver 0 - org.jboss.netty.channel.ChannelException: Failed to bind to: m1.ptang.aerohive.com/192.168.10.22:1234
        at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
        at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:106)
        at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:119)
        at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:74)
        at org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:68)
        at org.apache.spark.streaming.flume.FlumeReceiver.initServer(FlumeInputDStream.scala:164)
        at org.apache.spark.streaming.flume.FlumeReceiver.onStart(FlumeInputDStream.scala:171)
        at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:121)
        at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106)
        at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:264)
        at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:257)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.BindException: Cannot assign requested address
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:444)
        at sun.nio.ch.Net.bind(Net.java:436)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        ... 3 more


14/12/01 11:51:45 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 72, m8.ptang.aerohive.com): org.jboss.netty.channel.ChannelException: Failed to bind to: m1.ptang.aerohive.com/192.168.10.22:1234
        org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
        org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:106)
        org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:119)
        org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:74)
        org.apache.avro.ipc.NettyServer.<init>(NettyServer.java:68)
        org.apache.spark.streaming.flume.FlumeReceiver.initServer(FlumeInputDStream.scala:164)
        org.apache.spark.streaming.flume.FlumeReceiver.onStart(FlumeInputDStream.scala:171)
        org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:121)
        org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:106)
        org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:264)
        org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverLauncher$$anonfun$9.apply(ReceiverTracker.scala:257)
        org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
        org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
        java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        java.lang.Thread.run(Thread.java:745)
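
Note the WARN line above: the receiver task (TID 72) ran on m8.ptang.aerohive.com but tried to bind m1's address. "Cannot assign requested address" is what the JVM raises when a process binds an IP that is not local to the machine it runs on. A minimal sketch with plain Java sockets (nothing Spark-specific; hostname and port taken from the log above) reproduces the same error:

import java.net.{InetSocketAddress, ServerSocket}

object BindSketch {
  def main(args: Array[String]): Unit = {
    val server = new ServerSocket()  // unbound server socket
    // Run on any host other than m1: binding m1's address throws
    // java.net.BindException: Cannot assign requested address.
    server.bind(new InetSocketAddress("m1.ptang.aerohive.com", 1234))
  }
}

In yarn-client mode the receiver is scheduled on an arbitrary executor, so a bind to a fixed hostname can only succeed if the receiver happens to land on that host.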

Thanks

Ping

From: Prannoy <prannoy@sigmoidanalytics.com>
Date: Friday, November 28, 2014 at 12:56 AM
To: "user@spark.incubator.apache.org" <user@spark.incubator.apache.org>
Subject: Re: How to use FlumeInputDStream in spark cluster?

Hi,

A BindException comes when two processes are using the same port. In your Spark configuration, just set ("spark.ui.port", "xxxxx") to some other port; xxxxx can be any number, say 12345. A BindException will not break your job in either case; to fix it, just change the port number.
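
A minimal sketch of that setting, assuming it is applied in application code rather than in spark-defaults.conf:

import org.apache.spark.SparkConf

// Move the Spark UI off its default port (4040) to avoid a port clash.
val conf = new SparkConf()
  .setAppName("FlumeEventCount")
  .set("spark.ui.port", "12345")  // xxxxx: any unused port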

Thanks.

On Fri, Nov 28, 2014 at 1:30 PM, pamtang [via Apache Spark User List] <[hidden email]> wrote:
I'm seeing the same issue on CDH 5.2 with Spark 1.1. FlumeEventCount works fine on a Standalone cluster but throws a BindException in YARN mode. Is there a solution to this problem, or will FlumeInputDStream not work in a cluster environment?



