spark-user mailing list archives

From "Jahagirdar, Madhu" <madhu.jahagir...@philips.com>
Subject Issue with Spark Twitter Streaming
Date Mon, 13 Oct 2014 08:44:01 GMT
All,

We are using Spark Streaming to receive data from the Twitter stream. The job runs behind
a proxy. We have set the following configuration inside the Spark Streaming application so
that twitter4j works behind the proxy.

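import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

object SparkTwitter2Kafka {
  def main(args: Array[String]): Unit = {
    // Only tweets matching these keywords are received.
    val filters = Array("Modi")

    // OAuth credentials for twitter4j (redacted).
    System.setProperty("twitter4j.oauth.consumerKey", "*")
    System.setProperty("twitter4j.oauth.consumerSecret", "*")
    System.setProperty("twitter4j.oauth.accessToken", "*")
    System.setProperty("twitter4j.oauth.accessTokenSecret", "*")
    // Proxy settings for twitter4j (redacted).
    System.setProperty("twitter4j.http.proxyHost", "X.X.X.X")
    System.setProperty("twitter4j.http.proxyPort", "XXXX")
    System.setProperty("twitter4j.http.useSSL", "true")

    val conf = new SparkConf().setAppName("TwitterPopularTags")

    // 60-second batches; passing None lets twitter4j build its
    // authorization from the system properties set above.
    val ssc = new StreamingContext(conf, Seconds(60))
    val stream = TwitterUtils.createStream(ssc, None, filters)

    stream.print()

    ssc.start()
    ssc.awaitTermination()
  }
}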

spark-streaming-twitter_2.10-1.1.0
twitter4j-core-3.0.3.jar
twitter4j-stream-3.0.3.jar


When the Spark job is run with local[2] on a single node (not on the cluster) with the
settings above, it pulls the data and works like a charm behind the proxy.

When the same code is run on a cluster (command below) on the same network with the same
settings, it throws the error below. We are not sure what is going wrong; any help is
appreciated. We checked the executors' environment, and all of the above system properties
are set there.

bin/spark-submit --class SparkTwitter2Kafka --master spark://IPADDRESS:7077 spark-twitter.jar

14/10/13 14:00:10 ERROR scheduler.ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Error receiving tweets - connect timed out
Relevant discussions can be found on the Internet at:
                http://www.google.co.jp/search?q=944a924a or
                http://www.google.co.jp/search?q=24fd66dc
TwitterException{exceptionCode=[944a924a-24fd66dc 944a924a-24fd66b2], statusCode=-1, message=null, code=-1, retryAfter=-1, rateLimitStatus=null, version=3.0.5}
                at twitter4j.internal.http.HttpClientImpl.request(HttpClientImpl.java:177)
                at twitter4j.internal.http.HttpClientWrapper.request(HttpClientWrapper.java:61)
                at twitter4j.internal.http.HttpClientWrapper.post(HttpClientWrapper.java:98)
                at twitter4j.TwitterStreamImpl.getFilterStream(TwitterStreamImpl.java:304)
                at twitter4j.TwitterStreamImpl$7.getStream(TwitterStreamImpl.java:292)
                at twitter4j.TwitterStreamImpl$TwitterStreamConsumer.run(TwitterStreamImpl.java:462)
Caused by: java.net.SocketTimeoutException: connect timed out
                at java.net.PlainSocketImpl.socketConnect(Native Method)
                at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
                at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
                at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
                at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
                at java.net.Socket.connect(Socket.java:579)
                at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:618)
                at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
                at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
                at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
                at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:275)
                at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:371)
                at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
                at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:932)
                at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
                at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1091)
                at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:250)
                at twitter4j.internal.http.HttpClientImpl.request(HttpClientImpl.java:135)
                ... 5 more
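
One thing that may be worth checking: the System.setProperty calls in main run in the driver JVM, while on a cluster the Twitter receiver runs inside an executor JVM. With local[2] both share one JVM, which could explain why the proxy settings only take effect there. A minimal sketch, assuming the properties are not actually reaching the executor JVM, is to pass them as -D flags through Spark's spark.executor.extraJavaOptions setting. The object ExecutorProxyOptions and its helper toJavaOptions are hypothetical names introduced for illustration; they are not part of Spark or twitter4j.

```scala
// Sketch: render twitter4j proxy settings as JVM -D flags for the executors.
object ExecutorProxyOptions {
  // Placeholder values copied from the post; replace with the real proxy.
  val proxyProps: Seq[(String, String)] = Seq(
    "twitter4j.http.proxyHost" -> "X.X.X.X",
    "twitter4j.http.proxyPort" -> "XXXX"
  )

  // Hypothetical helper: turns (key, value) pairs into "-Dkey=value" flags
  // separated by single spaces.
  def toJavaOptions(props: Seq[(String, String)]): String =
    props.map { case (key, value) => s"-D$key=$value" }.mkString(" ")

  // Intended use in the driver, before the StreamingContext is created
  // (assumes `conf` is the application's SparkConf):
  //   conf.set("spark.executor.extraJavaOptions", toJavaOptions(proxyProps))
}
```

The OAuth properties could be added to the same list if they turn out to be missing on the executors as well, although placing credentials on a command line or in configuration makes them visible to anyone who can inspect the process.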





