spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Akhil Das <ak...@sigmoidanalytics.com>
Subject Re: Getting NPE when trying to do spark streaming with Twitter
Date Mon, 11 Apr 2016 09:06:21 GMT
Looks like a yarn issue to me, Can you try checking out this code?
https://github.com/akhld/sparkstreaming-twitter just git clone and do a sbt
run after configuring your credentials in the main file
<https://github.com/akhld/sparkstreaming-twitter/blob/master/src/main/scala-2.10/TwitterStreamMain.scala>
.

Thanks
Best Regards

On Mon, Apr 11, 2016 at 11:29 AM, krisgari <lagiboy@hotmail.com> wrote:

>  I am new to SparkStreaming, when tried to submit the Spark-Twitter
> streaming
> job, getting the following error:
> ---
> Lost task 0.0 in stage 0.0 (TID
> 0,sandbox.hortonworks.com):java.lang.NullPointerException
> at org.apache.spark.util.Utils$.decodeFileNameInURI(Utils.scala:340)
> at org.apache.spark.util.Utils$.fetchFile(Utils.scala:365)
> at
>
> org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:404)
> at
>
> org.apache.spark.executor.Executor$$anonfun$org$apache$spark$executor$Executor$$updateDependencies$5.apply(Executor.scala:396)
> at
>
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
> at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
> at
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
> at
>
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
> at
> org.apache.spark.executor.Executor.org
> $apache$spark$executor$Executor$$updateDependencies(Executor.scala:396)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:192)
> at
>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745
> --
>
> Here is the code snippet:
> --
> val Array(consumerKey, consumerSecret, accessToken, accessTokenSecret) =
> args.take(4)
> val filters = args.takeRight(args.length - 4)
>
> System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
> System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
> System.setProperty("twitter4j.oauth.accessToken", accessToken)
> System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)
> val sparkConf = new SparkConf().setAppName("TwitterPopularTags")
> val ssc = new StreamingContext(sparkConf,  Seconds(2))
> val stream = TwitterUtils.createStream(ssc,None, filters)
> val hashTags = stream.flatMap(status => status.getText.split("
> ").filter(_.startsWith("#")))
> val topCounts60 = hashTags.map((_, 1)).reduceByKeyAndWindow(_ + _,
> Seconds(60))
>                  .map{case (topic, count) => (count, topic)}
>                  .transform(_.sortByKey(false))
> topCounts60.foreachRDD(rdd => {
>   val topList = rdd.take(10)
>   println("\nPopular topics in last 60 seconds (%s
> total):".format(rdd.count()))
>   topList.foreach{case (count, tag) => println("%s (%s tweets)".format(tag,
> count))}
> })
> ssc.start()
> ssc.awaitTermination()
>
> --
> command to submit the job
> --
> ./bin/spark-submit --class
> "org.apache.spark.examples.streaming.TwitterPopularTags"  --master
> yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m
> --executor-cores 1  --jars
>
> /home/spark/.sbt/0.13/staging/0ff3ad537358b61f617c/twitterstream/target/scala-2.10/twitterstream-project_2.10-1.0.jar,/home/spark/.ivy2/cache/org.apache.spark/spark-streaming-twitter_2.10/jars/spark-streaming-twitter_2.10-1.6.1.jar,/home/spark/.ivy2/cache/org.twitter4j/twitter4j-core/jars/twitter4j-core-4.0.4.jar,/home/spark/.ivy2/cache/org.twitter4j/twitter4j-stream/jars/twitter4j-stream-4.0.4.jar,/home/spark/.ivy2/cache/org.apache.spark/spark-streaming_2.10/jars/spark-streaming_2.10-1.6.1.jar,/home/spark/.ivy2/cache/org.apache.spark/spark-core_2.10/jars/spark-core_2.10-1.6.1.jar
> "sandbox.hortonworks.com:6667" xx xx xx xx
> --
>
> Any clue why I am getting this NPE?? Any help on how to debug this further?
>
>
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Getting-NPE-when-trying-to-do-spark-streaming-with-Twitter-tp26737.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>

Mime
View raw message