spark-user mailing list archives

From pankaj channe <pankajc...@gmail.com>
Subject Re: Spark streaming job failing after some time.
Date Mon, 24 Nov 2014 18:51:17 GMT
I have figured out the problem here. It turned out that there was a problem
with my SparkConf when I was running my application with YARN in cluster
mode. I was setting the master to local[4] inside my application, whereas I
was setting it to yarn-cluster with spark-submit. I have now changed the
SparkConf in my application to not hardcode the master, and it works.
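
For anyone hitting the same issue, a minimal sketch of the fixed setup (the
app name and batch interval are illustrative, not from my actual code):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// No setMaster() here: the master now comes only from spark-submit,
// i.e. --master yarn-cluster on the cluster, local[4] only for local tests.
val conf = new SparkConf().setAppName("MyStreamingApp")
val ssc = new StreamingContext(conf, Seconds(10))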

The application kept running for some time because the YARN ApplicationMaster
retries up to maxNumTries times, waiting between each retry, before it fails
completely. I was getting the expected results from my streaming job during
that time.

Now I can't figure out why it ran successfully during this time even though
it could not find the SparkContext. I am sure there is a good reason behind
this behavior. Does anyone have any idea?

Thanks,
Pankaj Channe


On Saturday, November 22, 2014, pankaj channe <pankajc007@gmail.com> wrote:

> Thanks Akhil for your input.
>
> I have already tried with 3 executors and it still results in the same
> problem. So, as Sean mentioned, the problem does not seem to be related to
> that.
>
>
> On Sat, Nov 22, 2014 at 11:00 AM, Sean Owen <sowen@cloudera.com> wrote:
>
>> That doesn't seem to be the problem though. It processes but then stops.
>> Presumably there are many executors.
>> On Nov 22, 2014 9:40 AM, "Akhil Das" <akhil@sigmoidanalytics.com> wrote:
>>
>>> For Spark Streaming, you must always set *--executor-cores* to a value
>>> that is >= 2; otherwise it will not do any processing.
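>>>
>>> A minimal sketch, reusing the submit command from the original post below
>>> and only raising the cores setting (all other values as posted):
>>>
>>> spark-submit --class class_name --master yarn-cluster --num-executors 1
>>> --driver-memory 1g --executor-memory 1g --executor-cores 2 my_app.jar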
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Sat, Nov 22, 2014 at 8:39 AM, pankaj channe <pankajc007@gmail.com>
>>> wrote:
>>>
>>>> I have seen similar posts on this issue but could not find a solution.
>>>> Apologies if this has been discussed here before.
>>>>
>>>> I am running a Spark Streaming job with YARN on a 5-node cluster. I am
>>>> using the following command to submit my streaming job:
>>>>
>>>> spark-submit --class class_name --master yarn-cluster --num-executors 1
>>>> --driver-memory 1g --executor-memory 1g --executor-cores 1 my_app.jar
>>>>
>>>>
>>>> After running for some time, the job stops. The application log shows
>>>> the following two errors:
>>>>
>>>> 14/11/21 22:05:04 WARN yarn.ApplicationMaster: Unable to retrieve SparkContext in spite of waiting for 100000, maxNumTries = 10
>>>> Exception in thread "main" java.lang.NullPointerException
>>>> at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkContextInitialized(ApplicationMaster.scala:218)
>>>> at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:107)
>>>> at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:410)
>>>> at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
>>>> at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
>>>> at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
>>>> at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:409)
>>>> at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
>>>>
>>>>
>>>> and later...
>>>>
>>>> Failed to list files for dir:
>>>> /data2/hadoop/yarn/local/usercache/user_name/appcache/application_1416332002106_0009/spark-local-20141121220325-b529/20
>>>> at org.apache.spark.util.Utils$.listFilesSafely(Utils.scala:673)
>>>> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:685)
>>>> at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:686)
>>>> at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:685)
>>>> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>> at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>>>> at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:685)
>>>> at org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:181)
>>>> at org.apache.spark.storage.DiskBlockManager$$anonfun$stop$1.apply(DiskBlockManager.scala:178)
>>>> at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>> at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>>> at org.apache.spark.storage.DiskBlockManager.stop(DiskBlockManager.scala:178)
>>>> at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply$mcV$sp(DiskBlockManager.scala:171)
>>>> at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:169)
>>>> at org.apache.spark.storage.DiskBlockManager$$anon$1$$anonfun$run$1.apply(DiskBlockManager.scala:169)
>>>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
>>>> at org.apache.spark.storage.DiskBlockManager$$anon$1.run(DiskBlockManager.scala:169)
>>>>
>>>>
>>>> Note: I am building my jar locally, with the Spark dependency added in
>>>> pom.xml, and running it on a cluster that runs Spark.
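>>>>
>>>> For reference, a minimal sketch of the dependency entry I mean (the
>>>> version is illustrative; match it to the Spark version on the cluster):
>>>>
>>>> <dependency>
>>>>   <groupId>org.apache.spark</groupId>
>>>>   <artifactId>spark-streaming_2.10</artifactId>
>>>>   <version>1.1.0</version>
>>>> </dependency>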
>>>>
>>>>
>>>> -Pankaj
>>>>
>>>
>>>
>
