spark-user mailing list archives

From Tathagata Das <t...@databricks.com>
Subject Re: Problems after upgrading to spark 1.4.0
Date Mon, 13 Jul 2015 21:12:48 GMT
Spark 1.4.0 added shutdown hooks in the driver to cleanly shut down the
SparkContext, which in turn shuts down the executors. I am not
sure whether this is related, but somehow the executor's shutdown
hook is being called.
Can you check the driver logs to see whether the driver's shutdown hook is
accidentally being called?
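
To make clear what a shutdown hook is at the JVM level, here is a minimal sketch in plain Java. It is not Spark's own code: the class name ShutdownHookSketch, the method registerLoggingHook, and the message text are made up for illustration. It only uses the standard Runtime.addShutdownHook call, and the hook prints a marker line when the JVM exits, similar in spirit to the "Shutdown hook called" lines you would look for in the logs.

```java
public class ShutdownHookSketch {

    // Registers a JVM shutdown hook that prints a marker line to stderr
    // and returns the hook thread so the caller can remove it again.
    // (Hypothetical helper, for illustration only.)
    static Thread registerLoggingHook(String name) {
        Thread hook = new Thread(
            () -> System.err.println("Shutdown hook called: " + name),
            "hook-" + name);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        registerLoggingHook("demo");
        // The hook has not run yet; it runs while the JVM is exiting,
        // for example on normal exit or on a TERM or INT signal.
        System.out.println("main finished");
    }
}
```

Running main prints "main finished" first and the hook's marker line afterwards, as the JVM exits.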


On Mon, Jul 13, 2015 at 9:23 AM, Luis Ángel Vicente Sánchez <
langel.groups@gmail.com> wrote:

> I forgot to mention that this is a long-running job, actually a Spark
> Streaming job, and it's using Mesos coarse-grained mode. I'm still using the
> unreliable Kafka receiver.
>
> 2015-07-13 17:15 GMT+01:00 Luis Ángel Vicente Sánchez <
> langel.groups@gmail.com>:
>
>> I have just upgraded one of my Spark jobs from Spark 1.2.1 to Spark 1.4.0
>> and after deploying it to Mesos, it's not working anymore.
>>
>> The upgrade process was quite easy:
>>
>> - Create a new Docker container for Spark 1.4.0.
>> - Upgrade the Spark job to use Spark 1.4.0 as a dependency and create a new
>> fat jar.
>> - Create a Docker container for the jobs, based on the previous Spark 1.4.0
>> container.
>>
>> After deploying it to Marathon, the job only displays the driver under
>> executors and no task progresses. I haven't made any changes to my config
>> files (apart from updating spark.executor.uri to point to the right file on
>> S3).
>>
>> If I go to Mesos and check my job under frameworks, I can see a few
>> failed stages; the content of stderr always looks like this:
>>
>> I0713 15:59:45.774368  1327 fetcher.cpp:214] Fetching URI 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz'
>> I0713 15:59:45.774483  1327 fetcher.cpp:125] Fetching URI 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz' with os::net
>> I0713 15:59:45.774494  1327 fetcher.cpp:135] Downloading 'http://s3-eu-west-1.amazonaws.com/int-mesos-data/frameworks/spark/spark-1.4.0-bin-hadoop2.4.tgz' to '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58/spark-1.4.0-bin-hadoop2.4.tgz'
>> I0713 15:59:50.700959  1327 fetcher.cpp:78] Extracted resource '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58/spark-1.4.0-bin-hadoop2.4.tgz' into '/var/log/mcsvc/mesostmpdir/slaves/20150713-133618-421011372-5050-8867-S5/frameworks/20150713-152326-421011372-5050-12921-0002/executors/9/runs/9e44b2ea-c738-4e76-8103-3a85ce752b58'
>> I0713 15:59:50.973274  1333 exec.cpp:132] Version: 0.22.1
>> I0713 15:59:50.998219  1339 exec.cpp:206] Executor registered on slave 20150713-133618-421011372-5050-8867-S5
>> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
>> 15/07/13 15:59:51 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
>> 15/07/13 15:59:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> 15/07/13 15:59:52 INFO SecurityManager: Changing view acls to: root
>> 15/07/13 15:59:52 INFO SecurityManager: Changing modify acls to: root
>> 15/07/13 15:59:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
>> 15/07/13 15:59:52 INFO Slf4jLogger: Slf4jLogger started
>> 15/07/13 15:59:52 INFO Remoting: Starting remoting
>> 15/07/13 15:59:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://driverPropsFetcher@int-mesos-slave-ib4583253.mclabs.io:41854]
>> 15/07/13 15:59:53 INFO Utils: Successfully started service 'driverPropsFetcher' on port 41854.
>> 15/07/13 15:59:53 INFO SecurityManager: Changing view acls to: root
>> 15/07/13 15:59:53 INFO SecurityManager: Changing modify acls to: root
>> 15/07/13 15:59:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
>> 15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
>> 15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
>> 15/07/13 15:59:53 INFO Slf4jLogger: Slf4jLogger started
>> 15/07/13 15:59:53 INFO Remoting: Starting remoting
>> 15/07/13 15:59:53 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
>> 15/07/13 15:59:53 INFO Utils: Successfully started service 'sparkExecutor' on port 60219.
>> 15/07/13 15:59:53 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@int-mesos-slave-ib4583253.mclabs.io:60219]
>> 15/07/13 15:59:53 INFO DiskBlockManager: Created local directory at /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439/blockmgr-4047306e-9dc8-48e4-bc25-300f4cf0be87
>> 15/07/13 15:59:53 INFO MemoryStore: MemoryStore started with capacity 267.5 MB
>> Exception in thread "main" java.io.FileNotFoundException: /etc/mindcandy/metrics.properties (No such file or directory)
>> 	at java.io.FileInputStream.open0(Native Method)
>> 	at java.io.FileInputStream.open(FileInputStream.java:195)
>> 	at java.io.FileInputStream.<init>(FileInputStream.java:138)
>> 	at java.io.FileInputStream.<init>(FileInputStream.java:93)
>> 	at org.apache.spark.metrics.MetricsConfig$$anonfun$1.apply(MetricsConfig.scala:50)
>> 	at org.apache.spark.metrics.MetricsConfig$$anonfun$1.apply(MetricsConfig.scala:50)
>> 	at scala.Option.map(Option.scala:145)
>> 	at org.apache.spark.metrics.MetricsConfig.initialize(MetricsConfig.scala:50)
>> 	at org.apache.spark.metrics.MetricsSystem.<init>(MetricsSystem.scala:93)
>> 	at org.apache.spark.metrics.MetricsSystem$.createMetricsSystem(MetricsSystem.scala:222)
>> 	at org.apache.spark.SparkEnv$.create(SparkEnv.scala:367)
>> 	at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:211)
>> 	at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:180)
>> 	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:66)
>> 	at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:65)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:422)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>> 	at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
>> 	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:146)
>> 	at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:245)
>> 	at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
>> 15/07/13 15:59:53 INFO DiskBlockManager: Shutdown hook called
>> 15/07/13 15:59:53 INFO Utils: path = /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439/blockmgr-4047306e-9dc8-48e4-bc25-300f4cf0be87, already present as root for deletion.
>> 15/07/13 15:59:53 INFO Utils: Shutdown hook called
>> 15/07/13 15:59:53 INFO Utils: Deleting directory /var/log/mcsvc/sparktmpdir/spark-2ca9b3eb-ce70-44e5-9546-1a83f63dc439
>>
>>
>>
>>
>
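
The quoted stderr ends with a FileNotFoundException for /etc/mindcandy/metrics.properties, raised while the executor sets up its metrics system, immediately before the "Shutdown hook called" lines. As a rough way to confirm whether that file is visible where the executor starts (on the Mesos slave or inside the container), here is a minimal sketch in plain Java. The class name MetricsFileCheck and the method isReadableFile are hypothetical names used only for this example; the default path is the one from the stack trace.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class MetricsFileCheck {

    // Returns true only if the path exists, is a regular file,
    // and is readable by the current user.
    // (Hypothetical helper, for illustration only.)
    static boolean isReadableFile(String path) {
        Path p = Paths.get(path);
        return Files.isRegularFile(p) && Files.isReadable(p);
    }

    public static void main(String[] args) {
        // Default to the path reported in the quoted stack trace.
        String path = args.length > 0
            ? args[0]
            : "/etc/mindcandy/metrics.properties";
        System.out.println(path + " readable: " + isReadableFile(path));
    }
}
```

If this reports false in the environment where the executor runs, that would be consistent with the exception in the trace.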
