spark-issues mailing list archives

From "Michael Gummelt (JIRA)" <>
Subject [jira] [Commented] (SPARK-16379) Spark on mesos is broken due to race condition in Logging
Date Tue, 05 Jul 2016 18:56:11 GMT


Michael Gummelt commented on SPARK-16379:

I traced back the addition of the `synchronized` block, and it seems Matei added it a long
time ago.  I can't prove that the method is thread-safe without it, so I'd rather not remove
the `synchronized` block.  So we can either:

1) Remove the log statements (I'd like to keep them)
2) Revert the `lazy` commit
3) Introduce an explicit lock, and synchronize on that rather than `this`

2) is the "correct" thing to do, since it's the author's responsibility to not break existing
code, but I'm OK with 3) as well.  [~srowen] what do you think?
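
For reference, a minimal sketch of what option 3) could look like. This is not the actual Spark code: `LazyLogging`, `Backend`, `stateLock`, `start` and `onRegistered` are hypothetical names made up for illustration, and a `StringBuilder` stands in for a real logger. The idea is that in Scala 2 a `lazy val` initializer takes the monitor of the enclosing instance, so a method that blocks while holding `this` can stall any thread that touches the lazy `log` for the first time. Synchronizing on a private lock object leaves the monitor of `this` free.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}

// Hypothetical stand-in for a logging trait with a lazily initialized logger.
trait LazyLogging {
  @transient lazy val log: StringBuilder = new StringBuilder
  def logInfo(msg: String): Unit = log.synchronized { log.append(msg); () }
}

// Hypothetical backend: start() blocks until onRegistered() runs on another thread.
class Backend extends LazyLogging {
  // Explicit lock (option 3): guards start() without holding the monitor of `this`.
  private val stateLock = new Object
  private val registered = new CountDownLatch(1)

  def start(): Boolean = stateLock.synchronized {
    // Only stateLock is held while waiting, so the lazy val can still initialize.
    registered.await(5, TimeUnit.SECONDS)
  }

  // Runs on a callback thread; the first access to `log` may lock `this`.
  def onRegistered(): Unit = {
    logInfo("Framework registered")
    registered.countDown()
  }
}

object ExplicitLockSketch {
  // Returns true if the callback completed before the 5 second timeout.
  def demo(): Boolean = {
    val backend = new Backend
    val callback = new Thread(new Runnable {
      def run(): Unit = backend.onRegistered()
    })
    callback.start()
    val ok = backend.start()
    callback.join()
    ok
  }
}
```

If `start()` synchronized on `this` instead, the callback thread in this sketch could block inside the lazy initializer until the wait in `start()` timed out.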

> Spark on mesos is broken due to race condition in Logging
> ---------------------------------------------------------
>                 Key: SPARK-16379
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.0.0
>            Reporter: Stavros Kontopoulos
>            Priority: Blocker
>         Attachments: out.txt
> This commit introduced a transient lazy log val:
> This has caused problems in the past:
> One commit before that everything works fine.
> I spotted that when my CI started to fail:
> You can easily verify it by installing mesos on your machine and trying to connect with spark shell from the bin dir:
> ./spark-shell --master mesos://zk://localhost:2181/mesos --conf spark.executor.url=$(pwd)/../spark-2.0.0-SNAPSHOT-bin-test.tgz
> It gets stuck at the point where it tries to create the SparkContext.
> Logging gets stuck here:
> I0705 12:10:10.076617  9303 group.cpp:700] Trying to get '/mesos/json.info_0000000152' in ZooKeeper
> I0705 12:10:10.076920  9304 detector.cpp:479] A new leading master (UPID=master@ is detected
> I0705 12:10:10.076956  9303 sched.cpp:326] New master detected at master@
> I0705 12:10:10.077057  9303 sched.cpp:336] No credentials provided. Attempting to register without authentication
> I0705 12:10:10.090709  9301 sched.cpp:703] Framework registered with 13553f8b-f42c-4f20-88cd-16f1cc153ede-0001
> I verified it also by changing @transient lazy val log to def and it works as expected.
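
A rough sketch of the `def` workaround mentioned in the last quoted line, under stated assumptions: `DefLogging`, `Scheduler` and `DefLogSketch` are hypothetical names, and `java.util.logging.Logger` is used only to keep the example self-contained. Because `log` is a plain method backed by a nullable field, reading it never needs the monitor of `this`, so it still works while another thread is blocked inside a `synchronized` method.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import java.util.logging.Logger

// Hypothetical logging trait: `log` is a def backed by a nullable field,
// so accessing it does not lock the enclosing instance.
trait DefLogging {
  @transient private var log_ : Logger = null

  protected def log: Logger = {
    if (log_ == null) {
      // Unsynchronized on purpose: two threads may both call getLogger,
      // which looks the logger up by the same name either way.
      log_ = Logger.getLogger(getClass.getName)
    }
    log_
  }
}

class Scheduler extends DefLogging {
  private val done = new CountDownLatch(1)

  // Holds the monitor of `this` while waiting for the callback thread.
  def start(): Boolean = synchronized {
    done.await(5, TimeUnit.SECONDS)
  }

  // Runs on the callback thread and logs without needing `this`.
  def registered(): Unit = {
    log.fine("Framework registered")
    done.countDown()
  }
}

object DefLogSketch {
  // Returns true if the callback finished before the 5 second timeout.
  def demo(): Boolean = {
    val scheduler = new Scheduler
    val callback = new Thread(new Runnable {
      def run(): Unit = scheduler.registered()
    })
    callback.start()
    val ok = scheduler.start()
    callback.join()
    ok
  }
}
```

The trade-off in this sketch is that initialization is not guarded, which is acceptable only when creating the logger twice is harmless.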

This message was sent by Atlassian JIRA
