spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Marcin Tustin <mtus...@handybook.com>
Subject Re: Should localProperties be inheritable? Should we change that or document it?
Date Fri, 15 Apr 2016 10:45:37 GMT
It would be a pleasure. That said, what do you think about adding the
non-inheritable feature? I think that would be a big win for everything
that doesn't specifically need Inheritability.

On Friday, April 15, 2016, Reynold Xin <rxin@databricks.com> wrote:

> I think this was added a long time ago by me in order to make certain
> things work for Shark (good old times ...). You are probably right that by
> now some apps depend on the fact that this is inheritable, and changing
> that could break them in weird ways.
>
> Do you mind documenting this, and also add a test case?
>
>
> On Wed, Apr 13, 2016 at 6:15 AM, Marcin Tustin <mtustin@handybook.com
> <javascript:_e(%7B%7D,'cvml','mtustin@handybook.com');>> wrote:
>
>> *Tl;dr: *SparkContext.setLocalProperty is implemented with
>> InheritableThreadLocal.
>> This has unexpected consequences, not least because the method
>> documentation doesn't say anything about it:
>>
>> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L605
>>
>> I'd like to propose that we do one of: (1) document explicitly that these
>> properties are inheritable; (2) stop them being inheritable; or (3)
>> introduce the option to set these in a non-inheritable way.
>>
>> *Motivation: *This started with me investigating a last vestige of the
>> leaking spark.sql.execution.id issue in Spark 1.5.2 (it's not
>> reproducible under controlled conditions, and given the many and excellent
>> fixes on this issue it's completely mysterious that this hangs around; the
>> bug itself is largely beside the point).
>>
>> The specific contribution that inheritable localProperties makes to this
>> problem is that if a localProperty like spark.sql.execution.id leaks
>> (i.e. remains set when it shouldn't) because those properties are inherited
>> by spawned threads, that pollution affects all subsequently spawned threads.
>>
>> This doesn't sound like a big deal - why would worker threads be spawning
>> other threads? It turns out that Java's ThreadPoolExecutor has worker
>> threads spawn other worker threads (it has no master dispatcher thread; the
>> workers themselves run all the housekeeping). JavaDoc here:
>> https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
>> and source code here:
>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8u40-b25/java/util/concurrent/ThreadPoolExecutor.java#ThreadPoolExecutor
>>
>> Accordingly, if using Scala Futures and any kind of thread pool that
>> comes built-in with Java, it's impossible to avoid localproperties
>> propagating haphazardly to different threads. For localProperties
>> explicitly set by user code this isn't nice, and requires work arounds like
>> explicitly clearing known properties at the start of every future, or in a
>> beforeExecute hook on the threadpool. For leaky properties the work around
>> is pretty much the same: defensively clear them in the threadpool.
>>
>> *Options:*
>> (0) Do nothing at all. Unattractive, because documenting this would still
>> be better;
>> (1) Update the scaladoc to explicitly say that localProperties are
>> inherited by spawned threads and note that caution should be exercised with
>> thread pools.
>> (2) Switch to using ordinary, non-inheritable thread locals. I assume
>> this would break something for somebody, but if not, this would be my
>> preferred option. Also a very simple change to implement if no-one is
>> relying on property inheritance.
>> (3) Introduce a second localProperty facility which is not inherited.
>> This would not break any existing code, and should not be too hard to
>> implement. localProperties which need cleanup could be migrated to using
>> this non-inheritable facility, helping to limit the impact of failing to
>> clean up.
>> The way I envisage this working is that non-inheritable localProperties
>> would be checked first, then inheritable, then global properties.
>>
>> *Actions:*
>> I'm happy to do the coding and open such Jira tickets as desirable or
>> necessary. Before I do any of that, I'd like to know if there's any support
>> for this, and ideally secure a committer who can help shepherd this change
>> through.
>>
>> Marcin Tustin
>>
>> Want to work at Handy? Check out our culture deck and open roles
>> <http://www.handy.com/careers>
>> Latest news <http://www.handy.com/press> at Handy
>> Handy just raised $50m
>> <http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
led
>> by Fidelity
>>
>>
>

-- 
Want to work at Handy? Check out our culture deck and open roles 
<http://www.handy.com/careers>
Latest news <http://www.handy.com/press> at Handy
Handy just raised $50m 
<http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/>
led 
by Fidelity


Mime
View raw message