spark-issues mailing list archives

From "Wenchen Fan (JIRA)" <>
Subject [jira] [Assigned] (SPARK-26851) CachedRDDBuilder only partially implements double-checked locking
Date Thu, 14 Feb 2019 06:59:00 GMT


Wenchen Fan reassigned SPARK-26851:

    Assignee: Bruce Robbins

> CachedRDDBuilder only partially implements double-checked locking
> -----------------------------------------------------------------
>                 Key: SPARK-26851
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.0, 3.0.0
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Minor
> In CachedRDDBuilder, {{cachedColumnBuffers}} uses double-checked locking to lazily initialize {{_cachedColumnBuffers}}. {{clearCache}} also uses double-checked locking, presumably to avoid synchronization when {{_cachedColumnBuffers}} is still null.
> However, the resource (in this case, {{_cachedColumnBuffers}}) is not declared volatile, which could cause visibility problems, particularly in {{clearCache}}, which may see a null reference even though an RDD has already been assigned.
> From Java Concurrency in Practice by Brian Goetz et al.:
> {quote}Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to work if resource is made volatile, and the performance impact of this is small since volatile reads are usually only slightly more expensive than nonvolatile reads.
> {quote}
> Other documentation states that volatile is not needed if the resource is immutable. While an RDD is immutable from a Spark user's point of view, it may not be immutable from the JVM's point of view, since not all of its internal fields are final.
> I've marked this as minor since the race conditions are highly unlikely.
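The double-checked locking pattern described in the report, with the fix it proposes (declaring the cached field volatile), can be sketched in Java roughly as follows. This is a minimal illustration, not Spark's actual implementation: `CachedBufferHolder`, `get`, `clear`, `isCached` and `DclDemo` are hypothetical names, and the `Supplier` stands in for whatever builds the cached RDD. `get` mirrors the lazy initialization in `cachedColumnBuffers`, and `clear` mirrors the unsynchronized null check in `clearCache`.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for a lazily built cache like CachedRDDBuilder's; not Spark code.
final class CachedBufferHolder<T> {
    private final Supplier<T> builder;
    // volatile is what makes double-checked locking safe under the Java 5+ memory model.
    private volatile T cached;

    CachedBufferHolder(Supplier<T> builder) {
        this.builder = builder;
    }

    // Returns the cached value, building it on first use.
    T get() {
        T result = cached;                 // first check, without the lock
        if (result == null) {
            synchronized (this) {
                result = cached;           // second check, under the lock
                if (result == null) {
                    result = builder.get();
                    cached = result;       // volatile write publishes the fully built value
                }
            }
        }
        return result;
    }

    // Drops the cached value; skips the lock when nothing has been built.
    void clear() {
        if (cached != null) {              // volatile read: observes the latest assignment
            synchronized (this) {
                if (cached != null) {
                    // A real cache would release the resource here (e.g. unpersist an RDD).
                    cached = null;
                }
            }
        }
    }

    boolean isCached() {
        return cached != null;
    }
}

// Races several threads on get() and reports how many times the builder ran.
final class DclDemo {
    static int concurrentBuilds(int threads) {
        AtomicInteger builds = new AtomicInteger();
        CachedBufferHolder<Object> holder = new CachedBufferHolder<>(() -> {
            builds.incrementAndGet();
            return new Object();
        });
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();         // hold every worker until all have been started
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                holder.get();
            });
            workers[i].start();
        }
        start.countDown();                 // release all workers at once
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return builds.get();
    }
}
```

Copying the field into the local `result` means the common fast path performs a single volatile read. This is a widespread refinement of the idiom, not something the report prescribes. With the lock in place, `DclDemo.concurrentBuilds(n)` should return 1 for any positive `n`.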

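The report's caveat about immutability, namely that an RDD is immutable to users but not all of its internal fields are final, can be illustrated with two hypothetical classes (`FullyFinalBuffers` and `ApparentlyImmutableBuffers` are invented names, not Spark classes). Under the JMM's final-field rules (JLS §17.5), a thread that obtains a reference to a completely constructed object, even through a data race, is guaranteed to see the correct values of that object's final fields. No such guarantee covers non-final fields. Final fields also do nothing to make the reference itself visible sooner, so they would not address a stale null read like the one described for {{clearCache}}.

```java
// Hypothetical classes illustrating final-field semantics (JLS 17.5); not Spark code.

// Every field is final. Once the constructor finishes without leaking `this`,
// a thread that obtains a reference to this object, even through a data race,
// is guaranteed to see these initialized values.
final class FullyFinalBuffers {
    private final String name;
    private final int numPartitions;

    FullyFinalBuffers(String name, int numPartitions) {
        this.name = name;
        this.numPartitions = numPartitions;
    }

    String name() {
        return name;
    }

    int numPartitions() {
        return numPartitions;
    }
}

// Users only ever see a fixed name, but the object also carries a non-final
// internal field that is updated after construction. A thread that reads a
// racily published reference has no JMM guarantee of seeing that field's current value.
final class ApparentlyImmutableBuffers {
    private final String name;
    private String storageNote;              // non-final internal state

    ApparentlyImmutableBuffers(String name) {
        this.name = name;
    }

    String name() {
        return name;
    }

    // Stands in for internal bookkeeping a library performs behind a stable API.
    void markCached(String note) {
        storageNote = note;
    }

    String storageNote() {
        return storageNote;
    }
}
```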
This message was sent by Atlassian JIRA

