cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context
Date Tue, 13 Nov 2018 14:43:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16685312#comment-16685312
] 

Benedict edited comment on CASSANDRA-14554 at 11/13/18 2:42 PM:
----------------------------------------------------------------

I think the situation would actually be, essentially, undefined behaviour?  Since even
with this patch we've got a race for modifying the various collections between {{abort()}}
and {{trackNew()}}. Even if this window is narrow, this is probably not acceptable for the
fix?

We could perhaps create a synchronised transaction object that wraps the {{LifecycleTransaction}}
and exposes largely the API you have exposed, but also synchronises its prepare/commit/abort
phases? This would avoid the possibility of corrupting the internal state and producing undefined
behaviour.

But we're still at least broken on Windows platforms (unless their FS semantics have changed),
because we abort the shared transaction before we have closed our in-progress file handles,
and so we cannot delete all of the sstables. Not sure what state that leaves us in, but hopefully
it would be recovered on restart.

Ultimately, the child transaction approach still feels easier to reason about for me.  With
the right comments, I don't see why we should worry about people misusing it for concurrent
scenario - and we could only synchronise the internal methods (and only synchronise externally
the transfer method, for clarity).

On a bikeshedding note, I'm very unconvinced by the name {{SSTableTracker}}. It's generic,
and very close to {{Tracker}} which is a much more global thing. Perhaps {{LifecycleTransactionNewTracker}}
or something, to make clear its scope? The variables wouldn't need to be renamed, also. I
don't see us ever wanting to back this by something other than a {{LifecycleTransaction}}?

Probably we should have originally called {{LifecycleTransaction}} {{LifecycleTxn}}, or simply
{{SSTableTxn}}


was (Author: benedict):
I think the situation would actually be, essentially, undefined behaviour?  Since even
with this patch we've got a race for modifying the various collections between {{abort()}}
and {{trackNew()}}. Even if this window is narrow, this is probably not acceptable for the
fix?

We could perhaps create a synchronised transaction object that wraps the {{LifecycleTransaction}}
and exposes largely the API you have exposed, but also synchronises its prepare/commit/abort
phases? This would avoid the possibility of corrupting the internal state and producing undefined
behaviour.

But we're still at least broken on Windows platforms (unless their FS semantics have changed),
because we abort the shared transaction before we have closed our in-progress file handles,
and so we cannot delete all of the sstables. Not sure what state that leaves us in, but hopefully
it would be recovered on restart.

On a bikeshedding note, I'm very unconvinced by the name {{SSTableTracker}}. It's generic,
and very close to {{Tracker}} which is a much more global thing. Perhaps {{LifecycleTransactionNewTracker}}
or something, to make clear its scope? The variables wouldn't need to be renamed, also. I
don't see us ever wanting to back this by something other than a {{LifecycleTransaction}}?

Probably we should have originally called {{LifecycleTransaction}} {{LifecycleTxn}}, or simply
{{SSTableTxn}}

> LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded
context
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14554
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14554
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Dinesh Joshi
>            Assignee: Dinesh Joshi
>            Priority: Major
>
> When LifecycleTransaction is used in a multi-threaded context, we encounter this exception
-
> {quote}java.util.ConcurrentModificationException: null
>  at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
>  at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
>  at java.lang.Iterable.forEach(Iterable.java:74)
>  at org.apache.cassandra.db.lifecycle.LogReplicaSet.maybeCreateReplica(LogReplicaSet.java:78)
>  at org.apache.cassandra.db.lifecycle.LogFile.makeRecord(LogFile.java:320)
>  at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285)
>  at org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136)
>  at org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529)
> {quote}
> During streaming we create a reference to a {{LifeCycleTransaction}} and share it between
threads -
> [https://github.com/apache/cassandra/blob/5cc68a87359dd02412bdb70a52dfcd718d44a5ba/src/java/org/apache/cassandra/db/streaming/CassandraStreamReader.java#L156]
> This is used in a multi-threaded context inside {{CassandraIncomingFile}} which is an {{IncomingStreamMessage}}.
This is being deserialized in parallel.
> {{LifecycleTransaction}} is not meant to be used in a multi-threaded context and this
leads to streaming failures due to object sharing. On trunk, this object is shared across
all threads that transfer sstables in parallel for the given {{TableId}} in a {{StreamSession}}.
There are two options to solve this - make {{LifecycleTransaction}} and the associated objects
thread safe, scope the transaction to a single {{CassandraIncomingFile}}. The consequences
of the latter option is that if we experience streaming failure we may have redundant SSTables
on disk. This is ok as compaction should clean this up. A third option is we synchronize access
in the streaming infrastructure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message