cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-14554) LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded context
Date Tue, 13 Nov 2018 02:04:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16684618#comment-16684618
] 

Stefania commented on CASSANDRA-14554:
--------------------------------------

{quote}The only reason would be simplifying analysis of the code's behaviour. For instance,
it's not clear to me how we either would (or should) behave in the stream writers actively
working (and creating sstable files) but for whom the transaction has already been cancelled.
Does such a scenario even arise? Is it possible it would leave partially written sstables?
{quote}
I'm not sure if this scenario may arise when a streaming transaction is aborted, it depends
on streaming details which I've forgotten, but let's step through it:
 - The new sstables are recorded as new records before the files are created. If the recording
fails, because the transaction was aborted, the streamer will abort with an exception. Fine.
 - So long as the sstables are recorded, the transaction tidier will delete the files on disk
and so the contents will be removed from disk as soon as the streamer finishes writing. Also
fine.
 - We may however have a race is if the streamer has added a new record to a txn that is about
to be aborted, and the streamer hasn't created sstable files when the transaction tidier is
running. This could leave files on disk. It's an extremely small window, but it's not impossible.

We keep a reference to the txn only for obsoleted readers of existing files, we should also
keep a reference to the txn until all new files are at least created and the directory has
been synced. Child transactions would solve this without the need for this extra reference,
but we would need to enforce them for all multi-threaded code (the presence of synchronized
methods may lure people on sharing transactions). The alternative to child transactions is
to force writers to reference the txn.
{quote}we could even do it with a delegating SynchronizedLifecycleTransaction, which would
seem to be equivalent to your patch
{quote}
This was exactly the starting point of my patch. I did not implement a fully synchronized
transaction because the API is quite large. I thought it may need some cleanup in order to
extract the methods related to the transaction behavior. I did not have the time to look into
this, and also cleaning up the API is not an option on out released branches, due to the risk
of introducing problems, so I just extracted the three methods that are used by the writers
and implemented the safest approach.

> LifecycleTransaction encounters ConcurrentModificationException when used in multi-threaded
context
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14554
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14554
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Dinesh Joshi
>            Assignee: Dinesh Joshi
>            Priority: Major
>
> When LifecycleTransaction is used in a multi-threaded context, we encounter this exception
-
> {quote}java.util.ConcurrentModificationException: null
>  at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
>  at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
>  at java.lang.Iterable.forEach(Iterable.java:74)
>  at org.apache.cassandra.db.lifecycle.LogReplicaSet.maybeCreateReplica(LogReplicaSet.java:78)
>  at org.apache.cassandra.db.lifecycle.LogFile.makeRecord(LogFile.java:320)
>  at org.apache.cassandra.db.lifecycle.LogFile.add(LogFile.java:285)
>  at org.apache.cassandra.db.lifecycle.LogTransaction.trackNew(LogTransaction.java:136)
>  at org.apache.cassandra.db.lifecycle.LifecycleTransaction.trackNew(LifecycleTransaction.java:529)
> {quote}
> During streaming we create a reference to a {{LifeCycleTransaction}} and share it between
threads -
> [https://github.com/apache/cassandra/blob/5cc68a87359dd02412bdb70a52dfcd718d44a5ba/src/java/org/apache/cassandra/db/streaming/CassandraStreamReader.java#L156]
> This is used in a multi-threaded context insideĀ {{CassandraIncomingFile}} which is anĀ {{IncomingStreamMessage}}.
This is being deserialized in parallel.
> {{LifecycleTransaction}} is not meant to be used in a multi-threaded context and this
leads to streaming failures due to object sharing. On trunk, this object is shared across
all threads that transfer sstables in parallel for the given {{TableId}} in a {{StreamSession}}.
There are two options to solve this - make {{LifecycleTransaction}} and the associated objects
thread safe, scope the transaction to a single {{CassandraIncomingFile}}. The consequences
of the latter option is that if we experience streaming failure we may have redundant SSTables
on disk. This is ok as compaction should clean this up. A third option is we synchronize access
in the streaming infrastructure.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org


Mime
View raw message