flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Piotr Nowojski (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-8132) FlinkKafkaProducer011 can commit incorrect transaction during recovery
Date Wed, 22 Nov 2017 11:29:00 GMT
Piotr Nowojski created FLINK-8132:
-------------------------------------

             Summary: FlinkKafkaProducer011 can commit incorrect transaction during recovery
                 Key: FLINK-8132
                 URL: https://issues.apache.org/jira/browse/FLINK-8132
             Project: Flink
          Issue Type: Bug
          Components: Kafka Connector
            Reporter: Piotr Nowojski
            Assignee: Piotr Nowojski
            Priority: Blocker
             Fix For: 1.4.0


Faulty scenario with producer pool of 2.

1. started transaction 1 with producerA, written record 42
2. checkpoint 1 triggered, pre committing txn1, started txn2 with producerB, written record
43
3. checkpoint 1 completed, committing txn1, returning producerA to the pool
4. checkpoint 2 triggered , committing txn2, started txn3 with producerA, written record 44
5. crash....
6. recover to checkpoint 1, txn1 from producerA found to "pendingCommitTransactions", attempting
to recoverAndCommit(txn1)
7. unfortunately txn1 and txn3 from the same producers are identical from KafkaBroker perspective
and thus txn3 is being committed

result is that both records 42 and 44 are committed. 

Proposed solution is to postpone returning producers to the pool until we are sure that previous
checkpoint (for which given producer was used) will not be used for recovery (at least one
more checkpoint was completed).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message