flink-issues mailing list archives

From zentol <...@git.apache.org>
Subject [GitHub] flink issue #2332: [FLINK-2055] Implement Streaming HBaseSink
Date Fri, 23 Sep 2016 11:36:48 GMT
Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/2332
  
    Don't you lose any guarantees regarding the order of mutations the moment you use
    asynchronous updates anyway?
    
    The WriteAheadSink should only be used if you need to handle non-deterministic programs,
    or want to send data in atomic mini-batches and must rely on the order of elements.
    Otherwise, there are much simpler solutions.
    
    If your updates are idempotent, all you have to do is write the data into HBase and make
    sure that every update sent for a given checkpoint has been acknowledged before that
    checkpoint is regarded as complete. If you don't wait for these acknowledgements, you
    lose the at-least-once guarantee. This scheme does not provide exactly-once *delivery*
    guarantees; however, at any given point in time the table is in a state as if every
    update had been sent only once. This is the same guarantee that we provide for Cassandra.
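
    A minimal sketch of the idempotent scheme above, in plain Java with no Flink or HBase
    dependencies. `InMemoryTable`, `IdempotentSink`, `invoke` and `snapshotState` are
    hypothetical names loosely modeled on a sink's per-record and checkpoint hooks, not the
    real Flink or HBase API. Writes are sent asynchronously, and the checkpoint hook blocks
    until every outstanding write is acknowledged; a replayed record simply overwrites its
    cell again, so the table ends up as if it had been written once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class IdempotentSinkDemo {
    // Records for row-1 and row-2 are covered by checkpoint 1. The record for row-3 is sent,
    // the job fails before checkpoint 2 completes, and the source replays row-3 on recovery.
    static Map<String, String> runScenario() {
        InMemoryTable table = new InMemoryTable();
        try (IdempotentSink sink = new IdempotentSink(table)) {
            sink.invoke("row-1", "a");
            sink.invoke("row-2", "b");
            sink.snapshotState(1L);
            sink.invoke("row-3", "c"); // sent, but checkpoint 2 never completes
        }
        try (IdempotentSink recovered = new IdempotentSink(table)) {
            recovered.invoke("row-3", "c"); // replayed after restoring checkpoint 1
            recovered.snapshotState(2L);
        }
        return table.snapshot();
    }

    public static void main(String[] args) {
        System.out.println(runScenario());
    }
}

// Hypothetical stand-in for an HBase table: a put overwrites the cell for its row key,
// so applying the same put twice leaves the same state as applying it once.
class InMemoryTable {
    private final Map<String, String> cells = new HashMap<>();

    synchronized void put(String row, String value) {
        cells.put(row, value);
    }

    synchronized Map<String, String> snapshot() {
        return new HashMap<>(cells);
    }
}

// Sketch of an at-least-once sink for idempotent updates (not Flink's actual API).
class IdempotentSink implements AutoCloseable {
    private final InMemoryTable table;
    private final ExecutorService io = Executors.newFixedThreadPool(4);
    private final List<CompletableFuture<Void>> pending = new ArrayList<>();

    IdempotentSink(InMemoryTable table) {
        this.table = table;
    }

    // Sends the write asynchronously and remembers its future. Concurrent writes may land
    // in any order, so this is only safe if order does not matter for the table state.
    void invoke(String row, String value) {
        pending.add(CompletableFuture.runAsync(() -> table.put(row, value), io));
    }

    // Called on a checkpoint barrier: blocks until every write sent so far is acknowledged.
    // If a write failed, join() throws and the checkpoint must not be treated as complete.
    // checkpointId is unused in this sketch.
    void snapshotState(long checkpointId) {
        CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
        pending.clear();
    }

    @Override
    public void close() {
        io.shutdown(); // lets already submitted writes finish
    }
}
```

    Because the writes are concurrent, two puts to the same row within one checkpoint could
    land in either order, which is the ordering caveat raised at the top of this comment.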
    
    For non-idempotent updates, things get a lot more difficult.
    
    If you can fire an entire checkpoint as a single atomic batch, you've just won the
    lottery: you can use the above scheme plus a small auxiliary table that tracks the
    completed checkpoints per sink subtask.
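
    A sketch of the atomic-batch variant, again with hypothetical names (`AtomicStore`,
    `BatchingSink`, `commitCheckpoint`). It assumes the store can apply a whole batch and
    update the auxiliary table of completed checkpoints per subtask in one atomic step, and
    that the batch for a checkpoint can be rebuilt identically after recovery (for example,
    from the sink's own state). The updates are non-idempotent increments; re-committing a
    checkpoint that is already recorded as complete is skipped instead of double-counting.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AtomicBatchDemo {
    static AtomicStore runScenario() {
        AtomicStore store = new AtomicStore();
        BatchingSink sink = new BatchingSink(store, 0);
        sink.invoke("a");
        sink.invoke("a");
        sink.invoke("b");
        sink.commitCheckpoint(1L);

        // After a failure the recovered sink cannot tell whether the batch for checkpoint 1
        // landed, so it rebuilds and re-commits it; the auxiliary table makes this a no-op.
        BatchingSink recovered = new BatchingSink(store, 0);
        recovered.invoke("a");
        recovered.invoke("a");
        recovered.invoke("b");
        recovered.commitCheckpoint(1L);
        recovered.invoke("b");
        recovered.commitCheckpoint(2L);
        return store;
    }

    public static void main(String[] args) {
        AtomicStore store = runScenario();
        System.out.println("a=" + store.count("a") + " b=" + store.count("b"));
    }
}

// Hypothetical store offering one atomic commit for a batch of increments plus the
// auxiliary "completed checkpoints" table. In this in-memory stand-in, synchronized only
// serializes callers; a real system must make the data writes and the checkpoint marker
// a single atomic operation.
class AtomicStore {
    private final Map<String, Long> counters = new HashMap<>();
    private final Map<Integer, Long> completedCheckpoints = new HashMap<>(); // subtask -> last checkpoint

    synchronized void commit(int subtask, long checkpointId, List<String> increments) {
        for (String row : increments) {
            counters.merge(row, 1L, Long::sum);
        }
        completedCheckpoints.put(subtask, checkpointId);
    }

    synchronized long lastCompleted(int subtask) {
        return completedCheckpoints.getOrDefault(subtask, -1L);
    }

    synchronized long count(String row) {
        return counters.getOrDefault(row, 0L);
    }
}

// Buffers one checkpoint's worth of increments and fires them as a single batch.
class BatchingSink {
    private final AtomicStore store;
    private final int subtask;
    private final List<String> buffer = new ArrayList<>();

    BatchingSink(AtomicStore store, int subtask) {
        this.store = store;
        this.subtask = subtask;
    }

    void invoke(String row) {
        buffer.add(row);
    }

    // Commits the buffered batch unless the auxiliary table says this checkpoint is done.
    void commitCheckpoint(long checkpointId) {
        if (checkpointId > store.lastCompleted(subtask)) {
            store.commit(subtask, checkpointId, new ArrayList<>(buffer));
        }
        buffer.clear();
    }
}
```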
    
    If you can't do that, you will have to use system-specific features and guarantees to
    engineer a solution that provides exactly-once guarantees: versioning, rollbacks, unique
    IDs; something that either allows you to revert the table to a clean state or to track
    precisely which updates were applied and send the remaining ones.
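
    One way the unique-ID approach could look, as a sketch only: every update carries a
    hypothetical ID built from subtask, checkpoint and sequence number, and the store records
    applied IDs alongside the data. After a failure the sink resends the whole checkpoint and
    the store applies only the updates it has not seen. This assumes the replayed input is
    deterministic, so the same update gets the same ID, and that recording an ID and applying
    its update are atomic in the target system. `DedupingStore`, `Update` and `updateId` are
    illustrative names, not part of Flink or HBase.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class UniqueIdDemo {
    // Hypothetical ID scheme: subtask, checkpoint and position within the checkpoint.
    // This is only stable if the replayed input for a checkpoint is identical.
    static String updateId(int subtask, long checkpointId, int seq) {
        return subtask + "-" + checkpointId + "-" + seq;
    }

    // Sends the first `limit` updates of a checkpoint; returns how many were newly applied.
    static int send(DedupingStore store, int subtask, long checkpointId, List<Update> updates, int limit) {
        int applied = 0;
        for (int seq = 0; seq < limit; seq++) {
            Update u = updates.get(seq);
            if (store.apply(updateId(subtask, checkpointId, seq), u.row, u.delta)) {
                applied++;
            }
        }
        return applied;
    }

    public static void main(String[] args) {
        DedupingStore store = new DedupingStore();
        List<Update> batch = List.of(new Update("x", 3), new Update("y", 4), new Update("x", 1));
        int first = send(store, 0, 5L, batch, 2);            // fails after two updates
        int retry = send(store, 0, 5L, batch, batch.size()); // resends the whole checkpoint
        System.out.println(first + " " + retry + " x=" + store.get("x") + " y=" + store.get("y"));
    }
}

// A non-idempotent update: add delta to the counter stored in row.
final class Update {
    final String row;
    final long delta;

    Update(String row, long delta) {
        this.row = row;
        this.delta = delta;
    }
}

// Hypothetical store that records applied update IDs together with the data, so it can
// tell precisely which updates already landed. In a real system recording the ID and
// applying the delta must be atomic; here synchronized stands in for that.
class DedupingStore {
    private final Map<String, Long> counters = new HashMap<>();
    private final Set<String> appliedIds = new HashSet<>();

    // Returns true if the update was applied, false if it was a duplicate.
    synchronized boolean apply(String updateId, String row, long delta) {
        if (!appliedIds.add(updateId)) {
            return false;
        }
        counters.merge(row, delta, Long::sum);
        return true;
    }

    synchronized long get(String row) {
        return counters.getOrDefault(row, 0L);
    }
}
```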


