kafka-users mailing list archives

From Philip Schmitt <philip.schm...@outlook.com>
Subject Reliably producing records to remote cluster: what are my options?
Date Tue, 12 Sep 2017 19:19:41 GMT

We want to reliably produce events into a remote Kafka cluster in (mostly) near real-time.
We have to provide an at-least-once guarantee.

Examples are a "Customer logged in" event that will be consumed by a data warehouse for
reporting (the numbers should be correct), or a "Customer unsubscribed from newsletter" event
that determines whether the customer gets emails (if she unsubscribes but the message is lost,
she will not be happy).
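
As far as I understand it, the producer side of at-least-once comes down to configuration like
the following -- a minimal sketch with the plain Java client, where the broker address, topic
name, and record contents are placeholders:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ReliableProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "remote-kafka:9092"); // placeholder address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            // a send only succeeds once all in-sync replicas have the record
            props.put("acks", "all");
            // retry transient failures instead of dropping records
            props.put("retries", Integer.toString(Integer.MAX_VALUE));
            // avoid duplicates introduced by those retries (needs brokers on 0.11+)
            props.put("enable.idempotence", "true");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("customer-events", "customer-42", "Customer logged in"));
                producer.flush();
            }
        }
    }

That covers failures the producer can retry through, but not an outage that outlives the
process, which is the scenario below.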


  *   We run an ecommerce website on a cluster of up to ten servers and an Oracle database.
  *   We have a small Kafka cluster at a different site. We have in the past had a small number
of network issues, where the web servers could not reach the other site for maybe an hour.
  *   We don't persist all events in the database. If the application is restarted, events
that occurred before the restart cannot be sent to Kafka. A customer's row might have a newer
timestamp, but we couldn't tell which columns changed.


  *   In case of, for example, a network outage between the web servers and the Kafka cluster,
we may accumulate thousands of events on each web server that cannot be sent to Kafka. If
a server is shut down during that time, the messages would be lost.
  *   If we produce to Kafka from within the application in addition to writing to the database,
the data may become inconsistent if one of the writes fails (a sketch of this follows below).
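
To illustrate the second point: the application only learns that the Kafka half of the dual
write failed through the send result, for example via a callback -- a minimal sketch, assuming
a producer configured as above and a placeholder topic name:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class DualWriteExample {
        static void sendAfterDbCommit(KafkaProducer<String, String> producer,
                                      String key, String value) {
            producer.send(new ProducerRecord<>("customer-events", key, value),
                (metadata, exception) -> {
                    if (exception != null) {
                        // The database transaction has already committed by the
                        // time we learn the send failed, so the two stores are
                        // now inconsistent; the application would have to queue
                        // a retry or reconcile them somehow.
                        System.err.println("Send to Kafka failed: " + exception);
                    }
                });
        }
    }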

The more I read about Kafka, the more options I see, but I cannot assess how well each option
might work or what the trade-offs between them are.

  1.  produce records directly within the application
  2.  produce records from the Oracle database via Kafka Connect
  3.  produce records from the Oracle database via a CDC solution (GoldenGate, Attunity, Striim, …)
  4.  persist events in log files and produce to Kafka via Elastic's Logstash/Filebeat
  5.  persist events in log files and produce to Kafka via a Kafka Connect source connector
  6.  persist events in a local, embedded database and produce to Kafka via an existing source
connector (a configuration sketch follows this list)
  7.  produce records directly within the application to a new Kafka cluster in the same network
and mirror to the remote cluster
  8.  ?
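
For option 6, my (possibly naive) understanding is that an outbox table in the embedded database
could be polled by an existing connector such as Confluent's JDBC source connector -- a rough
configuration sketch, where the database URL, table, and column names are made up for
illustration:

    name=outbox-source
    connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
    tasks.max=1
    # placeholder URL for an embedded database that has a JDBC driver
    connection.url=jdbc:h2:file:/var/data/outbox
    table.whitelist=EVENT_OUTBOX
    # read rows strictly in order of a monotonically increasing id column
    mode=incrementing
    incrementing.column.name=EVENT_ID
    topic.prefix=shop-

The appeal over option 2, as far as I can tell, is that each outbox row is a complete event,
so we wouldn't have the "newer timestamp, but which columns changed?" problem from above.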

These are all the options I could gather so far. Some of them probably won't work for my
situation -- for example, Oracle GoldenGate might be too expensive -- but I don't want to
rule anything out just yet.

How would you approach this, and why? Which options might work? Which options would you advise
against?

I appreciate any advice. Thank you in advance.


