samza-dev mailing list archives

From Chen Song <>
Subject Samza processing reference data
Date Wed, 21 Oct 2015 20:44:41 GMT
In our Samza app, we need to read reference data from a MySQL table and use
it while processing a stream. The requirements are:

* Read data into each Samza task before processing any message.
* The Samza task should be able to listen to updates happening in MySQL.

I did some research and scanned through relevant conversations and JIRAs in
the community, but I have not found a solution yet, nor a recommended way
to do this.

Assuming my data stream comes from a topic called *topicD*, these are the
options I have in mind:

   - Use Kafka
      1. Use a CDC-based solution to replicate the data in MySQL to a
      Kafka topic. Say the topic is called *topicR*.
      2. In my Samza app, read the reference table from *topicR* and persist
      it in a cache in each Samza task's local storage.
         - If the data in *topicR* is NOT partitioned in the same way as
         *topicD*, can we configure each individual Samza task to read data
         from all partitions of a topic?
         - If the answer to the above question is no, do I need to create
         *topicR* with the same number of partitions as *topicD* and
         replicate the data to all partitions?
         - On start, how do I make a Samza task hold off processing the
         first message from *topicD* until it has read all data from
         *topicR*?
      3. Any new updates/deletes to *topicR* will be consumed to update the
      local cache of each Samza task.
      4. On failure or restart, each Samza task will read *topicR* from the
      beginning.
   - Do not use Kafka
      - Each Samza task reads a snapshot of the database and builds its
      local cache, and then needs to re-read the database periodically to
      update that cache. I have read a few blogs about this, and it does
      not sound like a solid approach in the long run.
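
To make the Kafka option more concrete, here is a rough sketch in plain Java
of the per-task cache behaviour I have in mind. It does not use any Samza
API. `ReferenceCache`, `applyChange`, `markCaughtUp` and `onData` are
hypothetical names, and treating a null value as a delete is only an
assumption about how the CDC tool would encode deletes on *topicR*.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of one task's local reference cache (not a Samza class).
class ReferenceCache {
    private final Map<String, String> rows = new HashMap<>();
    private final List<String> pendingKeys = new ArrayList<>();
    private boolean caughtUp = false;

    // Apply one change record from topicR.
    // Assumption: a null value means the row was deleted in MySQL.
    void applyChange(String key, String value) {
        if (value == null) {
            rows.remove(key);
        } else {
            rows.put(key, value);
        }
    }

    // Called once the initial replay of topicR is complete.
    // Returns the results for any topicD messages that were held back.
    List<String> markCaughtUp() {
        caughtUp = true;
        List<String> results = new ArrayList<>();
        for (String key : pendingKeys) {
            results.add(join(key));
        }
        pendingKeys.clear();
        return results;
    }

    // Handle one message from topicD, identified here only by its lookup key.
    // Returns null while the cache is still loading (the message is buffered).
    String onData(String key) {
        if (!caughtUp) {
            pendingKeys.add(key);
            return null;
        }
        return join(key);
    }

    private String join(String key) {
        return key + "=" + rows.getOrDefault(key, "<missing>");
    }

    public static void main(String[] args) {
        ReferenceCache cache = new ReferenceCache();
        cache.applyChange("u1", "alice");
        System.out.println(cache.onData("u1"));   // buffered, prints null
        cache.applyChange("u1", "alicia");        // later update during replay
        System.out.println(cache.markCaughtUp()); // prints [u1=alicia]
        cache.applyChange("u1", null);            // delete arrives after start
        System.out.println(cache.onData("u1"));   // prints u1=<missing>
    }
}
```

Buffering in memory is only one way to express "do not process *topicD*
until *topicR* has been read". What I am really asking is whether Samza
can enforce that ordering for me.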

Any thoughts?



Chen Song
