samza-dev mailing list archives

From Navina Ramesh <>
Subject Re: Regarding my use case to explore with Samza-
Date Wed, 02 Mar 2016 07:12:08 GMT
Hi Manohar,
On a side note regarding your use-case, I have a question.
After consuming the DML changes from the Kafka topic, why do you have to
query back? Are you trying to decorate the event or perform some kind of
join?

The point I am trying to make is that if you perform a remote lookup for
every event you consume, it's going to be hard to stay "realtime" (then
again, "realtime" really depends on your SLA).
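
A quick back-of-envelope illustration (the latencies below are assumed, not measured): a synchronous remote lookup per event caps a single task's throughput at roughly 1000 / latency-in-ms events per second, which is why serving lookups locally helps.

```java
public class LookupThroughput {
    public static void main(String[] args) {
        double remoteLookupMs = 5.0;  // assumed RDBMS round-trip
        double localLookupMs = 0.05;  // assumed local RocksDB read

        // Upper bound on events/sec for one single-threaded task.
        long remoteCap = Math.round(1000.0 / remoteLookupMs);
        long localCap = Math.round(1000.0 / localLookupMs);

        System.out.println(remoteCap); // 200
        System.out.println(localCap);  // 20000
    }
}
```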

Instead, I would suggest that you have an adapter that periodically takes a
snapshot of the entire table and pushes it to another topic in Kafka (not
sure how hard it is going to be to write such an adapter). This way, when
your job starts, it can partition and cache the entire data set in the
Samza task (by using RocksDB with a changelog, as Jagadish suggested).
Samza provides a "bootstrap" stream option: a bootstrap stream is read
during job startup until no more messages are available. You can
essentially configure your snapshot stream to be a bootstrap stream.
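
As a rough sketch of the wiring (all stream and store names here are hypothetical, and exact property names may differ between Samza versions, so check the config reference for your release):

```properties
# Consume both the DML change topic and the periodic snapshot topic.
task.inputs=kafka.dml-changes,kafka.table-snapshot

# Mark the snapshot stream as a bootstrap stream: it is read to the head
# before regular processing starts, and re-read from the beginning on
# every (re)start.
systems.kafka.streams.table-snapshot.samza.bootstrap=true
systems.kafka.streams.table-snapshot.samza.reset.offset=true
systems.kafka.streams.table-snapshot.samza.offset.default=oldest

# Local RocksDB store, backed by a changelog topic for recovery.
stores.table-cache.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.table-cache.changelog=kafka.table-cache-changelog
stores.table-cache.key.serde=string
stores.table-cache.msg.serde=json
```
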
Once your job is "bootstrapped", you can process events by looking up the
local partitioned store rather than the remote store. Please note that the
DML change topic and the snapshot topic need to be partitioned by the same
key; otherwise, it won't work correctly.
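
To illustrate why the co-partitioning matters, here is a minimal sketch (the key name and partition count are made up; Kafka's real default partitioner uses murmur2, and the plain hashCode here is only illustrative): when both topics use the same key and the same partition count, hash partitioning sends the DML event and the cached snapshot row to the same partition, and therefore to the same Samza task.

```java
public class CoPartitionDemo {
    // Simplified stand-in for a hash partitioner: same key and same
    // partition count always yield the same partition number.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 8;               // same count on both topics
        String primaryKey = "student:42"; // hypothetical row key

        int snapshotPartition = partitionFor(primaryKey, partitions);
        int dmlPartition = partitionFor(primaryKey, partitions);

        // Same key + same partitioner + same partition count
        // => same partition => same Samza task.
        System.out.println(snapshotPartition == dmlPartition); // true
    }
}
```

If the two topics were keyed differently (say, the snapshot by primary key but the DML events by table name), the event and the cached row could land in different tasks, and the local lookup would miss.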

Another alternative is to make a remote call to fetch the data set and
cache it locally with RocksDB. This is much simpler to implement; however,
depending on how you configure your cache, the job will only be eventually
close to "realtime".
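
The cache-aside pattern behind that alternative can be sketched like this (the store and the loader are stubbed with plain maps; in a real job the cache would be the task's RocksDB store and the loader a JDBC query):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class CacheAsideDemo {
    private final Map<String, String> localCache = new HashMap<>();
    private final Function<String, String> remoteLoader;
    int remoteCalls = 0; // counts round-trips to the "remote" store

    CacheAsideDemo(Function<String, String> remoteLoader) {
        this.remoteLoader = remoteLoader;
    }

    String lookup(String key) {
        String cached = localCache.get(key);
        if (cached != null) {
            return cached;                        // hit: served locally
        }
        remoteCalls++;
        String fetched = remoteLoader.apply(key); // miss: fetch remotely
        localCache.put(key, fetched);             // ...and cache it
        return fetched;
    }

    public static void main(String[] args) {
        Map<String, String> fakeRdbms = new HashMap<>();
        fakeRdbms.put("student:42", "{\"course\":\"math\"}");

        CacheAsideDemo cache = new CacheAsideDemo(fakeRdbms::get);
        cache.lookup("student:42"); // first lookup goes remote
        cache.lookup("student:42"); // second one is a local hit
        System.out.println(cache.remoteCalls); // 1
    }
}
```

Only the first lookup per key pays the remote round-trip; how fresh the cached value stays then depends on your eviction/refresh policy, which is the trade-off mentioned above.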

Hope my suggestions make sense. Apologies if I have misunderstood your use
case.

Feel free to ask any questions you may have.


On Tue, Mar 1, 2016 at 10:43 PM, Jagadish Venkatraman <> wrote:

> Please take a look at the hello-world example. You can implement your
> business logic in the process() callback.
> What kind of transformation are you doing? Are you doing a group by/count
> style aggregation to generate the report? If so, you could use the embedded
> rocksdb store in Samza and potentially batch your writes to the database.
> How many QPS do you process at peak? Do you expect to buffer any state per
> message? What's the ratio of input to output messages on average?
> There's nothing that stops you from using JDBC and Samza.
> On Tue, Mar 1, 2016 at 8:58 PM, Manohar Reddy <> wrote:
> > Hello Team,
> >
> > We are part of a services-based company trying to explore the available
> > real-time streaming frameworks, and one of the first options we are
> > trying is Samza.
> > Let me briefly explain our use case:
> >
> > We are trying to build a real-time reporting dashboard for the
> > e-learning domain.
> > To build this dashboard, the input is as follows: on any DML
> > (inserts/updates/deletes) into the source RDBMS, an adapter immediately
> > publishes to Kafka the RDBMS table name and primary keys in JSON
> > format.
> > Samza then has to consume the Kafka event and query back to the source
> > RDBMS to get the whole data set from the related tables, using the JSON
> > event information.
> > It then performs some transformation per the business rules and loads
> > the result into the target (reporting) RDBMS.
> > More or less we are handling a few JDBC calls through Samza here, and
> > the daily data load is small (at most 2 GB), but we need a real-time
> > processing ecosystem in place.
> > That's a brief overview of our use case, so team, please provide your
> > inputs on how we can approach this with Samza. Is there any utility API
> > with Samza for JDBC calls?
> >
> > Thank you very much in Advance.
> >
> > ~~Manohar
> > ________________________________
> > Happiest Minds Disclaimer
> >
> > This message is for the sole use of the intended recipient(s) and may
> > contain confidential, proprietary or legally privileged information. Any
> > unauthorized review, use, disclosure or distribution is prohibited. If
> > you are not the original intended recipient of the message, please
> > contact the sender by reply email and destroy all copies of the original
> > message.
> >
> > Happiest Minds Technologies <>
> >
> > ________________________________
> >
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University

Navina R.
