samza-dev mailing list archives

From Felix GV
Subject How do you serve the data computed by Samza?
Date Fri, 27 Mar 2015 16:52:46 GMT
Hi Samza devs, users and enthusiasts,

I've kept an eye on the Samza project for a while and I think it's super cool! I hope it continues
to mature and expand as it seems very promising (:

One thing I've been wondering for a while is: how do people serve the data they computed on
Samza? More specifically:

  1.  How do you expose the output of Samza jobs to online applications that need low-latency
reads?
  2.  Are these online apps mostly internal (e.g., analytics, dashboards) or public/user-facing?
  3.  What systems do you currently use (or plan to use in the short term) to host the data
generated in Samza? HBase? Cassandra? MySQL? Druid? Others?
  4.  Are you satisfied or are you facing challenges in terms of the write throughput supported
by these storage/serving systems? What about read throughput?
  5.  Are there situations where you wish to re-process all historical data when making improvements
to your Samza job, which results in the need to re-ingest all of the Samza output into your
online serving system (as described in the Kappa Architecture)? Is this easy breezy or painful?
Do you need to throttle it lest your serving system fall over?
  6.  If there were a highly optimized and reliable way of ingesting partitioned streams quickly
into your online serving system, would that help you leverage Samza more effectively?

Your insights would be much appreciated!

Thanks (:

