kafka-users mailing list archives

From Navneeth Krishnan <reachnavnee...@gmail.com>
Subject Re: Flink vs Kafka streams
Date Sat, 09 Nov 2019 06:42:52 GMT
Thanks Peter. Even with ECS we have autoscaling enabled, but the issue is
that to scale we need to stop the job and restart it with the new
parallelism, which causes downtime.
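
The stop-and-restart cycle described above can be sketched roughly as follows. This is only an illustration, not something taken from the thread: it assumes a Flink version whose CLI supports `flink stop -p`, and the function name, job ID, paths, and jar name are all hypothetical placeholders. The function only builds the two command lines and executes nothing.

```python
# Sketch only: builds the Flink CLI invocations for a stop-with-savepoint
# followed by a resubmit at a new parallelism. Every argument value passed
# in (job ID, savepoint locations, jar name) is a hypothetical placeholder.

def rescale_commands(job_id, savepoint_dir, savepoint_path, jar, new_parallelism):
    """Return the two CLI invocations as argument lists (nothing is executed)."""
    # Step 1: stop the job and write a savepoint under savepoint_dir.
    stop = ["flink", "stop", "-p", savepoint_dir, job_id]
    # Step 2: resubmit from the savepoint with the new parallelism.
    # The job processes nothing between step 1 finishing and step 2
    # restoring state, which is the downtime discussed above.
    run = ["flink", "run", "-d", "-s", savepoint_path,
           "-p", str(new_parallelism), jar]
    return [stop, run]


if __name__ == "__main__":
    for cmd in rescale_commands("<jobId>", "s3://bucket/savepoints",
                                "s3://bucket/savepoints/<savepoint>",
                                "job.jar", 8):
        print(" ".join(cmd))
```

The savepoint path passed to the second command is whatever the first command reports when it completes, so in practice the two steps cannot be collapsed into one.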


On Fri, Nov 8, 2019 at 1:01 PM Peter Groesbeck <peter.groesbeck@gmail.com>
wrote:

> We use EMR instead of ECS, but if that’s an option for your team, you can
> configure auto scaling rules in your CloudFormation template so that your
> task/job load dynamically controls cluster sizing.
> > On Nov 8, 2019, at 1:40 AM, Navneeth Krishnan <reachnavneeth2@gmail.com> wrote:
> >
> > Hello All,
> >
> > I have a streaming job running in production that processes over 2
> > billion events per day and does some heavy processing on each event. We
> > have been facing some challenges managing Flink in production, such as
> > scaling in and out, restarting the job from a savepoint, etc. Flink
> > provides a lot of features, which made it seem an obvious choice at the
> > time, but with all the operational overhead we are now wondering whether
> > we should keep using Flink for our stream processing requirements or
> > switch to Kafka Streams.
> >
> > We currently deploy Flink on ECS. Bringing up a new cluster for another
> > stream job is too expensive, but on the flip side running it on the same
> > cluster is difficult, since there is no way to say that one job has to
> > run on a dedicated server while another can run on a shared instance.
> > Also, taking a savepoint, cancelling, and submitting a new job results in
> > some downtime. Most critically, there is no state shared among all
> > tasks, a sort of global state. We approximate this today with an
> > external Redis cache, but that incurs cost as well.
> >
> > If we move to Kafka Streams, our deployment life becomes much easier:
> > each new stream job will be a microservice that can scale
> > independently. With global state it's much easier to share state without
> > using an external cache. The disadvantage is that we have to rely on
> > topic partitions for parallelism. Although this might initially sound
> > easier, it will become a bottleneck when we need to scale much higher.
> >
> > Do you guys have any suggestions on this? We need to decide which way to
> > move forward, and any suggestions would be of great help.
> >
> > Thanks
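
The partition limit raised in the original message can be illustrated with a little arithmetic. This is a minimal sketch, assuming a simple topology with a single source topic, where Kafka Streams creates one task per input partition and spreads tasks across stream threads; the function name and the numbers in the usage lines are hypothetical and not taken from the thread.

```python
# Sketch only: with one task per input partition, stream threads beyond the
# partition count receive no task and sit idle.

def active_threads(input_partitions, instances, threads_per_instance):
    """Return (threads doing work, idle threads) for a single-topic topology."""
    total_threads = instances * threads_per_instance
    # A task is never split across threads, so at most one thread per
    # partition can be busy.
    active = min(input_partitions, total_threads)
    idle = total_threads - active
    return active, idle


if __name__ == "__main__":
    print(active_threads(12, 4, 2))
    print(active_threads(12, 10, 2))
```

Under these assumptions, adding instances past the partition count adds only idle threads, so scaling further means increasing the partition count of the input topic.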
