flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Nathan Forager <nathan.forager...@gmail.com>
Subject fault tolerance model for stateful streams
Date Mon, 06 Jul 2015 07:44:37 GMT
hi there,

I noticed the 0.9 release announces exactly-once semantics for streams. I
looked at the user guide and the primary mechanism for recovery appears to
be checkpointing of user state. I have a few questions:

1. The default behavior is that checkpoints are kept in memory on the
JobManager. Am I correct in assuming that this does *not* guarantee failure
recovery or exactly-once semantics if the driver fails?

2 The alternative recommended approach is to checkpoint to HDFS. If I have
a program where I am doing something like aggregating counts over thousands
of keys... it doesn't seem tenable to save a huge number checkpoint files
to HDFS with acceptable latency.

Could you comment at all on the persistence model for exactly-once in
Flink. I am pretty confused because checkpointing to HDFS in this way seems
to have limitations around scalability and latency.

- nate

View raw message