flink-issues mailing list archives

From "Gyula Fora (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4193) Task manager JVM crashes while deploying cancelling jobs
Date Tue, 01 Nov 2016 08:33:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15624754#comment-15624754 ]

Gyula Fora commented on FLINK-4193:

These issues usually happened inside the RocksDB.open(...) method during initialization of
the state backend. If you think the refactoring could affect this, then we might get lucky.

We are running this in production applications and haven't ported them to 1.2 yet, but I will
start working on that in a week or two.

> Task manager JVM crashes while deploying cancelling jobs
> --------------------------------------------------------
>                 Key: FLINK-4193
>                 URL: https://issues.apache.org/jira/browse/FLINK-4193
>             Project: Flink
>          Issue Type: Bug
>          Components: Streaming, TaskManager
>            Reporter: Gyula Fora
>            Priority: Critical
> We have observed several TM crashes while deploying larger stateful streaming jobs that use the RocksDB state backend.
> As the JVMs crash, the logs don't show anything, but I have uploaded all the info I could get from the standard output.
> This indicates some GC and possibly some RocksDB issues underneath, but we could not really figure out much more.
> GC segfault
> https://gist.github.com/gyfora/9e56d4a0d4fc285a8d838e1b281ae125
> Other crashes (maybe RocksDB-related)
> https://gist.github.com/gyfora/525c67c747873f0ff2ff2ed1682efefa
> https://gist.github.com/gyfora/b93611fde87b1f2516eeaf6bfbe8d818
> The third link shows 2 issues that happened in parallel...

This message was sent by Atlassian JIRA
