flink-issues mailing list archives

From "Stefan Richter (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-9831) Too many open files for RocksDB
Date Thu, 12 Jul 2018 13:21:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-9831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541621#comment-16541621 ]

Stefan Richter edited comment on FLINK-9831 at 7/12/18 1:20 PM:
----------------------------------------------------------------

Defaults are in {{org.apache.flink.contrib.streaming.state.PredefinedOptions}} and you can
change them by implementing your own {{org.apache.flink.contrib.streaming.state.OptionsFactory}}.
See {{RocksDBStateBackendConfigTest}} for examples. Everything that is not modified uses
the defaults from RocksDB.
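
A minimal sketch of such a factory, assuming the Flink 1.5 {{OptionsFactory}} interface and the RocksDB Java API on the classpath. The class name {{LimitedOpenFilesOptionsFactory}} and the value 1024 are hypothetical and chosen only for illustration. Nothing in this thread establishes that capping RocksDB's open files resolves the reported error.

```java
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

/** Hypothetical factory that caps the number of files RocksDB keeps open. */
public class LimitedOpenFilesOptionsFactory implements OptionsFactory {

    // Illustrative value only; pick one that fits the process limit and state size.
    private static final int MAX_OPEN_FILES = 1024;

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        // currentOptions already carries the PredefinedOptions settings;
        // only the open-file cap is changed here.
        return currentOptions.setMaxOpenFiles(MAX_OPEN_FILES);
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        // Column family options are left untouched.
        return currentOptions;
    }
}
```

The factory would then be registered on the backend before the job is submitted, for example with {{rocksDbBackend.setOptions(new LimitedOpenFilesOptionsFactory())}}, where {{rocksDbBackend}} is an existing {{RocksDBStateBackend}} instance.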


was (Author: srichter):
Defaults are in {{org.apache.flink.contrib.streaming.state.PredefinedOptions}} and you can
change them by implementing your own {{org.apache.flink.contrib.streaming.state.OptionsFactory}}.
See {{RocksDBStateBackendConfigTest}} for examples.

> Too many open files for RocksDB
> -------------------------------
>
>                 Key: FLINK-9831
>                 URL: https://issues.apache.org/jira/browse/FLINK-9831
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.5.0
>            Reporter: Sayat Satybaldiyev
>            Priority: Major
>         Attachments: flink_open_files.txt
>
>
> While running only one Flink job, which is backed by RocksDB with checkpointing to HDFS, we encounter an exception that the TM cannot access the SST file because the process has too many open files. However, we have already increased the soft/hard file limit on the machine.
> Number of open files for the TM on the machine:
>  
> {code:java}
> lsof -p 23301|wc -l
> 8241{code}
>  
> Instance limits
>  
> {code:java}
> ulimit -a
> core file size (blocks, -c) 0
> data seg size (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size (blocks, -f) unlimited
> pending signals (-i) 256726
> max locked memory (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files (-n) 1048576
> pipe size (512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority (-r) 0
> stack size (kbytes, -s) 8192
> cpu time (seconds, -t) unlimited
> max user processes (-u) 128000
> virtual memory (kbytes, -v) unlimited
> file locks (-x) unlimited
>  
> {code}
>  
> [^flink_open_files.txt]
> java.lang.Exception: Exception while creating StreamOperatorStateContext.
> 	at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:191)
> 	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:227)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:730)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:295)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for KeyedCoProcessOperator_98a16ed3228ec4a08acd8d78420516a1_(1/1) from any of the 1 provided restore options.
> 	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
> 	at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:276)
> 	at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:132)
> 	... 5 more
> Caused by: java.io.FileNotFoundException: /tmp/flink-io-3da06c9e-f619-44c9-b95f-54ee9b1a084a/job_b3ecbdc0eb2dc2dfbf5532ec1fcef9da_op_KeyedCoProcessOperator_98a16ed3228ec4a08acd8d78420516a1__1_1__uuid_c4b82a7e-8a04-4704-9e0b-393c3243cef2/3701639a-bacd-4861-99d8-5f3d112e88d6/000016.sst (Too many open files)
> 	at java.io.FileOutputStream.open0(Native Method)
> 	at java.io.FileOutputStream.open(FileOutputStream.java:270)
> 	at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
> 	at java.io.FileOutputStream.<init>(FileOutputStream.java:162)
> 	at org.apache.flink.core.fs.local.LocalDataOutputStream.<init>(LocalDataOutputStream.java:47)
> 	at org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:275)
> 	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.create(SafetyNetWrapperFileSystem.java:121)
> 	at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.copyStateDataHandleData(RocksDBKeyedStateBackend.java:1008)
> 	at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.transferAllDataFromStateHandles(RocksDBKeyedStateBackend.java:988)
> 	at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.transferAllStateDataToDirectory(RocksDBKeyedStateBackend.java:973)
> 	at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restoreInstance(RocksDBKeyedStateBackend.java:758)
> 	at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalRestoreOperation.restore(RocksDBKeyedStateBackend.java:732)
> 	at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.restore(RocksDBKeyedStateBackend.java:443)
> 	at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.restore(RocksDBKeyedStateBackend.java:149)
> 	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
> 	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
> 	... 7 more
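
The quoted report compares an {{lsof}} count with the {{ulimit -n}} value of an interactive shell. A process started by a service manager or under another user can run with a different limit, so it may be worth reading the limit from inside the JVM. The following is a minimal sketch, assuming a Unix-like platform where the JDK exposes {{com.sun.management.UnixOperatingSystemMXBean}}. The class name {{FdUsage}} is hypothetical, and the code only reports on the JVM it runs in, so it would have to be called from code executing in the TaskManager (for example an operator's {{open()}} method) to say anything about this issue.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical helper that reports file descriptor usage of the current JVM. */
public class FdUsage {

    /**
     * Returns {open, max} file descriptor counts for this JVM,
     * or null when the Unix-specific management bean is not available.
     */
    static long[] descriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] counts = descriptorCounts();
        if (counts == null) {
            System.out.println("File descriptor counts are not available on this platform.");
        } else {
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```

If the reported maximum is far below the 1048576 shown by the shell, the limit applied to the TaskManager process differs from the one that was raised.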



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
