flink-issues mailing list archives

From greghogan <...@git.apache.org>
Subject [GitHub] flink pull request #2465: [FLINK-4447] [docs] Include NettyConfig options on...
Date Fri, 02 Sep 2016 15:18:05 GMT
Github user greghogan commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2465#discussion_r77363499
  
    --- Diff: docs/setup/config.md ---
    @@ -169,58 +169,111 @@ Default value is the `akka.ask.timeout`.
     These parameters configure the default HDFS used by Flink. Setups that do not specify
an HDFS configuration have to specify the full path to HDFS files (`hdfs://address:port/path/to/files`).
Files will also be written with default HDFS parameters (block size, replication factor).
     
     - `fs.hdfs.hadoopconf`: The absolute path to the Hadoop configuration directory. The
system will look for the "core-site.xml" and "hdfs-site.xml" files in that directory (DEFAULT:
null).
    +
     - `fs.hdfs.hdfsdefault`: The absolute path of Hadoop's own configuration file "hdfs-default.xml"
(DEFAULT: null).
    +
     - `fs.hdfs.hdfssite`: The absolute path of Hadoop's own configuration file "hdfs-site.xml"
(DEFAULT: null).
     
     ### JobManager &amp; TaskManager
     
     The following parameters configure Flink's JobManager and TaskManagers.
     
     - `jobmanager.rpc.address`: The IP address of the JobManager, which is the master/coordinator
of the distributed system (DEFAULT: **localhost**).
    +
     - `jobmanager.rpc.port`: The port number of the JobManager (DEFAULT: **6123**).
    +
     - `taskmanager.hostname`: The hostname of the network interface that the TaskManager
binds to. By default, the TaskManager searches for network interfaces that can connect to
the JobManager and other TaskManagers. This option can be used to define a hostname if that
strategy fails for some reason. Because different TaskManagers need different values for this
option, it usually is specified in an additional non-shared TaskManager-specific config file.
    +
     - `taskmanager.rpc.port`: The task manager's IPC port (DEFAULT: **0**, which lets the
OS choose a free port).
    +
     - `taskmanager.data.port`: The task manager's port used for data exchange operations
(DEFAULT: **0**, which lets the OS choose a free port).
    +
     - `jobmanager.heap.mb`: JVM heap size (in megabytes) for the JobManager (DEFAULT: **256**).
    +
     - `taskmanager.heap.mb`: JVM heap size (in megabytes) for the TaskManagers, which are
the parallel workers of the system. In contrast to Hadoop, Flink runs operators (e.g., join,
aggregate) and user-defined functions (e.g., Map, Reduce, CoGroup) inside the TaskManager
(including sorting/hashing/caching), so this value should be as large as possible (DEFAULT:
**512**). On YARN setups, this value is automatically configured to the size of the TaskManager's
YARN container, minus a certain tolerance value.
    +
     - `taskmanager.numberOfTaskSlots`: The number of parallel operator or user function instances
that a single TaskManager can run (DEFAULT: **1**). If this value is larger than 1, a single
TaskManager takes multiple instances of a function or operator. That way, the TaskManager
can utilize multiple CPU cores, but at the same time, the available memory is divided between
the different operator or function instances. This value is typically proportional to the
number of physical CPU cores that the TaskManager's machine has (e.g., equal to the number
of cores, or half the number of cores).
    +
     - `taskmanager.tmp.dirs`: The directory for temporary files, or a list of directories
separated by the system's directory delimiter (for example ':' (colon) on Linux/Unix). If multiple
directories are specified, then the temporary files will be distributed across the directories
in a round-robin fashion. The I/O manager component will spawn one reading and one writing
thread per directory. A directory may be listed multiple times to have the I/O manager use
multiple threads for it (for example, if it is physically stored on a very fast disk or RAID)
(DEFAULT: **The system's tmp dir**).
    +
     - `taskmanager.network.numberOfBuffers`: The number of buffers available to the network
stack. This number determines how many streaming data exchange channels a TaskManager can
have at the same time and how well buffered the channels are. If a job is rejected or you
get a warning that the system does not have enough buffers available, increase this value (DEFAULT:
**2048**).
    +
     - `taskmanager.memory.size`: The amount of memory (in megabytes) that the task manager
reserves on the JVM's heap space for sorting, hash tables, and caching of intermediate results.
If unspecified (-1), the memory manager will take a fixed ratio of the heap memory available
to the JVM, as specified by `taskmanager.memory.fraction`. (DEFAULT: **-1**)
    +
     - `taskmanager.memory.fraction`: The relative amount of memory that the task manager
reserves for sorting, hash tables, and caching of intermediate results. For example, a value
of 0.8 means that TaskManagers reserve 80% of the JVM's heap space for internal data buffers,
leaving 20% of the JVM's heap space free for objects created by user-defined functions. (DEFAULT:
**0.7**). This parameter is only evaluated if `taskmanager.memory.size` is not set.
    +
     - `taskmanager.debug.memory.startLogThread`: Causes the TaskManagers to periodically
log memory and garbage collection statistics. The statistics include current heap-, off-heap,
and other memory pool utilization, as well as the time spent on garbage collection, by heap
memory pool.
    +
     - `taskmanager.debug.memory.logIntervalMs`: The interval (in milliseconds) in which the
TaskManagers log the memory and garbage collection statistics. Only has an effect if `taskmanager.debug.memory.startLogThread`
is set to true.
    +
     - `blob.fetch.retries`: The number of retries for the TaskManager to download BLOBs (such
as JAR files) from the JobManager (DEFAULT: **50**).
    +
     - `blob.fetch.num-concurrent`: The number of concurrent BLOB fetches (such as JAR file downloads)
that the JobManager serves (DEFAULT: **50**).
    +
     - `blob.fetch.backlog`: The maximum number of queued BLOB fetches (such as JAR file downloads)
that the JobManager allows (DEFAULT: **1000**).
    -- `task.cancellation-interval`: Time interval between two successive task cancellation
attempts in milliseconds (DEFAULT: **30000**).
     
    +- `task.cancellation-interval`: Time interval between two successive task cancellation
attempts in milliseconds (DEFAULT: **30000**).
     
     ### Distributed Coordination (via Akka)
     
     - `akka.ask.timeout`: Timeout used for all futures and blocking Akka calls. If Flink
fails due to timeouts then you should try to increase this value. Timeouts can be caused by
slow machines or a congested network. The timeout value requires a time-unit specifier (ms/s/min/h/d)
(DEFAULT: **10 s**).
    +
     - `akka.lookup.timeout`: Timeout used for the lookup of the JobManager. The timeout value
has to contain a time-unit specifier (ms/s/min/h/d) (DEFAULT: **10 s**).
    +
     - `akka.framesize`: Maximum size of messages which are sent between the JobManager and
the TaskManagers. If Flink fails because messages exceed this limit, then you should increase
it. The message size requires a size-unit specifier (DEFAULT: **10485760b**).
    +
     - `akka.watch.heartbeat.interval`: Heartbeat interval for Akka's DeathWatch mechanism
to detect dead TaskManagers. If TaskManagers are wrongly marked dead because of lost or delayed
heartbeat messages, then you should increase this value. A thorough description of Akka's
DeathWatch can be found [here](http://doc.akka.io/docs/akka/snapshot/scala/remoting.html#failure-detector)
(DEFAULT: **akka.ask.timeout/10**).
    +
     - `akka.watch.heartbeat.pause`: Acceptable heartbeat pause for Akka's DeathWatch mechanism.
A low value does not allow an irregular heartbeat. A thorough description of Akka's DeathWatch
can be found [here](http://doc.akka.io/docs/akka/snapshot/scala/remoting.html#failure-detector)
(DEFAULT: **akka.ask.timeout**).
    +
     - `akka.watch.threshold`: Threshold for the DeathWatch failure detector. A low value
is prone to false positives whereas a high value increases the time to detect a dead TaskManager.
A thorough description of Akka's DeathWatch can be found [here](http://doc.akka.io/docs/akka/snapshot/scala/remoting.html#failure-detector)
(DEFAULT: **12**).
    +
     - `akka.transport.heartbeat.interval`: Heartbeat interval for Akka's transport failure
detector. Since Flink uses TCP, the detector is not necessary. Therefore, the detector is
disabled by setting the interval to a very high value. In case you should need the transport
failure detector, set the interval to some reasonable value. The interval value requires a
time-unit specifier (ms/s/min/h/d) (DEFAULT: **1000 s**).
    +
     - `akka.transport.heartbeat.pause`: Acceptable heartbeat pause for Akka's transport failure
detector. Since Flink uses TCP, the detector is not necessary. Therefore, the detector is
disabled by setting the pause to a very high value. In case you should need the transport
failure detector, set the pause to some reasonable value. The pause value requires a time-unit
specifier (ms/s/min/h/d) (DEFAULT: **6000 s**).
    +
     - `akka.transport.threshold`: Threshold for the transport failure detector. Since Flink
uses TCP, the detector is not necessary and, thus, the threshold is set to a high value (DEFAULT:
**300**).
    +
     - `akka.tcp.timeout`: Timeout for all outbound connections. If you should experience
problems with connecting to a TaskManager due to a slow network, you should increase this
value (DEFAULT: **akka.ask.timeout**).
    +
     - `akka.throughput`: Number of messages that are processed in a batch before returning
the thread to the pool. Low values denote fair scheduling whereas high values can increase
the performance at the cost of unfairness (DEFAULT: **15**).
    +
     - `akka.log.lifecycle.events`: Turns on Akka's remote logging of events. Set this
value to 'true' in case of debugging (DEFAULT: **false**).
    +
     - `akka.startup-timeout`: Timeout after which the startup of a remote component is considered
failed (DEFAULT: **akka.ask.timeout**).
     
    +### Network communication (via Netty)
    +
    +- `taskmanager.net.num-arenas`: The number of Netty arenas (DEFAULT: **taskmanager.numberOfTaskSlots**).
    +
    +- `taskmanager.net.server.numThreads`: The number of Netty server threads (DEFAULT: **taskmanager.numberOfTaskSlots**).
    +
    +- `taskmanager.net.client.numThreads`: The number of Netty client threads (DEFAULT: **taskmanager.numberOfTaskSlots**).
    +
    +- `taskmanager.net.server.backlog`: The Netty server connection backlog.
    +
    +- `taskmanager.net.client.connectTimeoutSec`: The Netty client connection timeout (DEFAULT:
**120 seconds**).
    +
    +- `taskmanager.net.sendReceiveBufferSize`: The Netty send and receive buffer size.
    --- End diff --
    
    I think this is the Netty socket buffer size. Would it be worthwhile to note that recent
Linux defaults to 4 MiB?
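
If `taskmanager.net.sendReceiveBufferSize` does map to the socket-level buffers, as suggested above, the relevant knobs are `SO_SNDBUF` and `SO_RCVBUF`. The sketch below is only an illustration using Python's standard `socket` module, not Flink or Netty code; the helper name `effective_buffer_sizes` and the 64 KiB request are hypothetical. It requests a size and reads back what the OS reports, which need not match the request: Linux, for example, doubles the requested value to leave room for bookkeeping overhead and caps it at `net.core.wmem_max` / `net.core.rmem_max`.

```python
import socket


def effective_buffer_sizes(requested_bytes=None):
    """Return the (send, receive) buffer sizes the OS reports for a TCP socket.

    If requested_bytes is given, it is applied to both SO_SNDBUF and SO_RCVBUF
    first; otherwise the OS defaults are reported unchanged.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        if requested_bytes is not None:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, requested_bytes)
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        send = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        recv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return send, recv


if __name__ == "__main__":
    print("OS defaults:", effective_buffer_sizes())
    # On Linux the reported value is typically about double the request.
    print("After requesting 64 KiB:", effective_buffer_sizes(64 * 1024))
```

Comparing the two printed lines on a given host shows what the OS actually grants, which can differ from both the configured value and the documented OS default.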

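Several of the new Netty options default to `taskmanager.numberOfTaskSlots` rather than to a fixed number. A minimal sketch of that fallback chain, assuming a plain Python dict as a stand-in for `flink-conf.yaml` (this is not Flink's NettyConfig implementation, and `resolve_netty_options` is a hypothetical name). Only defaults stated in the diff are applied; `taskmanager.net.server.backlog` and `taskmanager.net.sendReceiveBufferSize` are left out because no default is documented for them.

```python
# Hypothetical illustration: a plain dict stands in for flink-conf.yaml.
DEFAULT_TASK_SLOTS = 1  # documented default of taskmanager.numberOfTaskSlots

# Netty options whose documented default is taskmanager.numberOfTaskSlots.
SLOT_DEFAULTED_KEYS = (
    "taskmanager.net.num-arenas",
    "taskmanager.net.server.numThreads",
    "taskmanager.net.client.numThreads",
)


def resolve_netty_options(conf):
    """Return effective values for the Netty options, applying documented defaults."""
    slots = int(conf.get("taskmanager.numberOfTaskSlots", DEFAULT_TASK_SLOTS))
    # Unset options fall back to the number of task slots.
    resolved = {key: int(conf.get(key, slots)) for key in SLOT_DEFAULTED_KEYS}
    # Documented fixed default: 120 seconds.
    resolved["taskmanager.net.client.connectTimeoutSec"] = int(
        conf.get("taskmanager.net.client.connectTimeoutSec", 120)
    )
    return resolved


if __name__ == "__main__":
    conf = {
        "taskmanager.numberOfTaskSlots": "4",
        "taskmanager.net.server.numThreads": "8",  # explicit override
    }
    # Prints num-arenas 4, server.numThreads 8, client.numThreads 4, connectTimeoutSec 120.
    for key, value in resolve_netty_options(conf).items():
        print(key, value)
```

The fallback means that raising the slot count also scales Netty arenas and threads unless they are set explicitly.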

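The interplay of `taskmanager.memory.size` and `taskmanager.memory.fraction` comes down to simple arithmetic. A simplified sketch, assuming the fraction is applied to the configured `taskmanager.heap.mb` (the real TaskManager may start from a different base, such as the heap still free at startup, so treat the result as a rough estimate); `managed_memory_mb` is a hypothetical helper. With the defaults (512 MB heap, fraction 0.7), about 358 MB go to sorting, hash tables, and caching, leaving about 154 MB for objects created by user-defined functions.

```python
def managed_memory_mb(heap_mb, size_mb=-1, fraction=0.7):
    """Estimate the memory (MB) a TaskManager reserves for sorting, hashing, and caching.

    Simplified model of the documented rule (an assumption, not Flink code):
    an explicit taskmanager.memory.size wins; if it is -1 (unset), a fixed
    fraction (taskmanager.memory.fraction) of the heap is used instead.
    """
    if size_mb != -1:
        # taskmanager.memory.size is set: the fraction is not evaluated at all.
        return size_mb
    if not 0.0 < fraction < 1.0:
        # Sanity check added for this sketch only.
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    # Truncate to whole megabytes.
    return int(heap_mb * fraction)


if __name__ == "__main__":
    heap = 512  # documented default of taskmanager.heap.mb
    reserved = managed_memory_mb(heap)
    print(reserved, heap - reserved)  # 358 154
```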