trafficserver-commits mailing list archives

Subject trafficserver git commit: docs: add initial content to the performance tuning section
Date Tue, 16 Dec 2014 00:28:08 GMT
Repository: trafficserver
Updated Branches:
  refs/heads/master e869da69d -> 47eeaf34c

docs: add initial content to the performance tuning section


Branch: refs/heads/master
Commit: 47eeaf34cdba8dccf350f59339699aab50e5e814
Parents: e869da6
Author: Jon Sime <>
Authored: Mon Dec 15 09:48:29 2014 -0800
Committer: James Peach <>
Committed: Mon Dec 15 16:27:53 2014 -0800

 doc/admin/performance-tuning.en.rst | 278 ++++++++++++++++++++++++++++---
 1 file changed, 259 insertions(+), 19 deletions(-)
diff --git a/doc/admin/performance-tuning.en.rst b/doc/admin/performance-tuning.en.rst
index 5486e90..616f299 100644
--- a/doc/admin/performance-tuning.en.rst
+++ b/doc/admin/performance-tuning.en.rst
@@ -1,8 +1,3 @@
-.. _performance-tuning:
-Performance Tuning
 .. Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
@@ -20,33 +15,278 @@ Performance Tuning
    specific language governing permissions and limitations
    under the License.
+.. include:: common.defs
+.. _performance-tuning:
+Performance Tuning
+|ATS| in its default configuration should perform suitably for running the
+included regression test suite, but will need special attention to both its own
+configuration and the environment in which it runs to perform optimally for
+production usage.
+There are numerous options and strategies for tuning the performance of |TS|
+and we attempt to document as many of them as possible in the sections below.
+Because |TS| is flexible enough to serve in many caching and proxying
+scenarios, the tuning strategies that prove most effective, as well as the
+specific values for various configuration options, will differ from one use
+case to the next.
 .. toctree::
    :maxdepth: 2
-Before you start
+Before You Start
-There is no single option to that will guarantee maximum performance of
-Apache Traffic Server in every use case. There are however numerous options
-that help tune its performance under different loads and in its - often
-vastly different - use cases.
+One of the most important aspects of any attempt to optimize the performance
+of a |TS| installation is the ability to measure that installation's
+performance; both prior to and after any changes are made. To that end, it is
+strongly recommended that you establish some means to monitor and record a
+variety of performance metrics: request and response speed, latency, and
+throughput; memory and CPU utilization; and storage I/O operations.
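+As one lightweight starting point, individual |TS| metrics can be read from
+the command line with :option:`traffic_line -r` (the metric shown is just one
+example of many that are available)::
+
+    traffic_line -r proxy.process.http.incoming_requests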
+Attempts to tune a system without being able to compare the impact of changes
+made will at best produce haphazard, *feel good* outcomes that may have no
+real world impact on your customers' experiences, and at worst may even
+leave performance lower than before you started. Additionally, in the
+all too common situation of budget constraints, having proper measurements of
+existing performance will greatly ease the process of focusing on those
+individual components that, should they require hardware expenditures or larger
+investments of employee time, have the highest potential gains relative to
+their cost.
 Building Traffic Server
-A lot of speed can be gained or lost depending on the way ATS is built.
+While the default compilation settings for |TS| will produce a set of binaries
+capable of serving most caching and proxying needs, there are some build
+options worth considering in specific environments.
+.. TODO::
+   - any reasons why someone wouldn't want to just go with distro packages?
+     (other than "distro doesn't package versions i want")
+   - list relevant build options, impact each can potentially have
+Hardware Tuning
+As with any other server software, efficient allocation of hardware resources
+will have a significant impact on |TS| performance.
+CPU Selection
+|ATS| uses a hybrid event-driven engine and multi-threaded processing model for
+handling incoming requests. As such, it is highly scalable and makes efficient
+use of modern, multicore processor architectures.
+.. TODO::
+   any benchmarks showing relative req/s improvements between 1 core, 2 core,
+   N core? diminishing rate of return? can't be totally linear, but maybe it
+   doesn't realistically drop off within the currently available options (i.e.
+   the curve holds up pretty well all the way through current four socket xeon
+   8 core systems, so given a lack of monetary constraint, adding more cores
+   is a surefire performance improvement (up to the bandwidth limits), or does
+   it fall off earlier, or can any modern 4 core saturate a 10G network link
+   given fast enough disks?)
+Memory Allocation
+Though |TS| stores cached content within an on-disk host database, the entire
+:ref:`cache-directory` is always maintained in memory during server operation.
+Additionally, most operating systems will maintain disk caches within system
+memory. It is also possible, and commonly advisable, to maintain an in-memory
+cache of frequently accessed content.
+The memory footprint of the |TS| process is largely fixed at the time of server
+startup. Your |TS| systems will need at least enough memory to satisfy basic
+operating system requirements, as well as capacity for the cache directory, and
+any memory cache you wish to use. The default settings allocate roughly 10
+megabytes of RAM cache for every gigabyte of disk cache storage, though this
+setting can be adjusted manually in :file:`records.config` using the setting
+:ts:cv:`proxy.config.cache.ram_cache.size`. |TS| will, under the default
+configuration, adjust this automatically if your system does not have enough
+physical memory to accommodate the aforementioned target.
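+For example, to fix the RAM cache at one gigabyte rather than relying on the
+automatic sizing, you might set the following (the value is illustrative, and
+is expressed in bytes)::
+
+    CONFIG proxy.config.cache.ram_cache.size INT 1073741824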
+Aside from the cost of physical memory, and necessary supporting hardware to
+make use of large amounts of RAM, there is little downside to increasing the
+memory allocation of your cache servers. You will see, however, no benefit
+from sizing your memory allocation larger than the sum of your content and
+its associated cache directory index.
+Disk Storage
+Except in cases where your entire cache may fit into system memory, your cache
+nodes will eventually need to interact with their disks. While a more detailed
+discussion of storage stratification is covered in `Cache Partitioning`_ below,
+very briefly you may be able to realize gains in performance by separating
+more frequently accessed content onto faster disks (PCIe SSDs, for instance)
+while maintaining the bulk of your on-disk cache objects, which may not receive
+the same high volume of requests, on lower-cost mechanical drives.
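+As a sketch of that stratification, separate cache volumes of differing sizes
+may be declared in :file:`volume.config` (the proportions here are purely
+illustrative) and later mapped to specific content via :file:`hosting.config`::
+
+    volume=1 scheme=http size=20%
+    volume=2 scheme=http size=80%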
+Operating System Tuning
+|ATS| is supported on a variety of operating systems, and as a result the tuning
+strategies available at the OS level will vary depending upon your chosen
+operating system.
+General Recommendations
+TCP Keep Alive
+TCP Congestion Control Settings
+Ephemeral and Reserved Ports
+Jumbo Frames
+.. TODO:: would they be useful/harmful/neutral for anything other than local forward/transparent
-Tuning the Machine
-Operating Systems Options
+OmniOS / illumos
-Optimal Use of Memory
+Mac OS X
+Traffic Server Tuning
-Tuning different Thread types
+|TS| itself, of course, has many options you may want to consider adjusting to
+achieve optimal performance in your environment. Many of these settings are
+recorded in :file:`records.config` and may be adjusted with the
+:option:`traffic_line -s` command line utility while the server is operating.
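+For instance, a setting can be changed and the running configuration reloaded
+without a restart (the variable shown is just an example)::
+
+    traffic_line -s proxy.config.http.negative_caching_lifetime -v 10
+    traffic_line -x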
+CPU and Thread Optimization
+Thread Scaling
+By default, |TS| creates 1.5 threads per CPU core on the host system. This may
+be adjusted with the following settings in :file:`records.config`:
+* :ts:cv:`proxy.config.exec_thread.autoconfig`
+* :ts:cv:`proxy.config.exec_thread.autoconfig.scale`
+* :ts:cv:`proxy.config.exec_thread.limit`
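+For example, to disable automatic thread scaling and cap |TS| at a fixed
+number of event threads (the count here is illustrative), you might set::
+
+    CONFIG proxy.config.exec_thread.autoconfig INT 0
+    CONFIG proxy.config.exec_thread.limit INT 16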
+Thread Affinity
+On multi-socket servers, such as Intel architectures with NUMA, you can adjust
+the thread affinity configuration to take advantage of cache pipelines and
+faster memory access, as well as preventing possibly costly thread migrations
+across sockets. This is adjusted with :ts:cv:`proxy.config.exec_thread.affinity`
+in :file:`records.config`. ::
+    CONFIG proxy.config.exec_thread.affinity INT 1
+Thread Stack Size
+.. TODO::
+   is there ever a need to fiddle with this, outside of possibly custom developed plugins?
+Polling Timeout
+If you are experiencing unusually or unacceptably high CPU utilization during
+idle workloads, you may consider adjusting the polling timeout with
+:ts:cv:`proxy.config.net.poll_timeout`. ::
+    CONFIG proxy.config.net.poll_timeout INT 60
+Memory Optimization
+Disk Storage Optimization
+Cache Partitioning
+Network Tuning
+Error responses from origins are consistent and costly
+If error responses are costly for your origin server to generate, you may elect
+to have |TS| cache these responses for a period of time. The default behavior is
+to consider all of these responses to be uncacheable, which causes every
+client request to result in an origin request.
+This behavior is controlled by both enabling the feature via
+:ts:cv:`proxy.config.http.negative_caching_enabled` and setting the cache time
+(in seconds) with :ts:cv:`proxy.config.http.negative_caching_lifetime`. ::
+    CONFIG proxy.config.http.negative_caching_enabled INT 1
+    CONFIG proxy.config.http.negative_caching_lifetime INT 10
+SSL-Specific Options
+Thread Types
+Logging Configuration
+.. TODO::
+   binary vs. ascii output
+   multiple log formats (netscape+squid+custom vs. just custom)
+   overhead to log collation
+   using direct writes vs. syslog target
+Plugin Tuning
+Common Scenarios and Pitfalls
-Tuning Plugin Execution
+While environments vary widely and |TS| is useful in a great number of different
+situations, there are at least some recurring elements that may be used as
+shortcuts to identifying problem areas, or realizing easier performance gains.
+.. TODO::
+   - origins not sending proper expiration headers (can fix at the origin
+     (preferable) or use proxy.config.http.cache.heuristic_(min|max)_lifetime
+     as hacky bandaids)
+   - cookies and http_auth prevent caching
+   - avoid thundering herd with read-while-writer (link to section in http-proxy-caching)
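+The read-while-writer mitigation noted above is toggled in
+:file:`records.config`::
+
+    CONFIG proxy.config.cache.enable_read_while_writer INT 1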
