Hello,

I am using Traffic Server as a cache for many sites. It has only one origin, haproxy.

Details are:

Version of Traffic Server used: 7.1.12 (also applies to 8.1.0)
Platform: Linux 64 bit, gcc 8.3.0
Any relevant configuration changes you've made from the default configurations (particularly for records.config), part of `traffic_ctl diff`:
proxy.config.http.cache.open_write_fail_action has changed
        Current Value   : 2
        Default Value   : 0
proxy.config.http.negative_revalidating_enabled has changed
        Current Value   : 1
        Default Value   : 0
proxy.config.http.negative_revalidating_lifetime has changed
        Current Value   : 86400
        Default Value   : 1800
proxy.config.http.insert_client_ip has changed
        Current Value   : 0
        Default Value   : 1
proxy.config.http.insert_squid_x_forwarded_for has changed
        Current Value   : 0
        Default Value   : 1
proxy.config.http.transaction_no_activity_timeout_in has changed
        Current Value   : 600
        Default Value   : 30
proxy.config.http.transaction_no_activity_timeout_out has changed
        Current Value   : 600
        Default Value   : 30
proxy.config.http.connect_attempts_max_retries has changed
        Current Value   : 0
        Default Value   : 3
proxy.config.http.connect_attempts_max_retries_dead_server has changed
        Current Value   : 0
        Default Value   : 1
proxy.config.http.connect_attempts_timeout has changed
        Current Value   : 600
        Default Value   : 30
proxy.config.http.post_connect_attempts_timeout has changed
        Current Value   : 600
        Default Value   : 1800
proxy.config.http.normalize_ae_gzip has changed
        Current Value   : 0
        Default Value   : 1
proxy.config.cache.ram_cache.size has changed
        Current Value   : 1073741824
        Default Value   : -1
proxy.config.url_remap.pristine_host_hdr has changed
        Current Value   : 1
        Default Value   : 0
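
For anyone who wants to compare these settings programmatically, here is a minimal Python sketch that turns output shaped like the above into a dictionary. The function name parse_diff is hypothetical, and the parser assumes only the three-line layout visible in this post ("<name> has changed", "Current Value", "Default Value"); other traffic_ctl versions may print something different.

```python
def parse_diff(text):
    """Parse diff-style output into {record name: (current, default)}.

    Assumes the layout shown in this post; not an official format spec.
    """
    changes = {}
    name = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith(" has changed"):
            name = line[: -len(" has changed")]
            current = None
        elif name and line.startswith("Current Value"):
            current = line.split(":", 1)[1].strip()
        elif name and line.startswith("Default Value"):
            default = line.split(":", 1)[1].strip()
            changes[name] = (current, default)
            name = None
    return changes


# Two records copied from the diff above.
sample = """\
proxy.config.http.connect_attempts_max_retries has changed
        Current Value   : 0
        Default Value   : 3
proxy.config.http.connect_attempts_max_retries_dead_server has changed
        Current Value   : 0
        Default Value   : 1
"""

changes = parse_diff(sample)
for record, (current, default) in changes.items():
    print(f"{record}: current={current} default={default}")
```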

Traffic Server connects to the origin (haproxy), which runs on the same host, with the following mapping:

cat etc/trafficserver/remap.config
map /HTTPS/ http://10.0.251.170:21443
map / http://10.0.251.170:41080

There is a constant stream of requests going to Traffic Server. About 0.5% of them fail: the client receives a 502, and an entry like this appears in var/log/trafficserver/error.log:

20210107.13h02m15s CONNECT: could not connect to 10.0.251.170 for 'http://10.0.251.170:41080/path/' (setting last failure time)
20210107.13h02m15s RESPONSE: sent 10.0.251.170 status 502 (Server Hangup) for 'http://10.0.251.170:41080/path/'

The same failure is also visible in var/log/trafficserver/squid.log:

1610020935.756 43 10.0.251.170 TCP_REFRESH_FAIL_HIT/502 498 GET http://10.0.251.170:41080/path/ - DIRECT/10.0.251.170 text/html
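
The 0.5% figure can be re-measured from squid.log with a short script. This is only a sketch: the function name failure_rate is hypothetical, and it assumes the fourth whitespace-separated field is always "result/status", as in the line above.

```python
def failure_rate(lines, status="502"):
    """Return (failed, total) for log lines whose 4th field is RESULT/STATUS."""
    failed = 0
    total = 0
    for line in lines:
        fields = line.split()
        # Skip blank or truncated lines instead of guessing at their meaning.
        if len(fields) < 4 or "/" not in fields[3]:
            continue
        total += 1
        if fields[3].rpartition("/")[2] == status:
            failed += 1
    return failed, total


# The log line quoted above, plus an empty line to show that it is skipped.
sample = [
    "1610020935.756 43 10.0.251.170 TCP_REFRESH_FAIL_HIT/502 498 GET "
    "http://10.0.251.170:41080/path/ - DIRECT/10.0.251.170 text/html",
    "",
]

failed, total = failure_rate(sample)
print(f"{failed} of {total} parsed requests returned 502")
```

To run it against the real log, pass an open file object instead of the sample list, for example failure_rate(open("var/log/trafficserver/squid.log")).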

After enabling debug diagnostics with:


CONFIG proxy.config.diags.debug.enabled INT 1
CONFIG proxy.config.diags.debug.tags STRING http.*

I saw the following in var/log/trafficserver/traffic.out:

[Jan  7 13:02:15.757] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:2666 (main_handler)> (http) [5687] [HttpSM::main_handler, VC_EVENT_WRITE_COMPLETE]
[Jan  7 13:02:15.757] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:1994 (state_send_server_request_header)> (http) [5687] [&HttpSM::state_send_server_request_header, VC_EVENT_WRITE_COMPLETE]
[Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:2666 (main_handler)> (http) [5687] [HttpSM::main_handler, VC_EVENT_EOS]
[Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:1836 (state_read_server_response_header)> (http) [5687] [&HttpSM::state_read_server_response_header, VC_EVENT_EOS]
[Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:1923 (state_read_server_response_header)> (http_seq) Error parsing server response header
[Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:5513 (handle_server_setup_error)> (http) [5687] [&HttpSM::handle_server_setup_error, VC_EVENT_EOS]
[Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpTransact.cc:3394 (HandleResponse)> (http_trans) [5687] [HttpTransact::HandleResponse]
[Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpTransact.cc:3395 (HandleResponse)> (http_seq) [5687] [HttpTransact::HandleResponse] Response received
[Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpTransact.cc:8497 (ink_cluster_time)> (http_trans) [ink_cluster_time] local: 1610020935, highest_delta: 0, cluster: 1610020935
[Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpTransact.cc:3402 (HandleResponse)> (http_trans) [5687] [HandleResponse] response_received_time: 1610020935
+++++++++ Incoming O.S. Response +++++++++
-- State Machine Id: 5687
HTTP/1.0 0
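
In this trace the request header finishes writing (VC_EVENT_WRITE_COMPLETE) and the next event on the same state machine is VC_EVENT_EOS, with no response header parsed in between. A rough Python sketch for measuring that gap per state machine is below. The names LINE_RE and event_times are hypothetical, and the regular expression assumes the exact main_handler line layout quoted above, so it may need adjusting for other versions or debug tags.

```python
import re

# Matches only the HttpSM::main_handler lines, in the layout quoted above.
LINE_RE = re.compile(
    r"^\[\w{3}\s+\d+ (\d\d):(\d\d):(\d\d)\.(\d{3})\].*"
    r"\(http\) \[(\d+)\] \[HttpSM::main_handler, (VC_EVENT_\w+)\]"
)


def event_times(lines):
    """Map state machine id -> {event name: milliseconds since midnight}."""
    times = {}
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        hour, minute, second, milli, sm_id, event = match.groups()
        stamp = ((int(hour) * 60 + int(minute)) * 60 + int(second)) * 1000 + int(milli)
        # If an event repeats, the last occurrence wins.
        times.setdefault(sm_id, {})[event] = stamp
    return times


# The two main_handler lines from the trace above, copied verbatim.
sample = [
    "[Jan  7 13:02:15.757] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:2666 (main_handler)> "
    "(http) [5687] [HttpSM::main_handler, VC_EVENT_WRITE_COMPLETE]",
    "[Jan  7 13:02:15.795] Server {0x1462a05c1700} DEBUG: <HttpSM.cc:2666 (main_handler)> "
    "(http) [5687] [HttpSM::main_handler, VC_EVENT_EOS]",
]

times = event_times(sample)
gap_ms = times["5687"]["VC_EVENT_EOS"] - times["5687"]["VC_EVENT_WRITE_COMPLETE"]
print(f"state machine 5687: {gap_ms} ms between WRITE_COMPLETE and EOS")
```

The gap alone does not show which side is at fault; it only gives a number that can be compared with timeouts configured on either side.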

I clearly see that setting

proxy.config.http.send_http11_requests INT 0

reduces the number of failures to almost zero (but they still occur).

As a workaround, while keeping HTTP/1.1 requests enabled, I set:

CONFIG proxy.config.http.connect_attempts_max_retries INT 3
CONFIG proxy.config.http.connect_attempts_max_retries_dead_server INT 1
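
To make clear what I mean by a retry budget, here is a conceptual Python sketch. It is not Traffic Server's implementation, and fetch_with_retries and make_flaky_origin are hypothetical names: a request that fails with a closed connection is sent again until the budget is used up, and only then does the error surface.

```python
def fetch_with_retries(send_once, max_retries):
    """Call send_once(); on ConnectionError retry up to max_retries times."""
    retries_used = 0
    while True:
        try:
            return send_once()
        except ConnectionError:
            if retries_used >= max_retries:
                raise  # budget exhausted: the caller sees the failure
            retries_used += 1


def make_flaky_origin(failures):
    """Fake origin that resets the connection `failures` times, then answers 200."""
    state = {"left": failures}

    def send_once():
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionResetError("origin closed the connection")
        return 200

    return send_once


# With a budget of 3, one dropped connection is invisible to the caller.
print(fetch_with_retries(make_flaky_origin(1), max_retries=3))

# With a budget of 0, the same dropped connection becomes an error.
try:
    fetch_with_retries(make_flaky_origin(1), max_retries=0)
except ConnectionError as exc:
    print("no retries left:", exc)
```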

With these settings my client is never served a 502 in this situation. With Traffic Server 7 nothing is written to var/log/trafficserver/error.log, while Traffic Server 8 logs:

20210111.14h30m34s CONNECT:[0] could not connect [CONNECTION_CLOSED] to 10.0.251.170 for 'http://10.0.251.170:41080/'

In that case Traffic Server simply reconnects to the origin.

My questions are:

 * Is this possibly a bug in Traffic Server (somewhat similar to https://issues.apache.org/jira/browse/TS-3959)?
 * Is it a misconfiguration of Traffic Server?
 * Is Traffic Server behaving correctly, and the problem is really in my origin software?

Thanks for the tips,
Regards,
Łukasz