trafficserver-users mailing list archives

From Jeremy Payne <jp557...@gmail.com>
Subject Re: [EXTERNAL] Issue with collapsed forwarding plugin, one CONN_FAIL
Date Tue, 19 Jun 2018 18:35:06 GMT
Yes.. Thanks for that reminder.. The point of the plugin is to return
the transaction to open read, since once the write lock is obtained
the request is heading to the origin.

That said, I'm still noticing a timing issue with the plugin.
Meaning, if the plugin is kicked off too soon, then for whatever
reason the plugin returns a 500 Ink-API response.
If I add a little delay via open_write_retries, then the plugin
returns the expected response.
It's a dark art of sorts. :-)
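
For anyone following along, the retry knob being discussed lives in records.config. A minimal sketch with illustrative values (the retry count of 50 matches the earlier message; open_write_fail_action is the related fail-action setting, shown here with the go-to-origin default):

```
CONFIG proxy.config.http.cache.max_open_write_retries INT 50
CONFIG proxy.config.http.cache.open_write_fail_action INT 0
```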



On Tue, Jun 19, 2018 at 11:14 AM, evan.zelkowitz@gmail.com
<evan.zelkowitz@gmail.com> wrote:
> So the setting we usually use on write fail is to go to the origin. The other option
> is to send back 5xx or stale. In the origin case, each request will end up waiting for the
> write lock, and once it obtains it, it will end up going to the origin. So you end up with
> a lockstep herd all writing the same content (at least for all the requests that come in
> before the initial write finishes).
>
> The idea of the plugin is that when you get a write-lock fail, you know at that point
> that some other request is already attempting to write/update the object, so you may
> as well just go into a read-retry loop instead of actually attempting any write, since
> at some point the first write will finish and you will be able to read it without having
> to go to the origin. I believe that was the initial intent. However, in my testing I also
> saw the redirect and 5xx client leaks, so it was too unpredictable for us to use.
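
The write-lock/read-retry behavior described above can be sketched as a toy model. This is illustrative only (the names, timings, and retry counts are made up, not ATS internals): on a cache miss, only the request that wins the write lock goes to the origin, and the rest poll the cache instead.

```python
import threading
import time

cache = {}                          # shared cache: url -> body
write_lock = threading.Lock()       # stand-in for the cache write lock
origin_hits = 0
counter_lock = threading.Lock()

def fetch(url):
    global origin_hits
    if url in cache:
        return cache[url]
    if write_lock.acquire(blocking=False):
        try:
            # Re-check under the lock: a writer may have just finished.
            # Without this re-check, a late request can trigger a second
            # origin hit -- the race described in the message above.
            if url in cache:
                return cache[url]
            with counter_lock:
                origin_hits += 1    # simulated origin round trip
            time.sleep(0.05)
            cache[url] = "chunk-body"
            return cache[url]
        finally:
            write_lock.release()
    # Write lock busy: another request is already filling the cache,
    # so enter a read-retry loop instead of contending for the write.
    for _ in range(100):
        if url in cache:
            return cache[url]
        time.sleep(0.01)
    return None                     # a stale/5xx fallback would go here

threads = [threading.Thread(target=fetch, args=("/chunk.ts",))
           for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(origin_hits)                  # -> 1: the herd collapses to one fetch
```

Ten duplicate requests produce a single origin hit; without the re-check under the lock, the timing race can still produce a second one.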
>
> I have been working on adding this functionality to the ATS core code: just
> another write-fail option, so that if the write lock fails it jumps back to attempting
> to read from the cache instead. It's mostly working except for one random core I'm getting
> in production testing that I'm trying to track down. It's also just experimental at this
> stage, since because of how the ATS state machine is set up, it's possible that even after
> jumping back to the read, the request ends up attempting another write. If the timing is
> just right, the first write may finish right as another request failed to read and
> attempted another write, which will result in an origin hit. But that's something to
> figure out once I've worked out the core issue.
>
> On 2018/06/19 16:00:47, Jeremy Payne <jp557198@gmail.com> wrote:
>> Evan,
>>
>> In my lab testing, I've noticed inconsistent results with the
>> collapsed_forwarding plugin. Sometimes it leaks redirects to the
>> client. Sometimes it sends a 500 Ink-API response back.
>> However, when increasing open_write_retries to some value that allows
>> upstream time to respond, my results have been more consistent:
>> instead of error responses being sent back downstream, I am now
>> sending 200s (or something other than a 5xx).
>>
>> proxy.config.http.cache.max_open_write_retries 50
>>
>> So what's the difference between the core waiting for the write lock
>> to free up, and handing the same off to the plugin?
>>
>>
>>
>>
>>
>>
>> On Tue, Jun 19, 2018 at 10:07 AM, evan.zelkowitz@gmail.com
>> <evan.zelkowitz@gmail.com> wrote:
>> > I believe the ATS params provide good enough protection except for the one specific
>> > case mentioned in the collapsed forwarding plugin docs. The one case they can't
>> > protect against is multiple instantaneous hits for the same content. So this actually
>> > comes up fairly often for linear video, since you have multiple clients that can all
>> > be requesting the same video chunk at the same time.
>> >
>> > On 2018/06/19 12:24:08, Jeremy Payne <jp557198@gmail.com> wrote:
>> >> in testing 713, I also notice the 303 sent back to the end user. this
>> >> is much different than what's seen with 621. at least with 621 a 5xx
>> >> response is sent after all configured retries have been exhausted.
>> >>
>> >> that said, what's the current status of this plugin? deprecated? no
>> >> longer supported?
>> >>
>> >> is it safe to say ATS core read_while_writer timers (and related
>> >> parameters) should be the preferred protection against thundering
>> >> herd?
>> >> maybe at one time, core parameters were not sufficient to protect the
>> >> origin, which is why the plugin was developed in the first place.
>> >> but now core has caught up, and the plugin is no longer needed?
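
For reference, the read_while_writer machinery being asked about is controlled by a handful of records.config settings. A sketch with illustrative values (per the ATS docs, the background_fill settings must be permissive for read-while-writer to engage; the retry count and delay are examples, not recommendations):

```
CONFIG proxy.config.cache.enable_read_while_writer INT 1
CONFIG proxy.config.http.background_fill_active_timeout INT 0
CONFIG proxy.config.http.background_fill_completed_threshold FLOAT 0.000000
CONFIG proxy.config.cache.read_while_writer.max_retries INT 10
CONFIG proxy.config.cache.read_while_writer_retry.delay INT 50
```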
>> >>
>> >> thanks!
>> >>
>> >>
>> >>
>> >>
>> >> On Wed, May 16, 2018 at 2:51 PM, Zelkowitz, Evan
>> >> <Evan_Zelkowitz@comcast.com> wrote:
>> >> > So now that I'm testing it more, I can see on the origin side that each
>> >> > one of these ERR_CONNECT_FAILs is going back to my client
>> >> > application, which is then getting the redirect and going to the origin.
>> >> > It doesn't even appear to be doing the number of retries or delays that
>> >> > I set for the collapsed_forwarding plugin. So something seems to be going
>> >> > wrong with the plugin somewhere. I can see it attempting the delay, so it
>> >> > can see the writefail header, and it sets the 'internal' redirect properly,
>> >> > but that appears to leak back out to the client before it gets attempted
>> >> > by ATS itself.
>> >> >
>> >> >
>> >> >
>> >> > From: "Zelkowitz, Evan" <Evan_Zelkowitz@comcast.com>
>> >> > Reply-To: "users@trafficserver.apache.org" <users@trafficserver.apache.org>
>> >> > Date: Tuesday, May 15, 2018 at 2:08 PM
>> >> > To: "users@trafficserver.apache.org" <users@trafficserver.apache.org>
>> >> > Subject: [EXTERNAL] Issue with collapsed forwarding plugin, one CONN_FAIL
>> >> >
>> >> >
>> >> >
>> >> > I've been testing the collapsed forwarding plugin here by sending it
>> >> > random traffic: one request with 10 duplicates of the same request. One
>> >> > thing I'm seeing is that a lot of the time, one of the requests out of
>> >> > those 10 will come back as an ERR_CONNECT_FAIL, but the pssc will be the
>> >> > 303 redirect.
>> >> >
>> >> >
>> >> >
>> >> > I was just wondering if this was expected? Is it an issue, or has anyone
>> >> > seen this?
>> >> >
>> >> >
>> >> >
>> >> > Here is my output logging showing one of these instances:
>> >> >
>> >> >
>> >> >
>> >> > 1526413868.121 chi=test-ip phn=test-cache-09.net php=80
>> >> > shn=origin-test-cache-52.net
>> >> > url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
>> >> > cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=1 b=107782 sssc=200 sscl=107782
>> >> > cfsc=FIN pfsc=FIN crc=TCP_MISS phr=DIRECT pqsn=origin-test-cache-52.net
>> >> > uas="CUSTOMUAS/0.0.0" xmt="-"
>> >> >
>> >> > 1526413868.123 chi=test-ip phn=test-cache-09.net php=80
>> >> > shn=origin-test-cache-52.net
>> >> > url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
>> >> > cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
>> >> > pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
>> >> >
>> >> > 1526413868.124 chi=test-ip phn=test-cache-09.net php=80
>> >> > shn=origin-test-cache-52.net
>> >> > url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
>> >> > cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
>> >> > pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
>> >> >
>> >> > 1526413868.125 chi=test-ip phn=test-cache-09.net php=80
>> >> > shn=origin-test-cache-52.net
>> >> > url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
>> >> > cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
>> >> > pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
>> >> >
>> >> > 1526413868.126 chi=test-ip phn=test-cache-09.net php=80
>> >> > shn=origin-test-cache-52.net
>> >> > url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
>> >> > cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
>> >> > pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
>> >> >
>> >> > 1526413868.126 chi=test-ip phn=test-cache-09.net php=80
>> >> > shn=origin-test-cache-52.net
>> >> > url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
>> >> > cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
>> >> > pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
>> >> >
>> >> > 1526413868.127 chi=test-ip phn=test-cache-09.net php=80
>> >> > shn=origin-test-cache-52.net
>> >> > url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
>> >> > cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=0 b=107782 sssc=000 sscl=0 cfsc=FIN
>> >> > pfsc=FIN crc=TCP_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
>> >> >
>> >> > 1526413868.127 chi=test-ip phn=test-cache-09.net php=80
>> >> > shn=origin-test-cache-52.net
>> >> > url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
>> >> > cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=60 b=107782 sssc=000 sscl=0 cfsc=FIN
>> >> > pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
>> >> >
>> >> > 1526413868.121 chi=test-ip phn=test-cache-09.net php=80
>> >> > shn=origin-test-cache-52.net
>> >> > url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
>> >> > cqhm=GET cqhv=HTTP/1.1 pssc=303 ttms=68 b=247 sssc=000 sscl=0 cfsc=FIN
>> >> > pfsc=FIN crc=ERR_CONNECT_FAIL phr=DIRECT pqsn=origin-test-cache-52.net
>> >> > uas="CUSTOMUAS/0.0.0" xmt="-"
>> >> >
>> >> > 1526413868.122 chi=test-ip phn=test-cache-09.net php=80
>> >> > shn=origin-test-cache-52.net
>> >> > url=http://origin-test-cache-52.net/muxed-3000-prDi.tJZFQeIkXN_QYeLWwqAeRbJpH.ts
>> >> > cqhm=GET cqhv=HTTP/1.1 pssc=200 ttms=108 b=107782 sssc=000 sscl=0 cfsc=FIN
>> >> > pfsc=FIN crc=TCP_MEM_HIT phr=NONE pqsn=- uas="CUSTOMUAS/0.0.0" xmt="-"
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>>
