Hi,

I have run into a situation in which the host name returned by the ATS API contains invalid UTF-8 sequences. It is rare, but it happens from time to time on our installation: out of about 100,000 different URLs, a few hundred contained invalid UTF-8. I tried getting the host name in two ways, from the MIME headers and from the URL, but the result was the same. Please see the pseudocode below with the two variants (error handling and resource releasing are omitted).

TSMBuffer mbuff;
TSMLoc req_mloc;
if (TSHttpTxnClientReqGet(http_txn, &mbuff, &req_mloc) != TS_SUCCESS)
    return false;

// Variant 1: take the host from the Host MIME header field
TSMLoc fld_mloc = TSMimeHdrFieldFind(mbuff, req_mloc,
                                     TS_MIME_FIELD_HOST, TS_MIME_LEN_HOST);
if (fld_mloc != TS_NULL_MLOC)
{
    int host_len = 0;
    // The returned string is not NUL-terminated; always use host_len.
    const char* host =
        TSMimeHdrFieldValueStringGet(mbuff, req_mloc, fld_mloc, -1, &host_len);
    if (host && (host_len > 0))
    {
        ValidateHostSymbols(host, host_len);
        // Do something with the host
        return true;
    }
}

// Variant 2: take the host from the request URL
TSMLoc url_mloc;
if (TSHttpHdrUrlGet(mbuff, req_mloc, &url_mloc) != TS_SUCCESS)
{
    return false;
}
int host_len = 0;
const char* host = TSUrlHostGet(mbuff, url_mloc, &host_len);
if (host && (host_len > 0))
{
    ValidateHostSymbols(host, host_len);
    // Do something with the host
    return true;
}

For the validation I used a function from the Google protobuf library, the one that checks strings for invalid UTF-8 when serializing/deserializing them. As far as I know, the standard does not allow every UTF-8 character in a URL, but the characters that are allowed form a subset of valid UTF-8. That is why I assume that a URL/host should contain only valid UTF-8. Am I wrong?
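
To make the check concrete, here is a minimal sketch of the kind of validation I mean. It is not the protobuf code and not part of the ATS API. IsValidUtf8 is a hypothetical stand-alone helper that checks a length-delimited buffer for well-formed UTF-8 (rejecting truncated sequences, overlong encodings, surrogates and code points above U+10FFFF), and the real library check may differ in details.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from ATS or protobuf): returns true if the bytes
// in [data, data + len) form well-formed UTF-8.
// It takes an explicit length, so the buffer need not be NUL-terminated.
bool IsValidUtf8(const char* data, std::size_t len)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
    std::size_t i = 0;
    while (i < len)
    {
        const unsigned char c = p[i];
        if (c < 0x80)
        {
            ++i; // plain ASCII byte
            continue;
        }

        std::size_t extra = 0;    // number of continuation bytes expected
        std::uint32_t cp = 0;     // decoded code point
        std::uint32_t min_cp = 0; // smallest code point valid for this length

        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min_cp = 0x10000; }
        else return false; // stray continuation byte or invalid lead byte

        if (len - i <= extra)
            return false; // sequence is cut off by the end of the buffer

        for (std::size_t k = 1; k <= extra; ++k)
        {
            if ((p[i + k] & 0xC0) != 0x80)
                return false; // not a continuation byte
            cp = (cp << 6) | (p[i + k] & 0x3F);
        }

        if (cp < min_cp)
            return false; // overlong encoding
        if (cp > 0x10FFFF)
            return false; // beyond the Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false; // UTF-16 surrogate, not allowed in UTF-8

        i += extra + 1;
    }
    return true;
}
```

In the pseudocode above, ValidateHostSymbols(host, host_len) plays the role of a call such as IsValidUtf8(host, host_len).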

My questions are:
1. Does ATS do any validation of the characters in the URLs/hosts of client requests? Should I assume that the API returns only validated hosts and look for the problem on my side? Note that I do the validation immediately after getting the value, and most of the hosts are OK.

2. Is there a way to find out when ATS detects that it has received such an invalid request, assuming that there are clients that send such invalid URLs/hosts?

Thanks for the help,
Pavel Vazharov.