trafficserver-users mailing list archives

From Pavel Vajarov <pa...@x3me.net>
Subject Questions about URL/host validation
Date Thu, 19 Mar 2015 14:00:32 GMT
Hi,

I have run into a situation in which the host name returned by the ATS API
contains invalid UTF-8 symbols. It is very rare, but it happens from time to
time on our installation: out of about 100,000 different URLs, a few hundred
contained invalid UTF-8 symbols. I tried getting the host name both ways,
from the MIME headers and from the URL, but the result was the same. Please
see the pseudocode below with the two variants (error handling and resource
release are omitted).

TSMBuffer mbuff;
TSMLoc req_mloc;
if (TSHttpTxnClientReqGet(http_txn, &mbuff, &req_mloc) != TS_SUCCESS)
    return false;

// Variant 1: read the host from the Host MIME header field
TSMLoc fld_mloc = TSMimeHdrFieldFind(mbuff, req_mloc, TS_MIME_FIELD_HOST,
                                     TS_MIME_LEN_HOST);
if (fld_mloc != TS_NULL_MLOC)
{
    int host_len = 0;
    // Index -1 asks for the whole field value
    const char* host =
        TSMimeHdrFieldValueStringGet(mbuff, req_mloc, fld_mloc, -1, &host_len);
    if (host && (host_len > 0))
    {
        ValidateHostSymbols(host, host_len);
        // Do something with the host
        return true;
    }
}

// Variant 2: read the host from the request URL
TSMLoc url_mloc;
if (TSHttpHdrUrlGet(mbuff, req_mloc, &url_mloc) != TS_SUCCESS)
{
    return false;
}
int host_len = 0;
const char* host = TSUrlHostGet(mbuff, url_mloc, &host_len);
if (host && (host_len > 0))
{
    ValidateHostSymbols(host, host_len);
    // Do something with the host
    return true;
}

For the validation I used a function from the Google protobuf library that
checks strings for invalid UTF-8 symbols when serializing/deserializing
them. As far as I know, not every UTF-8 symbol may appear in a URL according
to the standard, but the symbols that may appear there are a subset of the
UTF-8 symbols. That is why I assume that a URL/host should contain only
valid UTF-8 symbols. Am I wrong?
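To make "valid UTF-8" concrete, here is a minimal sketch of the kind of check I mean. It is not the protobuf function and not part of the ATS API; the name is_valid_utf8 is made up for illustration. It assumes the usual well-formedness rules (no stray continuation bytes, no truncated or overlong sequences, no surrogates, nothing above U+10FFFF).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not ATS, not protobuf): returns true when the
 * len bytes starting at s are well-formed UTF-8. */
static bool is_valid_utf8(const char *s, size_t len)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t i = 0;

    while (i < len) {
        unsigned char c = p[i];
        size_t extra;       /* continuation bytes expected after the lead byte */
        unsigned int cp;    /* code point being decoded */
        unsigned int min;   /* smallest code point allowed for this length */

        if (c < 0x80) {     /* plain ASCII byte */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return false;   /* stray continuation byte or invalid lead byte */
        }

        if (len - i <= extra)
            return false;   /* sequence is cut off at the end of the buffer */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;               /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return false;   /* overlong encoding */
        if (cp > 0x10FFFF)
            return false;   /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;   /* UTF-16 surrogate, not allowed in UTF-8 */

        i += extra + 1;
    }
    return true;
}
```

In the variants above this would be called as is_valid_utf8(host, (size_t)host_len). The length has to be passed explicitly because the strings returned by the ATS API are length-delimited and should not be treated as NUL-terminated.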

My questions are:
1. Does ATS do any kind of validation of the symbols in the URLs/hosts of
client requests? Should I assume that the API returns only validated hosts
and look for the problem on my side? Note that I do the validation
immediately after I get the value, and most of the hosts are OK.

2. Assuming that there are clients which send such invalid URLs/hosts, is
there a way to find out when ATS detects that it has received such an
invalid request?

Thanks for the help,
Pavel Vazharov.
