httpd-users mailing list archives

From Peter <...@citylink.dinoex.sub.org>
Subject Re: [users@httpd] [patch] Apache converts GZIPed data into UTF-8 - 2nd Act
Date Tue, 16 Apr 2019 01:51:23 GMT
On Mon, Apr 15, 2019 at 11:43:21PM +0100, Nick Kew wrote:

Hi Nick,

! OK, I've looked.

me too. ;)

! What I'd like to do - pass responsibility back to the module
! that inserted the xml2enc filter - calls for a minor API
! change, so isn't going to happen in 2.4.x.  A variant on
! that approach might work, but right now I don't see anything
! better than replicating mod_proxy_html's logic in mod_xml2enc
! to deal with the situation where they're interacting.
! 
! Your check on content-encoding also looks good.
! Except that unless I'm missing something, your use of f->r->notes
! is unnecessary: ap_remove_output_filter means we don't revisit
! that code!

Yes, it was unnecessary, but for a different reason: my code is
currently not in the proper place.
Given the filter chain DEFLATE;XML2ENC;INFLATE, the log looks like this:

[filter:trace4] [pid 77874] util_expr_eval.c(858): [client 192.168.97.18:65401] Evaluation of expression from /usr/local/etc/apache24/extra/httpd-ruby.conf:126 gave: 1
[filter:trace2] [pid 77874] mod_filter.c(159): [client 192.168.97.18:65401] Expression condition for 'inflate' matched
[filter:trace4] [pid 77874] util_expr_eval.c(858): [client 192.168.97.18:65401] Evaluation of expression from /usr/local/etc/apache24/extra/httpd-ruby.conf:127 gave: 1
[filter:trace2] [pid 77874] mod_filter.c(159): [client 192.168.97.18:65401] Expression condition for 'xml2enc' matched
[xml2enc:debug] [pid 77874] mod_xml2enc.c(176): [client 192.168.97.18:65401] AH01430: Content-Type is text/css
[xml2enc:debug] [pid 77874] mod_xml2enc.c(250): [client 192.168.97.18:65401] AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 8096 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 8096/8096 bytes
[filter:trace4] [pid 77874] util_expr_eval.c(858): [client 192.168.97.18:65401] Evaluation of expression from /usr/local/etc/apache24/extra/httpd-ruby.conf:130 gave: 1
[filter:trace2] [pid 77874] mod_filter.c(159): [client 192.168.97.18:65401] Expression condition for 'deflate' matched
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 8096 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 8096/8096 bytes
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 8096 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 8096/8096 bytes
[deflate:debug] [pid 77874] mod_deflate.c(1622): [client 192.168.97.18:65401] AH01398: Zlib: Inflated 6176 to 28247 : URL /fin-stage/assets/application-3a5821b5be536e0108d5934c96815299001dfa3c1ddff9f39676a3a3126d8190.css
[xml2enc:debug] [pid 77874] mod_xml2enc.c(476): [client 192.168.97.18:65401] AH01439: xml2enc: consuming 3959 bytes from bucket
[xml2enc:debug] [pid 77874] mod_xml2enc.c(502): [client 192.168.97.18:65401] AH01441: xml2enc: converted 3959/3959 bytes
[deflate:debug] [pid 77874] mod_deflate.c(854): [client 192.168.97.18:65401] AH01384: Zlib: Compressed 28247 to 6226 : URL /fin-stage/assets/application-3a5821b5be536e0108d5934c96815299001dfa3c1ddff9f39676a3a3126d8190.css

Currently my snippet runs for each of these chunks of data
(which is not a good idea, but I didn't expect to understand the
code fully enough to find a better place). So, with DEFLATE running
after xml2enc, by the time the second chunk comes along DEFLATE has
already put the "gzip" Content-Encoding header back in, and I
watched xml2enc quit in the middle of the document.
That's why I put the f->r->notes check in.
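The underlying fix is to make the Content-Encoding decision once, on the first chunk, and reuse it for the rest of the response, so that a DEFLATE further down the chain re-adding "gzip" cannot change the outcome midway. Below is a minimal standalone sketch of that idea. The `enc_check_ctx` struct and `should_bypass()` helper are hypothetical names; in a real httpd filter this state would normally live in `f->ctx`, and the caller is assumed to pass whether a non-identity Content-Encoding is present at the time of the call.

```c
#include <stdbool.h>

/* Hypothetical per-response state. In an httpd output filter this would
 * typically be allocated once and stored in f->ctx. */
typedef struct {
    bool checked; /* has the Content-Encoding test run yet? */
    bool bypass;  /* result of that first test, reused for later chunks */
} enc_check_ctx;

/* Returns true if the filter should pass data through untouched.
 * encoded_now: whether a non-identity Content-Encoding is present at the
 * time of this call. Only the value seen on the first call counts, so a
 * later "Content-Encoding: gzip" added by a downstream DEFLATE cannot
 * make the filter stop in the middle of a document. */
static bool should_bypass(enc_check_ctx *ctx, bool encoded_now)
{
    if (!ctx->checked) {
        ctx->checked = true;
        ctx->bypass = encoded_now;
    }
    return ctx->bypass;
}
```

Later calls deliberately ignore `encoded_now`: whatever was decided for the first chunk holds for the whole document.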

Another minor flaw is that the test for "Content-Encoding: identity" 
(btw: does anybody use that?) is probably not case-insensitive.
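HTTP content-coding values are case-insensitive (RFC 7231, section 3.1.2.1), so "Identity" or "IDENTITY" should be treated the same as "identity". A minimal sketch of such a test follows; `is_identity_encoding()` is a hypothetical helper, treating an absent or empty header as "no encoding" is an assumption, and a module would more likely use the existing APR/httpd string helpers (the hand-written loop just keeps the sketch self-contained). It does not parse comma-separated lists such as "gzip, identity".

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if a Content-Encoding value (possibly NULL) means "not encoded":
 * absent, empty, or "identity" in any letter case, ignoring leading and
 * trailing spaces/tabs. Lists like "gzip, identity" are not handled. */
static bool is_identity_encoding(const char *value)
{
    static const char want[] = "identity";
    size_t len, i;

    if (value == NULL)
        return true;
    while (*value == ' ' || *value == '\t')   /* skip leading whitespace */
        value++;
    len = strlen(value);
    while (len > 0 && (value[len - 1] == ' ' || value[len - 1] == '\t'))
        len--;                                /* drop trailing whitespace */
    if (len == 0)
        return true;
    if (len != sizeof want - 1)
        return false;
    for (i = 0; i < len; i++) {               /* ASCII case-insensitive */
        if (tolower((unsigned char)value[i]) != want[i])
            return false;
    }
    return true;
}
```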

And then I was thinking about a different and probably better approach:
if we can look at the first few bytes of the actual document
beforehand, we can test them against the signatures of the usual
compression algorithms (the same way the "file" command does it on
Unix). That seems safer than relying on header information.

After all, I see no reason why an HTML document couldn't be
compressed as well - and then it wouldn't help to just stop
processing CSS documents.
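The signature idea can be sketched in a few lines. The magic numbers below are the published ones: gzip starts with 0x1F 0x8B (RFC 1952), xz with FD 37 7A 58 5A 00, zstd frames with 28 B5 2F FD, bzip2 with "BZh" plus a block-size digit, and a zlib stream (the format behind HTTP "deflate") has a 2-byte header whose value is a multiple of 31 (RFC 1950). The `sniff_compression()` function and `sig_kind` enum are hypothetical names. Brotli has no magic number, so it cannot be detected this way, and the zlib test is weak: a few plain-text prefixes (for example "x " or "hb") also pass it, so it should be treated as a hint rather than proof.

```c
#include <stddef.h>
#include <string.h>

typedef enum { SIG_NONE, SIG_GZIP, SIG_ZLIB, SIG_BZIP2, SIG_XZ, SIG_ZSTD } sig_kind;

/* Inspect the first n bytes of a body and guess the compression format. */
static sig_kind sniff_compression(const unsigned char *p, size_t n)
{
    static const unsigned char xz[6]   = {0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00};
    static const unsigned char zstd[4] = {0x28, 0xB5, 0x2F, 0xFD};

    if (n >= 2 && p[0] == 0x1F && p[1] == 0x8B)
        return SIG_GZIP;                       /* RFC 1952 ID1/ID2 */
    if (n >= 6 && memcmp(p, xz, sizeof xz) == 0)
        return SIG_XZ;
    if (n >= 4 && memcmp(p, zstd, sizeof zstd) == 0)
        return SIG_ZSTD;
    if (n >= 4 && p[0] == 'B' && p[1] == 'Z' && p[2] == 'h'
        && p[3] >= '1' && p[3] <= '9')
        return SIG_BZIP2;
    /* RFC 1950: CM (low nibble of CMF) is 8 for deflate, CINFO <= 7, and
     * CMF*256 + FLG must be a multiple of 31. Weak: may match plain text. */
    if (n >= 2 && (p[0] & 0x0F) == 8 && (p[0] >> 4) <= 7
        && ((p[0] << 8) | p[1]) % 31 == 0)
        return SIG_ZLIB;
    return SIG_NONE;
}
```

A filter could run this once on the first bucket and pass the response through unchanged for anything other than `SIG_NONE`, regardless of what the headers claim.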

Btw, I had a look at this message, too:
   AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate

It seems to me that this message is reached only because the document
is compressed (and libxml2 obviously cannot find a charset in
that); it is just the message text that seems misleading.
Maybe a conservative approach would be to just stop at that point
and give up, because compression might not be the only issue here:
people might decide to use some end-to-end encryption for certain
documents, and that would also show up as binary data that we must
not tamper with...
(just thinking out loud)
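The "give up on binary data" idea could be approximated with a simple heuristic that covers compressed and encrypted bodies alike: count control bytes other than ordinary text whitespace and bail out if there are too many. This is a hypothetical sketch (`looks_binary()` and its `max_ratio` threshold are assumptions to be tuned, not anything from mod_xml2enc). Since xml2enc deals with charsets, note that UTF-16 text legitimately contains many 0x00 bytes, so the sketch treats a UTF-16 byte-order mark as "text"; UTF-16 without a BOM would still be misclassified.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if more than 1 in max_ratio of the first n bytes are
 * control bytes other than tab, LF, CR and FF (0x7F counts as control).
 * A UTF-16 byte-order mark (FE FF or FF FE) short-circuits to "text". */
static bool looks_binary(const unsigned char *p, size_t n, size_t max_ratio)
{
    size_t i, suspicious = 0;

    if (n >= 2 && ((p[0] == 0xFE && p[1] == 0xFF) ||
                   (p[0] == 0xFF && p[1] == 0xFE)))
        return false;
    for (i = 0; i < n; i++) {
        unsigned char c = p[i];
        if ((c < 0x20 && c != '\t' && c != '\n' && c != '\r' && c != '\f')
            || c == 0x7F)
            suspicious++;
    }
    /* Compare without division: suspicious / n > 1 / max_ratio. */
    return suspicious * max_ratio > n;
}
```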

cheerio,
PMc

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org
