jackrabbit-oak-issues mailing list archives

From "Matt Ryan (Jira)" <j...@apache.org>
Subject [jira] [Comment Edited] (OAK-9304) Filename portion of direct download URI Content-Disposition should be ISO-8859-1 encoded
Date Mon, 04 Jan 2021 19:15:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258433#comment-17258433
] 

Matt Ryan edited comment on OAK-9304 at 1/4/21, 7:14 PM:
---------------------------------------------------------

Sure thing [~reschke].  Sorry, I've been on holidays :)

Previously, in regard to the example in the description above, you said:  "The first of the
two entries looks perfectly ok to me."  The issue here is that the first one does not work
with Azure blob storage service - it rejects the request as having an invalid character in
the URI.  So this is less an issue of whether the URI is correct per RFCs, and more an issue
that the URI does not properly work with Azure.

More details follow.

PRIOR TO THIS FIX:  When Oak generated a direct binary access URI for a filename
with characters outside the ISO-8859-1 character set, the result was a URI that Azure
would reject with a 400-level error.  The cause was that Oak did not properly encode
this filename in the "filename" portion of the Content-Disposition header specification.
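
To make "characters outside the ISO-8859-1 character set" concrete, here is a minimal sketch of a check using the JDK's CharsetEncoder.  The class and method names are hypothetical and are not taken from the Oak code base.

```java
import java.nio.charset.StandardCharsets;

final class Latin1CheckSketch {

    // Returns true when every character of the name can be encoded in ISO-8859-1.
    // Hypothetical helper, not part of Oak.
    static boolean fitsLatin1(String name) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        // Precomposed U+00E4 is part of ISO-8859-1.
        System.out.println(fitsLatin1("uml\u00E4ut.jpg"));
        // "a" followed by the combining diaeresis U+0308 is not.
        System.out.println(fitsLatin1("umla\u0308ut.jpg"));
    }
}
```

Note that the percent-encoded form in the example in this comment (%CC%88) corresponds to the combining diaeresis U+0308, which suggests the filename in question was in decomposed form.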

(As background, remember that Oak declares to the cloud storage the value that should be used
in the Content-Disposition header for requests to the generated direct binary access URI. 
In Oak we specify both the content disposition type and filenames for this.  See [0] and
[1] for more info.)

Example:  Suppose the filename is "umläut.jpg".  Oak would specify a Content-Disposition
header value of:
{noformat}
inline; filename="umläut.jpg"; filename*=UTF-8''umla%CC%88ut.jpg{noformat}
This is then specified in a query parameter in the direct access URI, so this information
gets encoded.  It is probably this encoding change that Azure does not expect.  Since this
portion of the URI is signed, the signature doesn't match and the request fails.
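
As a rough illustration of what happens when a header value is carried in a query parameter, the sketch below percent-encodes the "filename" part with the JDK's URLEncoder.  This is only an assumption about the general mechanism: the thread does not say which encoder the Azure SDK or service applies, URLEncoder performs form-style encoding (a space becomes "+"), and the class and method names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryParamSketch {

    // Percent-encodes a header value so it can be used as a query parameter value.
    // Hypothetical helper; the real signing code may encode differently.
    static String asQueryValue(String headerValue) {
        return URLEncoder.encode(headerValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The raw combining diaeresis (U+0308) is turned into its UTF-8 bytes %CC%88.
        System.out.println(asQueryValue("inline; filename=\"umla\u0308ut.jpg\""));
    }
}
```

If the party that signs the URI and the party that validates it disagree on how the raw non-ISO-8859-1 character is represented, the two signatures are computed over different strings, which would be consistent with the failure described above.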

WITH THIS FIX:  A basic ISO-8859-1 encoding is done on the "filename" value of the header,
so characters outside that set are replaced with "?".  This change was made based on RFC 6266
Section 4.3, which seems to suggest that only ISO-8859-1 characters are allowed for that value.

Thus the header now looks like this:
{noformat}
inline; filename="umla?ut.jpg"; filename*=UTF-8''umla%CC%88ut.jpg{noformat}
This header encodes and validates properly with Azure.  In testing, modern clients prefer
the "filename*" portion, which results in the proper filename being used.
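
The header construction described here can be sketched as follows.  This is not the actual Oak implementation: the class and method names are hypothetical, the "UTF-8" charset label follows the form shown in the issue description, and escaping of quotes or backslashes inside the filename is deliberately left out.

```java
import java.nio.charset.StandardCharsets;

final class ContentDispositionSketch {

    // Fallback for the plain "filename" parameter.  String.getBytes(Charset)
    // replaces characters that cannot be mapped to ISO-8859-1 with '?'.
    static String latin1Fallback(String name) {
        byte[] bytes = name.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // RFC 5987 style value for "filename*": every UTF-8 byte that is not an
    // attr-char is percent-encoded.
    static String rfc5987Encode(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean attrChar = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || "!#$&+-.^_`|~".indexOf(c) >= 0;
            if (attrChar) {
                sb.append((char) c);
            } else {
                sb.append('%').append(String.format("%02X", c));
            }
        }
        return sb.toString();
    }

    // Assumes the name contains no '"' or '\' characters (not handled here).
    static String contentDisposition(String type, String name) {
        return type + "; filename=\"" + latin1Fallback(name)
                + "\"; filename*=UTF-8''" + rfc5987Encode(name);
    }

    public static void main(String[] args) {
        System.out.println(contentDisposition("inline", "umla\u0308ut.jpg"));
    }
}
```

For the decomposed filename used in this example, the sketch yields a "filename" of "umla?ut.jpg" and a "filename*" of UTF-8''umla%CC%88ut.jpg.  A precomposed "ä" (U+00E4) is itself an ISO-8859-1 character and would pass through the fallback unchanged.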

Please let me know if this is still unclear.

 

[0] - [https://jackrabbit.apache.org/oak/docs/features/direct-binary-access.html]

[1] - [https://jackrabbit.apache.org/oak/docs/apidocs/org/apache/jackrabbit/api/binary/BinaryDownloadOptions.html]


> Filename portion of direct download URI Content-Disposition should be ISO-8859-1 encoded
> ----------------------------------------------------------------------------------------
>
>                 Key: OAK-9304
>                 URL: https://issues.apache.org/jira/browse/OAK-9304
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob-cloud, blob-cloud-azure, blob-plugins
>    Affects Versions: 1.36.0
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
>
> The "filename" portion of the Content-Disposition needs to be ISO-8859-1 encoded, per
> [https://tools.ietf.org/html/rfc6266#section-4.3] in this paragraph:
> {quote}The parameters "filename" and "filename*" differ only in that "filename*" uses
> the encoding defined in RFC5987, allowing the use of characters not present in the ISO-8859-1
> character set ([ISO-8859-1]).
> {quote}
> This is not usually a problem, but if the filename provided contains characters outside
> ISO-8859-1, it can cause the resulting signed URI to be invalid.  This can lead to blob
> storage services being unable to service the URI request.
> For example, a filename of "Ausländische.jpg" currently requests a Content-Disposition
> header that looks like:
> {noformat}
> inline; filename="Ausländische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg {noformat}
> It instead should look like:
> {noformat}
> inline; filename="Ausla?ndische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg {noformat}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
