jackrabbit-oak-issues mailing list archives

From "Julian Reschke (Jira)" <j...@apache.org>
Subject [jira] [Commented] (OAK-9304) Filename with special characters in direct download URI Content-Disposition are causing HTTP 400 errors from Azure
Date Tue, 05 Jan 2021 09:40:00 GMT

    [ https://issues.apache.org/jira/browse/OAK-9304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258780#comment-17258780 ]

Julian Reschke commented on OAK-9304:


It *really* is important not to use non-ASCII chars in the source code; with escapes it is
clear which code points are being tested. It seems that you were testing the NFD form of
"a umlaut" ("a" followed by U+0308, the combining diaeresis), which indeed cannot be
represented in ISO-8859-1. (See example source below.)

Also, *never* use the String constructor (for byte[]) without specifying the charset; the
outcome is platform-dependent.
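
To illustrate the point about the String constructor, here is a minimal sketch. The class
and method names are made up for this example; only the JDK calls are real. It decodes the
same bytes with two explicitly named charsets, so the result does not depend on the JVM
default.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Oak.
final class ExplicitCharsetDemo {

    // Decode with an explicitly named charset; the result is the same on every platform.
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "uml", then U+00E4 encoded as the single ISO-8859-1 byte 0xe4, then "ut"
        byte[] latin1 = { 0x75, 0x6d, 0x6c, (byte) 0xe4, 0x75, 0x74 };

        // Round-trips, because the bytes are decoded with the charset they were encoded in.
        System.out.println(decodeLatin1(latin1).equals("uml\u00e4ut"));

        // A lone 0xe4 is not valid UTF-8, so this does not yield U+00E4.
        System.out.println(decodeUtf8(latin1).equals("uml\u00e4ut"));

        // new String(latin1) would pick whichever charset the JVM defaults to,
        // which is exactly the platform dependence to avoid.
    }
}
```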

*If* the intent is to strip non-ASCII characters, the simplest way is to copy char-by-char
to a new String and remove/replace these characters.
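
A sketch of that char-by-char approach, assuming '?' as the replacement character (the
class and method names are hypothetical). Note that a character outside the BMP is two
Java chars and would therefore produce two replacements.

```java
// Hypothetical helper, not part of Oak.
final class AsciiFilter {

    // Copy char by char, replacing everything outside US-ASCII.
    static String replaceNonAscii(String input, char replacement) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            sb.append(c < 0x80 ? c : replacement);
        }
        return sb.toString();
    }

    // Copy char by char, dropping everything outside US-ASCII.
    static String removeNonAscii(String input) {
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // NFC: precomposed U+00E4
        System.out.println(replaceNonAscii("uml\u00e4ut.jpg", '?')); // uml?ut.jpg
        // NFD: "a" followed by the combining diaeresis U+0308
        System.out.println(replaceNonAscii("umla\u0308ut.jpg", '?')); // umla?ut.jpg
        System.out.println(removeNonAscii("umla\u0308ut.jpg")); // umlaut.jpg
    }
}
```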

Finally, my suspicion is that the problem you want to solve is somewhere else: where the desired
field value of Content-Disposition is sent to Azure. If that happens as a query parameter,
it itself may need encoding or recoding. (If you can point me at the source or the docs I
might be able to help).
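
If the field value does travel as a query parameter, percent-encoding it could look roughly
like the sketch below. This is an assumption about the transport, not a statement of what
Oak or Azure actually do; the class and method names are hypothetical. URLEncoder produces
form encoding, so the sketch rewrites "+" to "%20" for use in a URI query.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Oak.
final class QueryParamEncoding {

    // Percent-encode a value (as UTF-8 octets) for use as a query parameter value.
    static String encodeQueryValue(String value) {
        try {
            // URLEncoder encodes a space as '+'; literal '+' becomes %2B,
            // so replacing '+' afterwards only affects former spaces.
            return URLEncoder.encode(value, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeQueryValue("inline; filename=\"a b.jpg\""));
        // inline%3B%20filename%3D%22a%20b.jpg%22

        // "a" followed by U+0308 (NFD) is encoded as a%CC%88
        System.out.println(encodeQueryValue("a\u0308"));
    }
}
```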

So, here's the test code:

import java.nio.charset.Charset;

public class EncTest {

    public static void main(String[] args) {
        System.out.println("Test with NFC");
        // U+00E4: precomposed "a umlaut"
        test("uml\u00e4ut.jpg");
        System.out.println();

        System.out.println("Test with NFD");
        // "a" followed by U+0308 (combining diaeresis)
        test("umla\u0308ut.jpg");
    }

    public static void test(String input) {
        Charset ISO_8859_1 = Charset.forName("ISO-8859-1");
        Charset UTF_8 = Charset.forName("UTF-8");
        System.out.println("input: " + input);
        // Characters that ISO-8859-1 cannot represent are encoded as '?' (0x3f)
        byte[] bytes = ISO_8859_1.encode(input).array();
        dump(bytes);
        System.out.println("output (parsed as ISO-8859-1): " + new String(bytes, ISO_8859_1));
        System.out.println("output (parsed as UTF-8): " + new String(bytes, UTF_8));
    }

    private static void dump(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b));
        }
        System.out.println(sb);
    }
}


Test with NFC
input: umläut.jpg
75 6d 6c e4 75 74 2e 6a 70 67 
output (parsed as ISO-8859-1): umläut.jpg
output (parsed as UTF-8): uml?ut.jpg

Test with NFD
input: umla?ut.jpg
75 6d 6c 61 3f 75 74 2e 6a 70 67 
output (parsed as ISO-8859-1): umla?ut.jpg
output (parsed as UTF-8): umla?ut.jpg
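
The two runs differ only in normalization form. One possible way to avoid the "?" in the
NFD case, which is an assumption on my side and not something tested against Azure, is to
normalize to NFC before encoding, using java.text.Normalizer. The class and method names
below are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical helper, not part of Oak.
final class NormalizeBeforeEncode {

    // Compose to NFC first, so that "a" + U+0308 becomes U+00E4,
    // which ISO-8859-1 can represent as the single byte 0xe4.
    static byte[] toLatin1(String input) {
        String nfc = Normalizer.normalize(input, Normalizer.Form.NFC);
        return nfc.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] fromNfd = toLatin1("umla\u0308ut.jpg");
        byte[] fromNfc = toLatin1("uml\u00e4ut.jpg");
        // Both inputs now encode to the same 10 bytes.
        System.out.println(java.util.Arrays.equals(fromNfd, fromNfc));
    }
}
```

Characters that have no ISO-8859-1 representation even after composition are still encoded
as "?".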

> Filename with special characters in direct download URI Content-Disposition are causing
HTTP 400 errors from Azure
> ------------------------------------------------------------------------------------------------------------------
>                 Key: OAK-9304
>                 URL: https://issues.apache.org/jira/browse/OAK-9304
>             Project: Jackrabbit Oak
>          Issue Type: Bug
>          Components: blob-cloud, blob-cloud-azure, blob-plugins
>    Affects Versions: 1.36.0
>            Reporter: Matt Ryan
>            Assignee: Matt Ryan
>            Priority: Major
> When generating a direct download URI for a filename with certain non-standard characters
in the name, it can cause the resulting signed URI to be considered invalid by some blob storage
services (Azure in particular).  This can lead to blob storage services being unable to
service the URI request.
> For example, a filename of "Ausländische.jpg" currently requests a Content-Disposition
header that looks like:
> {noformat}
> inline; filename="Ausländische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg {noformat}
> Azure blob storage service fails trying to parse a URI with that Content-Disposition
header specification in the query string.  It instead should look like:
> {noformat}
> inline; filename="Ausla?ndische.jpg"; filename*=UTF-8''Ausla%CC%88ndische.jpg {noformat}
> The "filename" portion of the Content-Disposition needs to consist of ISO-8859-1 characters,
per [https://tools.ietf.org/html/rfc6266#section-4.3] in this paragraph:
> {quote}The parameters "filename" and "filename*" differ only in that "filename*" uses
the encoding defined in RFC5987, allowing the use of characters not present in the ISO-8859-1
character set ([ISO-8859-1]).
> {quote}
> Note that the purpose of this ticket is to address compatibility issues with blob storage
services, not to ensure ISO-8859-1 compatibility.  However, by encoding the "filename" portion
using standard Java character set encoding conversion (e.g. {{Charsets.ISO_8859_1.encode(fileName)}}),
we can generate a URI that works with Azure, delivers the proper Content-Disposition header
in responses, and generates the proper client result (meaning, the correct name for the downloaded

This message was sent by Atlassian Jira