libcloud-notifications mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [libcloud] RunOrVeith opened a new issue #1550: Azure Blob Storage does not return Content-Encoding when blob was uploaded via stream
Date Thu, 28 Jan 2021 17:29:20 GMT

RunOrVeith opened a new issue #1550:
URL: https://github.com/apache/libcloud/issues/1550


   ## Summary
   
   When a gzip-encoded blob is uploaded to Azure Blob Storage using `upload_object_via_stream`
and then read back, the response does not contain the `Content-Encoding: gzip`
header. The header is present when the blob is uploaded using `upload_object`.
   
   ## Detailed Information
   
   Ubuntu 18.04, Libcloud 3.3.1, python 3.8
   
   **Code to Reproduce**:
   
   ```python
   import gzip
   from libcloud.storage.providers import Provider, get_driver
   
   
   def create_gzipped_file():
       content = b"I am a test file with super nice content that is so interesting to read"
       encoded_bytes = gzip.compress(content)
       file = "example_gzipped_file.txt"
       with open(file, mode="wb") as f:
           f.write(encoded_bytes)
       return file
   
   
   account = ""  # Add a service account here
   key = ""  # Add a key here
   container_name = ""  # Add a container name here
   
   
   driver = get_driver(Provider.AZURE_BLOBS)(key=account, secret=key)
   container = driver.get_container(container_name=container_name)
   
   # Upload a dummy file with gzip encoding via stream
   content = b"I am a test file with super nice content that is so interesting to read"
   encoded_bytes = gzip.compress(content)
   streaming_file = "example_gzipped_file_stream_upload.txt"
   driver.upload_object_via_stream(
       iterator=iter([encoded_bytes]),
       container=container,
       object_name=streaming_file,
       extra={"content_type": "text/plain"},
       headers={"Content-Encoding": "gzip"},
   )
   
   # Upload a dummy file with gzip encoding regularly
   regular_file = create_gzipped_file()
   driver.upload_object(
       regular_file,
       container=container,
       object_name=regular_file,
       extra={"content_type": "text/plain"},
       headers={"Content-Encoding": "gzip"},
   )
   
   # Get the headers. This issues a HEAD request, but the header is also missing from a GET request
   regular_response_headers = driver.get_object(container_name=container_name, object_name=regular_file).extra
   streaming_response_headers = driver.get_object(container_name=container_name, object_name=streaming_file).extra
   print(regular_response_headers["content_encoding"])  # gzip
   print(streaming_response_headers["content_encoding"])
   # --> content_encoding is not present (or rather defaults to None)
   
   ```
   
   As you can see, the file that was uploaded via stream does not have a valid Content-Encoding
header.
   I looked for a possible cause, but could not really find one.
   
   What I found is:
   - `AzureBlobStorageDriver._commit_block` does not set any headers. This is called after
all chunks have been uploaded. However, setting the header in that request manually does not
help.
   - The Azure documentation for both [PUT](https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob)
and [GET](https://docs.microsoft.com/en-us/rest/api/storageservices/get-blob) states that
the Content-Encoding header should be returned.
   - The difference in execution path between `upload_object` and `upload_object_via_stream` is a
call to `AzureBlobStorageDriver._upload_directly` for the former, and `AzureBlobStorageDriver._upload_in_chunks`
for the latter.
   - Similar to #1544, this might be a bug within Azure itself, but here I could not verify
it completely
   - I have not found a workaround other than attempting gzip decompression whenever the
blob cannot be read normally.
   - I am not sure whether `upload_object_via_stream` uses [Append Block](https://docs.microsoft.com/en-us/rest/api/storageservices/append-block)
internally, but that operation at least does not accept the Content-Encoding header.
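
The fallback mentioned in the workaround bullet above could look roughly like the sketch below. It assumes the blob content has already been read into memory as `bytes`; the helper name `maybe_decompress` is hypothetical and not part of libcloud. It checks the stored Content-Encoding value first and otherwise looks for the two-byte gzip magic number.

```python
import gzip
import zlib

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def maybe_decompress(data, content_encoding=None):
    """Hypothetical helper: gunzip `data` if it is, or looks like, gzip."""
    if content_encoding == "gzip" or data[:2] == GZIP_MAGIC:
        try:
            return gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            # The payload only looked like gzip; return it unchanged.
            return data
    return data


original = b"I am a test file with super nice content"
# Simulates a streamed upload where content_encoding came back as None.
print(maybe_decompress(gzip.compress(original), None) == original)
```

The magic-byte check is only a heuristic: an uncompressed blob that happens to start with `\x1f\x8b` will be tried as gzip and returned unchanged when decompression fails.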
   
   **Possible Solutions**
   - If this is a bug that can be fixed, the Content-Encoding header should be present when retrieving
the blob, regardless of the upload method.
   - If this is a limitation of streaming upload, an error (or at least a warning) should
be raised if the user supplies the `Content-Encoding` header when uploading.
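
A minimal sketch of the warning proposed in the second option, written as a standalone check rather than as libcloud code. The function name `check_streaming_headers` is hypothetical, and where such a check would be called inside the driver is an assumption left open here.

```python
import warnings


def check_streaming_headers(headers=None):
    """Hypothetical pre-flight check; not part of libcloud.

    Warns and returns True when `headers` contains Content-Encoding.
    """
    headers = headers or {}
    # HTTP header names are case-insensitive, so compare in lower case.
    found = any(name.lower() == "content-encoding" for name in headers)
    if found:
        warnings.warn(
            "Content-Encoding may not be stored for streamed uploads",
            UserWarning,
            stacklevel=2,
        )
    return found


# Same headers as in the reproduction script above.
check_streaming_headers({"Content-Encoding": "gzip"})
```

Raising an exception instead of warning would be the stricter variant of the same check.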
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


