libcloud-notifications mailing list archives

From "Richard Xia (JIRA)" <>
Subject [jira] [Commented] (LIBCLOUD-903) AWS S3 upload_object_via_stream fails on non-file iterable due to missing Content-Length header
Date Sat, 22 Apr 2017 06:16:04 GMT


Richard Xia commented on LIBCLOUD-903:

According to my post, I believe it was v2.0.0rc1-tentative.

> AWS S3 upload_object_via_stream fails on non-file iterable due to missing Content-Length
> -----------------------------------------------------------------------------------------------
>                 Key: LIBCLOUD-903
>                 URL:
>             Project: Libcloud
>          Issue Type: Bug
>            Reporter: Richard Xia
> The issue I am seeing appears to be due to the incorrect integration of 4 separate libraries,
but I believe the real problem is here in libcloud, in the {{upload_object_via_stream()}}
method on the S3 storage driver.
> I am using Python 3.5.1 and the four libraries I am using are:
> * Django 1.10.6                                                                  
> * django-storages 1.5.2                                                          
> * libcloud v2.0.0rc1-tentative                                                   
> * requests 2.13.0                                                                
> Specifically, when I try to use a Django {{ContentFile}},
Django's own file-like wrapper for strings, to save a new file to S3 via the Libcloud backend
of django-storages, I get the following error:
> {code:xml}                                                                       
> <?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>NotImplemented</Code><Message>A
header you provided implies functionality that is not implemented</Message><Header>Transfer-Encoding</Header><RequestId>A2FC4D5109083076</RequestId><HostId>K9WGhd18iqQHyIyv+GxWcxHexvapVSidTtHzSqujtT9nT5LhmIEygMKOfR/7F0v7ujnlE/CoYiM=</HostId></Error>
> {code}                                                                           
> This happens because Libcloud generates an HTTP request to AWS S3 that
is missing the {{Content-Length}} header. AWS S3 requires the {{Content-Length}} header for
file uploads *unless* it is a multi-part upload. This is why this used to work on the 1.5.0
release of {{libcloud}}: even single-part uploads were done as a one-part multi-part upload.
> I've traced my bug down through all four libraries and have determined exactly why the
{{Content-Length}} header is missing in my particular use case. The {{upload_object_via_stream()}}
method has an {{iterator}} argument that should yield the content body data, and it eventually passes
that argument directly to the {{requests}} library. The {{requests}} library will actually
try very hard to add the {{Content-Length}} header,
even for certain types of iterator streams. In particular it can determine the length of file-like
objects which support stat operations and it can handle StringIO/BytesIO objects. However,
the Django {{ContentFile}} is neither, and {{requests}} cannot extract the length of the stream
without consuming the iterator, so it does not try.
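
The length detection described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual {{requests}} code; {{guess_length}} is a hypothetical helper name, and the set of cases it checks is an assumption based on the description in this report.

```python
import io
import os


def guess_length(body):
    """Return the number of bytes left in ``body``, or None if unknown.

    Hypothetical helper for illustration; not the real ``requests`` logic.
    """
    # Plain byte strings know their own length.
    if isinstance(body, (bytes, bytearray)):
        return len(body)

    # In-memory streams such as BytesIO expose their whole buffer.
    if isinstance(body, io.BytesIO):
        with body.getbuffer() as view:
            total = view.nbytes
        return total - body.tell()

    # Real files support stat operations through their file descriptor.
    fileno = getattr(body, "fileno", None)
    if callable(fileno):
        try:
            return os.fstat(fileno()).st_size - body.tell()
        except (OSError, io.UnsupportedOperation, AttributeError):
            pass

    # Plain iterators and generators: the length cannot be known
    # without consuming them, so give up.
    return None
```

An arbitrary iterable wrapper falls through to the last branch, which is the case that ends up without a {{Content-Length}} header.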
> Here's some (Python 3) code to demonstrate the bug:                              
> {code:python}                                                                    
> from io import BytesIO                                                           
> class MyWrapper(object):                                                         
>     """A contrived wrapper that acts similar to BytesIO."""                      
>     def __init__(self, content):                                                 
>         self.content = BytesIO(content)                                          
>     def __iter__(self):                                                          
>         yield self.content.read()
> # Assume driver is already set to some S3 provider w/ credentials                
> container = driver.get_container(container_name='my-container')                  
> driver.upload_object_via_stream(iterator=iter(MyWrapper(b'hello world')),        
>                                 container=container,                             
>                                 object_name='my_file.txt')                       
> {code}                                                                           
> I think the proper solution to this will require all calls to the S3 {{upload_object_via_stream()}}
to use the multi-part uploader in order to eschew the need for the {{Content-Length}} header.
If desired, you could make the same optimizations that the {{requests}} library makes by checking
for certain common cases where you do know the file size and only using the multi-part uploader
when necessary.
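
As a rough sketch of the multi-part approach, the iterator could be regrouped into fixed-size parts, each of which has a known length and can therefore be sent with its own {{Content-Length}}. The helper name {{iter_parts}} and its default part size are assumptions for illustration (S3 requires every part except the last to be at least 5 MiB); this is not existing libcloud code.

```python
def iter_parts(iterator, part_size=5 * 1024 * 1024):
    """Regroup byte chunks from ``iterator`` into parts of ``part_size`` bytes.

    Hypothetical helper: every yielded part except possibly the last is
    exactly ``part_size`` bytes long, so its length is known before sending.
    """
    buffer = bytearray()
    for chunk in iterator:
        buffer.extend(chunk)
        # Emit as many full parts as the buffer currently holds.
        while len(buffer) >= part_size:
            yield bytes(buffer[:part_size])
            del buffer[:part_size]
    # Whatever remains becomes the final (possibly short) part.
    if buffer:
        yield bytes(buffer)
```

Each part would then be uploaded as one piece of a multi-part upload, while streams whose total size is already known could still take the single-request path.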
