libcloud-users mailing list archives

From Tomaz Muraus <to...@apache.org>
Subject Re: How to shorten file download time?
Date Tue, 02 Sep 2014 17:29:50 GMT
It depends on your use-case, but in general:

1. Downloading multiple files

If you want to download multiple files / objects, you can parallelize this
process. You can either do this by downloading each object in a separate
thread or process and / or by utilizing a thread or process pool.

If you want to speed things up and reduce thread / process overhead, you
should also have a look at gevent (http://www.gevent.org/).

That's the approach I use in file_syncer where a common case is that
multiple independent operations are performed in parallel (downloading /
uploading files) -
https://github.com/Kami/python-file-syncer/blob/master/file_syncer/syncer.py#L143
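
As a rough illustration of the thread pool variant (a minimal sketch, not
the code file_syncer uses): the helper name download_all and its parameters
are made up for this example. It assumes each item is a libcloud Object
(for example from container.list_objects()), that object names contain no
"/" so they map directly to file names, and that the driver can safely be
shared between threads. If you are not sure about the last point, create
one driver instance per worker instead.

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_all(objects, destination_dir, max_workers=8):
    """Download every object into destination_dir using a thread pool.

    Returns a dict mapping object name to the value returned by
    obj.download(), or to the exception raised for that object.
    """
    def fetch(obj):
        # Assumption: obj.name is a plain file name without "/".
        path = os.path.join(destination_dir, obj.name)
        return obj.download(destination_path=path, overwrite_existing=True)

    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all downloads up front; the pool limits the concurrency.
        futures = {pool.submit(fetch, obj): obj for obj in objects}
        for future in as_completed(futures):
            obj = futures[future]
            try:
                results[obj.name] = future.result()
            except Exception as exc:
                # Keep going so one failed object does not abort the rest.
                results[obj.name] = exc
    return results
```

The right max_workers value depends on your bandwidth and on the provider,
so it is worth measuring a few values rather than assuming more is faster.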

2. Downloading a single file / container and object ID is known in advance

If you know the container and object ID in advance, you can avoid 2 HTTP
requests (get_container, get_object) by manually instantiating Container
and Object class with the known IDs. There are some examples of how to do
that at
https://libcloud.readthedocs.org/en/latest/other/working-with-oo-apis.html
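
A minimal sketch of what that could look like for storage. The helper name
object_from_known_ids is made up for this example, and the container and
object names you pass in are whatever you already know. Filling size, hash
and the metadata with placeholder values is an assumption on my part: some
driver versions may compare the downloaded byte count against obj.size, so
pass the real size if you have it.

```python
def object_from_known_ids(driver, container_name, object_name, size=None):
    """Build an Object locally, without get_container() / get_object()."""
    # Imported lazily so the helper can be defined even where libcloud
    # is not installed; a top-level import works just as well.
    from libcloud.storage.base import Container, Object

    # No HTTP request is issued here; both instances are plain local
    # objects that only carry the names and a reference to the driver.
    container = Container(name=container_name, extra={}, driver=driver)
    return Object(
        name=object_name,
        size=size,        # placeholder unless the real size is known
        hash=None,        # placeholder, unknown without get_object()
        extra={},
        meta_data={},
        container=container,
        driver=driver,
    )
```

With a real driver instance you would then call
obj.download(destination_path=...) on the result, which should leave the
download itself as the only request. Keep in mind that a wrong name is only
detected at download time, since nothing is verified up front.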

In this case, using gevent wouldn't really speed things up much since you
are only issuing one HTTP request (unless the object is composed of multiple
chunks and the provider allows you to retrieve chunks independently...).


On Tue, Sep 2, 2014 at 5:30 PM, Chris Richards <chris@infiniteio.com> wrote:

> Howdy. I've noticed a variance in the download time of a file depending on
> the method of download, and I'm hoping to shave off overhead. I'm using the
> standard s3 provider. The stats I present are consistent between my office
> and my home within +/-100 ms.  In shortened form:
>
> -
> Timing via driver.get_container().get_object().download()
> get_container: 431.4596652984619 ms
> get_object: 808.0205917358398 ms
> download: 8257.043838500977 ms
> Complete, downloaded 8.15 MB
>
> Timing via driver.get_object().download()
> get_object: 811.8221759796143 ms
> download: 4801.661729812622 ms
> Complete, downloaded 8.15 MB
> -
>
>
> In the first case, it appears that getting the container has significant
> overhead and should be avoided if possible--which I can do--and trims
> 400-500 ms per download (for small files, this is significant). Is my
> observation and conclusion correct?
>
> In the second case, what I want to examine is the .get_object() requirement to
> download a file. This adds another significant overhead on the order of
> 700-900 ms. Is there a way to bypass this?  I have many small files where
> the .get_object() time exceeds that of the .download() time!
>
> import std.newbie.disclaimer
>
> Thanks!
> Chris
>
