thrift-user mailing list archives

From: Dave Engberg <dengb...@evernote.com>
Subject: Re: Serializing large data sets
Date: Fri, 11 Jun 2010 15:32:15 GMT

Evernote uses Thrift for all client-server communications, including 
third-party API integrations 
(http://www.evernote.com/about/developer/api/).  We serialize messages 
up to 55MB via Thrift.  This is very efficient on the wire, but 
marshalling and unmarshalling objects can take a fair amount of RAM due 
to various temporary buffers built into the networking and IO runtime 
libraries.
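
For concreteness, the in-memory serialization path in the Java library goes 
through TSerializer/TDeserializer. A minimal sketch follows, assuming a 
struct named Note generated from your own IDL (this is not Evernote's actual 
code or types); the whole encoded message is held in a buffer, which is 
where the extra RAM for multi-MB messages tends to come from:

    // Minimal sketch (not Evernote's actual code). Assumes a struct
    // generated from something like:
    //   struct Note { 1: string title, 2: binary body }
    import org.apache.thrift.TSerializer;
    import org.apache.thrift.TDeserializer;
    import org.apache.thrift.protocol.TBinaryProtocol;

    public class SerializeExample {
        public static void main(String[] args) throws Exception {
            Note note = new Note();
            note.setTitle("example");

            // TSerializer builds the entire encoded message in an
            // in-memory buffer before returning it, so peak memory is
            // at least the full serialized size of the message.
            TSerializer ser = new TSerializer(new TBinaryProtocol.Factory());
            byte[] bytes = ser.serialize(note);

            // Deserializing reads the whole byte[] back into a fresh object.
            TDeserializer deser = new TDeserializer(new TBinaryProtocol.Factory());
            Note copy = new Note();
            deser.deserialize(copy, bytes);
        }
    }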


On 6/11/10 8:26 AM, Abhay M wrote:
> Hi,
>
> Are there any known concerns with serializing large data sets with Thrift? I
> am looking to serialize messages with 10-150K records, sometimes resulting
> in ~30MB per message. These messages are serialized for storage.
>
> I have been experimenting with Google protobuf and saw this in the
> documentation (
> http://code.google.com/apis/protocolbuffers/docs/techniques.html) -
> "Protocol Buffers are not designed to handle large messages. As a general
> rule of thumb, if you are dealing in messages larger than a megabyte each,
> it may be time to consider an alternate strategy."
> FWIW, I did switch to the delimited write/parse API (Java only) as recommended
> in the doc, and it works well. But the Python protobuf implementation lacks
> this API and is slow.
>
> Thanks
> Abhay
>
>    
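
For anyone finding this thread later: the delimited write/parse API mentioned 
above length-prefixes each record so you never have to materialize one huge 
message in memory. A rough sketch, assuming a protobuf-generated message type 
named Record (a placeholder, not from the original post):

    // Sketch only; "Record" is a placeholder for whatever message type
    // your .proto file generates.
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.util.List;

    public class DelimitedExample {
        static void write(List<Record> records) throws Exception {
            FileOutputStream out = new FileOutputStream("records.bin");
            for (Record r : records) {
                // Each call writes a varint length prefix followed by one
                // record, so memory stays proportional to a single record
                // rather than the whole ~30MB batch.
                r.writeDelimitedTo(out);
            }
            out.close();
        }

        static void read() throws Exception {
            FileInputStream in = new FileInputStream("records.bin");
            Record next;
            // parseDelimitedFrom returns null at end of stream.
            while ((next = Record.parseDelimitedFrom(in)) != null) {
                // process next record here
            }
            in.close();
        }
    }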
