thrift-user mailing list archives

From Abhay M <>
Subject Re: Serializing large data sets
Date Fri, 11 Jun 2010 20:04:33 GMT
Thanks! This is helpful.

I'll try to get a sense of how much RAM it'll take to deserialize this
type of message -

struct TRecordList {
  1: list<TRec> records,
}
Assuming the message is first parsed into TRec beans (because it is defined as
a list of beans), which are in turn converted into application beans, I am
guessing approximately 3 times the size of the serialized message (probably
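The 3x guess above can be sketched as back-of-envelope arithmetic (the factor and the message size are assumptions, not measurements):

```python
# Rough peak-RAM sketch: the serialized bytes, the intermediate Thrift
# beans, and the application beans may all be alive at once, so peak
# usage is roughly 3x the message size. The factor of 3 is a guess.

def estimated_peak_ram(serialized_bytes: int, copies_alive: int = 3) -> int:
    """Upper-bound estimate: each in-memory representation is assumed
    to cost about as much as the serialized form."""
    return serialized_bytes * copies_alive

# A ~30 MB message, as in the original question:
print(estimated_peak_ram(30 * 1024 * 1024))  # 94371840 bytes, i.e. ~90 MB
```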

Thanks again

On Fri, Jun 11, 2010 at 11:32 AM, Dave Engberg <>wrote:

> Evernote uses Thrift for all client-server communications, including
> third-party API integrations (
>  We serialize messages up to 55MB via Thrift.  This is very efficient on the
> wire, but marshalling and unmarshalling objects can take a fair amount of
> RAM due to various temporary buffers built into the networking and IO
> runtime libraries.
> On 6/11/10 8:26 AM, Abhay M wrote:
>> Hi,
>> Are there any known concerns with serializing large data sets with Thrift?
>> I am looking to serialize messages with 10-150K records, sometimes
>> resulting in ~30 MB per message. These messages are serialized for storage.
>> I have been experimenting with Google protobuf and saw this in the
>> documentation (
>> -
>> "Protocol Buffers are not designed to handle large messages. As a general
>> rule of thumb, if you are dealing in messages larger than a megabyte each,
>> it may be time to consider an alternate strategy."
>> FWIW, I did switch to delimited write/parse API (Java only) as recommended
>> in the doc and it works well. But, Python protobuf impl lacks this API and
>> is slow.
>> Thanks
>> Abhay
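
Since the Python protobuf implementation lacks the delimited write/parse API mentioned above, length-prefixed framing can be done by hand. A sketch that works for any serializer producing bytes; the 4-byte big-endian prefix here is an assumption for illustration, not either library's delimited format (protobuf's Java delimited API actually uses a varint prefix):

```python
import struct
from io import BytesIO

# Manual length-prefixed framing: each message is written as a 4-byte
# big-endian length followed by the payload, so multiple serialized
# messages can share one stream and be read back one at a time.

def write_framed(stream, payload: bytes) -> None:
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)

def read_framed(stream):
    header = stream.read(4)
    if len(header) < 4:
        return None  # end of stream
    (length,) = struct.unpack(">I", header)
    return stream.read(length)

# Usage: frame two hypothetical serialized records into one buffer.
buf = BytesIO()
for msg in (b"record-1", b"record-2"):
    write_framed(buf, msg)
buf.seek(0)
assert read_framed(buf) == b"record-1"
assert read_framed(buf) == b"record-2"
assert read_framed(buf) is None
```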
