thrift-user mailing list archives

From Dave Engberg <dengb...@evernote.com>
Subject Re: Serializing large data sets
Date Fri, 11 Jun 2010 17:13:25 GMT

No, we only use the HTTP transport.  For anything on the public Internet,
this is the only way to go, and it also gives you lots of extra
advantages like client firewall support, hardware load balancing, SSL
"for free", etc.  When we were adopting Thrift three years ago, I did 
some synthetic load tests to compare the overhead of THttpClient 
transport versus direct binary transport.  When the HTTP stack supported
proper HTTP Keep-Alive, the overhead was modest (under 20%).
Unfortunately, the HTTP libraries in several languages don't do proper
keep-alive by default, so your mileage may vary drastically.

We mitigate Thrift-related denial-of-service attacks through a mix of
measures that should (hopefully) make a Thrift protocol attack less
fruitful than other attacks.  (That is, so that Thrift isn't the weakest link.)
For example, we use maxSkipDepth() in TProtocolUtil to reject bogus,
deeply nested sequences of structures:
http://svn.apache.org/viewvc/incubator/thrift/trunk/lib/java/src/org/apache/thrift/protocol/TProtocolUtil.java?revision=760189&view=co
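
A minimal sketch of the idea behind a skip-depth limit, in plain Java with no Thrift dependency. It models a message as a hypothetical token list in which "{" opens a nested struct, "}" closes it, and any other token is a primitive field; DepthLimitedSkipper and SkipDepthExceededException are made-up names for illustration. The real TProtocolUtil code at the URL above works on protocol field types and differs in detail.

```java
import java.util.List;

/** Hypothetical exception: input nests structs deeper than the configured limit. */
class SkipDepthExceededException extends RuntimeException {
    SkipDepthExceededException(String message) {
        super(message);
    }
}

/**
 * Toy model of depth-limited skipping (not Thrift's code). A message is a list of
 * tokens: "{" opens a nested struct, "}" closes it, anything else is a primitive field.
 */
final class DepthLimitedSkipper {
    private final int maxDepth;

    DepthLimitedSkipper(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    /** Skips the value starting at {@code pos} and returns the index just past it. */
    int skip(List<String> tokens, int pos) {
        return skip(tokens, pos, 0);
    }

    private int skip(List<String> tokens, int pos, int depth) {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("truncated input at token " + pos);
        }
        String token = tokens.get(pos);
        if (token.equals("}")) {
            throw new IllegalArgumentException("unexpected '}' at token " + pos);
        }
        if (!token.equals("{")) {
            return pos + 1; // primitive field: exactly one token
        }
        // Entering a struct: refuse before recursing if this would exceed the limit.
        if (depth + 1 > maxDepth) {
            throw new SkipDepthExceededException("nesting exceeds max depth " + maxDepth);
        }
        pos++; // consume "{"
        while (true) {
            if (pos >= tokens.size()) {
                throw new IllegalArgumentException("unterminated struct");
            }
            if (tokens.get(pos).equals("}")) {
                return pos + 1; // consume "}" and finish this struct
            }
            pos = skip(tokens, pos, depth + 1); // skip one field, one level deeper
        }
    }
}
```

With a limit of 64, a payload of a thousand "{" tokens is rejected as soon as nesting passes 64 levels, instead of recursing until the thread's stack overflows.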
We also read the total incoming message length from the HTTP
Content-Length header, which lets us reject big messages before parsing.
We then pass that length as a limit to TBinaryProtocol.setReadLength(), so
any object length/size field claiming more bytes than the message holds
is rejected automatically:
http://svn.apache.org/viewvc/incubator/thrift/trunk/lib/java/src/org/apache/thrift/protocol/TBinaryProtocol.java?view=co
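
A minimal sketch of the two checks described above, again in plain Java with no Thrift dependency. ReadLengthGuard and MessageTooLargeException are hypothetical names, and the 55 MB cap is only borrowed from the message size mentioned in the quoted mail. The budget logic follows the description here (charge each declared length against the message size); it is not copied from TBinaryProtocol.

```java
/** Hypothetical exception for messages or fields that claim too many bytes. */
class MessageTooLargeException extends RuntimeException {
    MessageTooLargeException(String message) {
        super(message);
    }
}

/**
 * Sketch (not Thrift's code) of two guards:
 * 1. reject a request whose Content-Length exceeds a server-wide cap, before parsing;
 * 2. treat that Content-Length as a byte budget, so a length-prefixed field that
 *    claims more bytes than the message can still contain is rejected before allocation.
 */
final class ReadLengthGuard {
    private long remaining; // bytes of the message not yet claimed by a field

    private ReadLengthGuard(long remaining) {
        this.remaining = remaining;
    }

    static ReadLengthGuard forRequest(long contentLength, long maxMessageBytes) {
        if (contentLength < 0) {
            throw new IllegalArgumentException("missing or invalid Content-Length");
        }
        if (contentLength > maxMessageBytes) {
            throw new MessageTooLargeException(
                "message of " + contentLength + " bytes exceeds cap of " + maxMessageBytes);
        }
        return new ReadLengthGuard(contentLength);
    }

    /** Call before allocating a buffer for a string or binary field of the declared length. */
    void checkDeclaredLength(int declared) {
        if (declared < 0) {
            throw new IllegalArgumentException("negative declared length " + declared);
        }
        if (declared > remaining) {
            throw new MessageTooLargeException(
                "declared length " + declared + " exceeds remaining " + remaining + " bytes");
        }
        remaining -= declared; // only charge the budget once the field is accepted
    }

    long remaining() {
        return remaining;
    }
}
```

In this sketch a server would build one guard per request from the parsed Content-Length and call checkDeclaredLength() before allocating any length-prefixed field, so a 100-byte request can never trigger a multi-megabyte allocation.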

Our use of Thrift is obviously a bit unusual compared to most folks,
who use it for internal server-to-server communication, but we have millions
of distinct client machines talking Thrift to Evernote every month, so I
can vouch that it works.


On 6/11/10 9:37 AM, Bjørn Borud wrote:
> On Fri, Jun 11, 2010 at 5:32 PM, Dave Engberg <dengberg@evernote.com> wrote:
>
>    
>> Evernote uses Thrift for all client-server communications, including
>> third-party API integrations (http://www.evernote.com/about/developer/api/).
>>   We serialize messages up to 55MB via Thrift.  This is very efficient on the
>> wire, but marshalling and unmarshalling objects can take a fair amount of
>> RAM due to various temporary buffers built into the networking and IO
>> runtime libraries.
>>
>>      
> Do you use TFramedTransport?  If so, I would assume that you have set the
> frame size to 55MB to avoid the OOM error problems?  I've been thinking a bit
> about this lately since I may want to expose a Thrift API to the outside
> world.  Not setting a limit makes it exceptionally susceptible to
> denial-of-service (just connect a socket and say "asdf" and boom).  Setting
> the limit too high would require only about 5 minutes more hacking to create a
> program that sucks up lots of resources on the server.
>
> (I guess this problem is also why TFramedTransport avoids using
> direct-allocated ByteBuffer?)
>
> One improvement would be to have the ability to do sanity checks on frames
> over a certain size -- so that connections writing bogus data can be killed
> off early.  But it isn't a quick fix and I am not entirely convinced that it
> is worthwhile either.
>
> -Bjørn
>
>    
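
Bjørn's "asdf" example is easy to quantify: a framed transport reads the first four bytes as a big-endian frame length, and the ASCII bytes of "asdf" (0x61 0x73 0x64 0x66) decode to 1,634,952,294, so an unchecked reader would try to allocate roughly 1.5 GiB. A minimal sketch of the obvious guard follows, in plain Java: validate the announced size against a cap before allocating. BoundedFrameReader and FrameTooLargeException are hypothetical names; this is not TFramedTransport's code, and the 55 MB cap is just the figure from this thread.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical exception: a frame header announced a size outside the allowed range. */
class FrameTooLargeException extends IOException {
    FrameTooLargeException(String message) {
        super(message);
    }
}

/** Reads length-prefixed frames, refusing oversized frames before allocating for them. */
final class BoundedFrameReader {
    private final DataInputStream in;
    private final int maxFrameSize;

    BoundedFrameReader(InputStream in, int maxFrameSize) {
        this.in = new DataInputStream(in);
        this.maxFrameSize = maxFrameSize;
    }

    /** Reads one frame: a 4-byte big-endian length followed by that many payload bytes. */
    byte[] readFrame() throws IOException {
        int size = in.readInt(); // DataInputStream always reads big-endian
        if (size < 0 || size > maxFrameSize) {
            // Check before allocating, so a bogus header cannot force a huge buffer.
            throw new FrameTooLargeException(
                "frame size " + size + " outside [0, " + maxFrameSize + "]");
        }
        byte[] frame = new byte[size];
        in.readFully(frame); // throws EOFException if the peer sends fewer bytes
        return frame;
    }
}
```

A cap bounds the cost of each bogus frame, but as the quoted mail points out, a client can still announce frames just under the limit, so early sanity checks or per-connection limits would remain separate concerns.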
