thrift-user mailing list archives

From Brian Hammond <>
Subject Re: Python server over HTTP, HTTPS -- How?
Date Fri, 24 Apr 2009 17:51:15 GMT
Thanks for the example.  I will stick with nginx right now because I  
am familiar with it.  Also, it has a very stable memory footprint as  
more clients connect at the same time.  I will keep this in mind though.


On Apr 24, 2009, at 8:06 AM, Chad Maine wrote:

> Brian, the past few days I've been looking at serving thrift  
> services with Python over HTTP.  I blogged about dropping an example  
> thrift service into a Django app here:

> .  I also experimented with running a thrift service as a very thin  
> WSGI app w/ SQLAlchemy doing the data access under Apache and  
> mod_wsgi (  That performed  
> extremely well for me and might be a lot closer to what you are  
> looking for.  You can then tune your preferred Apache MPM for  
> threading or prefork or whatever works best for you.
> Chad
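The sketch below shows roughly how the "thin WSGI app" approach Chad describes could look. It assumes a hypothetical `process_bytes` callable that takes the serialized Thrift request and returns the serialized reply. Thrift itself is not imported here; the docstring only indicates where `TTransport.TMemoryBuffer` and a generated processor would plug in. The app reads the whole POST body before doing any RPC work and returns the complete reply with a Content-Length.

```python
def make_thrift_wsgi_app(process_bytes):
    """Wrap a bytes-in/bytes-out RPC processor as a WSGI application.

    process_bytes is a hypothetical callable. With the Thrift Python library it
    would roughly wrap the body in TTransport.TMemoryBuffer, run a generated
    processor's process(iprot, oprot), and return the output buffer's getvalue().
    """

    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed",
                           [("Allow", "POST"), ("Content-Type", "text/plain")])
            return [b"POST only\n"]
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        # Buffer the whole request body before any RPC work starts.
        body = environ["wsgi.input"].read(length)
        reply = process_bytes(body)
        # Send the complete reply with an explicit length.
        start_response("200 OK", [("Content-Type", "application/x-thrift"),
                                  ("Content-Length", str(len(reply)))])
        return [reply]

    return app
```

Under mod_wsgi, the script file would expose the result as the module-level name `application`, which mod_wsgi looks for by default.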
> Brian Hammond wrote:
>> Hi David,
>> I've been working on a completely different project for the past  
>> few weeks.  I'm now getting back into this.
>>> The Python THttpServer implementation might be a good starting point
>>> for you in terms of the nuts and bolts of connecting your server to
>>> Thrift.  I would *not* recommend using it for production use (I use
>>> it as a mock backend for some integration tests) for performance  
>>> reasons.
>> Right, I wouldn't expect *one* of the THttpServer instances to  
>> perform well -- too much of a funnel.  However, this made me think  
>> that it might be worthwhile to load-balance a number of them.
>> I set up nginx with 4 worker processes (one per core) as a load  
>> balancer to 8 (arbitrary) python processes.  These upstream  
>> processes are -- at first stab (no Thrift yet) -- just running a  
>> BaseHTTPServer do_GET that returns "hello world".  Nginx simply  
>> does round-robin between the 8 upstream processes.
>> I figured this would be a good way to test if THttpServer would  
>> perform well enough for my purposes since  
>> THttpServer.RequestHandler is based on BaseHTTPServer.
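For reference, here is a sketch of one of the hello-world upstreams Brian describes, written with Python 3's `http.server` (the successor to `BaseHTTPServer`). The consecutive port layout starting at 8001 and the `run_upstreams` helper are hypothetical; nginx's `upstream` block would need to list the same ports.

```python
import multiprocessing
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed body, like the test upstreams."""

    def do_GET(self):
        body = b"hello world"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Skip per-request logging to stderr; it adds overhead under ab.
        pass


def serve(port):
    """Run one single-threaded upstream on 127.0.0.1:port until killed."""
    HTTPServer(("127.0.0.1", port), HelloHandler).serve_forever()


def run_upstreams(base_port=8001, count=8):
    """Start `count` upstream processes on consecutive ports (hypothetical layout)."""
    procs = [multiprocessing.Process(target=serve, args=(base_port + i,))
             for i in range(count)]
    for p in procs:
        p.start()
    return procs
```

Each process handles one request at a time, so concurrency comes entirely from the number of processes behind the load balancer.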
>> Over loopback:
>> $ ab -n 20000 -c 1000
>> ...
>> Requests per second:    11644.32 [#/sec] (mean)
>> ...
>> From my laptop here in NY to my server in The Planet (Dallas, TX):
>> $ ab -n 20000 -c 1000 MY-HOSTNAME/index.html
>> ...
>> Requests per second:    788.20 [#/sec] (mean)
>> ...
>> I'm pretty happy with these numbers, but of course the upstream  
>> processes do nothing interesting.  My data store, however, is redis [1],  
>> which is extremely efficient given its nature (an in-memory  
>> key-value "database").  Thus, I don't expect much overhead from  
>> thrift or redis.  But I'll test this assumption, of course.
>> Sorry if this is obvious to a lot of you on this list.  This might  
>> be useful to others getting started.
>> Does anyone see any huge glaring problem with the idea of putting  
>> fast nginx in front of a number of "slow" THttpServer-based  
>> processes?
>> Thanks,
>> Brian
>> On Apr 3, 2009, at 12:59 AM, David Reiss wrote:
>>> The Python THttpServer implementation might be a good starting point
>>> for you in terms of the nuts and bolts of connecting your server to
>>> Thrift.  I would *not* recommend using it for production use (I use
>>> it as a mock backend for some integration tests) for performance  
>>> reasons.
>>> In order to avoid having a Thrift thread blocked on
>>> over-the-net-to-a-poorly-connected-client I/O, I would suggest using
>>> a server that will buffer up the whole request, then hand it to
>>> Thrift, then buffer up the Thrift response, then send the response
>>> to the client.
>>> You probably want to put the POST data in a TMemoryBuffer (not a
>>> TBufferedTransport, which uses a fixed-size buffer).
>>> --David
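David's suggestion can be sketched with Python 3's `http.server`: read the entire POST body (sized by Content-Length) into memory, hand it to the processor, and send the complete reply with its length. `process_bytes` is a hypothetical echo stand-in; its docstring shows roughly how the Thrift Python library's `TTransport.TMemoryBuffer` would wrap the buffered bytes. The `rfile.read` call itself still waits on a slow client, so the protection against poorly connected clients comes from a buffering proxy (such as nginx) placed in front.

```python
from http.server import BaseHTTPRequestHandler


def process_bytes(request):
    """Hypothetical stand-in for the Thrift processor (echoes the request).

    With the Thrift Python library this would be roughly:
        itrans = TTransport.TMemoryBuffer(request)
        otrans = TTransport.TMemoryBuffer()
        processor.process(TBinaryProtocol.TBinaryProtocol(itrans),
                          TBinaryProtocol.TBinaryProtocol(otrans))
        return otrans.getvalue()
    """
    return request


class BufferedRPCHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        # Buffer the entire request in memory before any RPC work starts.
        body = self.rfile.read(length)
        reply = process_bytes(body)
        # Buffer the entire response, then send it in one go with its length.
        self.send_response(200)
        self.send_header("Content-Type", "application/x-thrift")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)
```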
>>> Brian Hammond wrote:
>>>> Hi Garrett,
>>>> On Apr 2, 2009, at 11:26 PM, Garrett Smith wrote:
>>>>> ----- "Brian Hammond" <> wrote:
>>>>>> What I'm curious about is how I can do all of the following:
>>>>>> 1) use SSL to encrypt user credentials
>>>>>> 2) write my service implementation in python
>>>>>> I guess there are a few options for Python, but none completely
>>>>>> solves both of these requirements:
>>>>>> 1) use the Twisted Python generator and run a daemon with twistd
>>>>>> 2) deploy to nginx/apache with mod_wsgi and somehow hook in support
>>>>>> for decoding HTTP / HTTPS requests as Thrift RPCs.
>>>>> Unless you need an asynchronous server side framework for high
>>>>> concurrency and low memory footprint, I would stay clear of Twisted.
>>>> It turns out that I need a highly efficient server.  I'm a one-man
>>>> shop and am limited in the number of servers I can afford to deploy.
>>>> I plan on starting with a bare minimum of two load-balanced VPS
>>>> instances, so memory is tight.  I also need high concurrency.  I'm
>>>> developing a turn-based game server, already have a very large user
>>>> base (iPhone app), and would like to license my solution to other
>>>> similar iPhone developers ... of course I can enlarge my cluster of
>>>> servers linearly with the number of licensees.  I digress...
>>>>> I think a standard threaded wsgi server would work fine.
>>>> Suggestions?  CherryPy?
>>>>> If you're inclined to use mod_wsgi, I recommend Graham Dumpleton's
>>>>> outstanding wsgi implementation for Apache. The Nginx wsgi interface
>>>>> is good as well, but beware if your app needs to block -- you'll be
>>>>> serializing your requests.
>>>> True.  Nginx is indeed single-threaded.  I'm not leaning toward any
>>>> particular serving technology at this point.  I just want to ensure
>>>> that whatever I choose is as efficient as possible.
>>>> I don't actually have any points of blocking in the front-end, not
>>>> on disk I/O at least.  My datastore is a file-backed key-value
>>>> database that runs in a separate process and writes to disk on
>>>> every Nth database modification.
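To illustrate the trade-off Brian mentions, here is a toy sketch (not redis, and not any real store's on-disk format) of a key-value map that lives in memory and writes a JSON snapshot to disk only on every Nth modification. Reads and most writes never touch the disk; the cost is that up to N-1 recent writes can be lost in a crash. `SnapshotKV` and `every_n` are hypothetical names.

```python
import json
import os
import tempfile


class SnapshotKV:
    """In-memory dict that snapshots itself to `path` every `every_n` writes."""

    def __init__(self, path, every_n=100):
        self.path = path
        self.every_n = every_n
        self._dirty = 0  # writes since the last snapshot
        self._data = {}
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                self._data = json.load(f)

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value
        self._dirty += 1
        if self._dirty >= self.every_n:
            self.flush()

    def flush(self):
        # Write to a temp file, then rename over the old snapshot, so readers
        # never see a half-written snapshot file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(self._data, f)
        os.replace(tmp, self.path)
        self._dirty = 0
```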
>>>>> Both options would let you run SSL as well as handle basic or digest
>>>>> auth.
>>>> True.
>>>>> As far as tying in Thrift, I haven't done this myself and
>>>>> unfortunately can't offer much. Hopefully there are others here who
>>>>> can. As you've already suggested, taking a look at the RPC layer and
>>>>> seeing how you can tie it into the backend from wsgi is a start.
>>>> Yeah, that's what I gather.  I'll play with it over the weekend.
>>>>> IMO, the lack of a security story for Thrift is a weakness. I'm not
>>>>> sure what discussions there have been to address this. I started to
>>>>> implement SSL support for Java and Python, but found I had to modify
>>>>> a fair amount of Thrift code and ended up punting by using stunnel to
>>>>> set up a secure connection between client and server. You might find
>>>>> this the path of least resistance as well, in particular if you can
>>>>> add the authentication layer to your Thrift IDL.
>>>> Yeah, built-in SSL support would be nice.
>>>> My client will be running on an iPhone -- no stunnel.  Oh, yeah, I
>>>> should mention that it seems most people use Thrift for talking from,
>>>> say, their web server to *internal* web services, but I'm planning on
>>>> using it as a public-facing web service, like the EverNote folks are.
>>>> It was actually good to see another instance of someone planning on
>>>> using Thrift this way.
>>>>> As one other approach, you can use a symmetric key to sign a request
>>>>> and send the signature in the clear with the rest of your thrift data.
>>>>> As long as you keep the signing key secret, this would let you
>>>>> validate the origin and integrity of the request. If there's anything
>>>>> sensitive in the request itself, though, this is no good.
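A minimal sketch of Garrett's signing idea using HMAC-SHA256 from the Python standard library. HMAC is an assumed choice of construction; the thread only says "sign with a symmetric key". The payload stays in the clear, and the signature only lets the server check origin and integrity. Anyone who extracts the key from the client binary can forge signatures.

```python
import hashlib
import hmac


def sign(payload, key):
    """Hex HMAC-SHA256 of the serialized request, sent alongside it in the clear."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(payload, signature, key):
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign(payload, key), signature)
```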
>>>> Right.  I cannot really trust the client -- iPhone apps are getting
>>>> cracked left and right.  Once an app is cracked, someone will poke
>>>> around in the binary enough to find my secret symmetric key, even if
>>>> it is not stored as a literal string.
>>>> Thus, I want to use SSL for anything sensitive.
>>>> I'll create the equivalent of an auth token (same idea as login
>>>> cookies) with opaque data encrypted using a symmetric key only
>>>> available on the service-side.  The client will send back the auth
>>>> token with each Thrift RPC.  There's a lot more to this to fight
>>>> replay attacks, client spoofing, etc. but that isn't relevant here.
>>>> I need to be able to register a user account from the client (I know,
>>>> spammers will try to automate that, but I have countermeasures) and
>>>> log the user in as well.  This requires sending sensitive user
>>>> information which, while essentially obfuscated to eavesdroppers by
>>>> virtue of using a binary protocol, can be reverse engineered easily
>>>> enough, I bet.
>>>>> Alas, message signing is another application layer measure -- it
>>>>> would be sweet to see auth work its way into the Thrift spec.
>>>> Yeah, I'm planning on requiring signatures à la Amazon Web Services.
>>>> Some data used in the request signature calculation will only be
>>>> available to the client and the service and never transmitted between
>>>> them in the clear -- it would be transmitted to the client during a
>>>> login over HTTPS.
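A sketch of per-request signing in the spirit of Amazon Web Services signatures, assuming the client received a per-session secret during the HTTPS login and never sends it again. The canonical string format, the five-minute `MAX_SKEW_SECONDS` window, and the function names are hypothetical; this is not AWS's actual signing algorithm. The timestamp window limits replays, but rejecting replays inside the window would also need a nonce store.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # hypothetical freshness window


def canonical_string(method, path, timestamp, body):
    """Join the signed fields; the body is represented by its SHA-256 hash."""
    body_hash = hashlib.sha256(body).hexdigest()
    return "\n".join([method.upper(), path, str(timestamp), body_hash]).encode()


def sign_request(session_secret, method, path, timestamp, body):
    msg = canonical_string(method, path, timestamp, body)
    return hmac.new(session_secret, msg, hashlib.sha256).hexdigest()


def verify_request(session_secret, method, path, timestamp, body, signature, now=None):
    now = time.time() if now is None else now
    # Reject stale or future-dated requests before checking the signature.
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign_request(session_secret, method, path, timestamp, body)
    return hmac.compare_digest(expected, signature)
```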
>>>> Auth in Thrift would be wonderful, but I wonder if that's feature creep?
>>>>> Good luck!
>>>>> Garrett
>>>> Thanks!
>>>> Brian
