thrift-user mailing list archives

From Brian Hammond <br...@brianhammond.com>
Subject Re: Python server over HTTP, HTTPS -- How?
Date Fri, 24 Apr 2009 19:24:25 GMT
I wonder if thread-per-connection concerns are still valid, 3 years  
later?
http://bob.pythonmac.org/archives/2006/09/13/nginx-reverse-proxy-panacea/

In any event, I'm pretty thrilled with nginx so far.

The point of my earlier email was to ask if there was any real concern  
with using the python THttpServer behind *any* load balancer, in  
production.

Thanks to everyone for the input!
Brian

On Apr 24, 2009, at 3:10 PM, Chad Maine wrote:

> You might want to consider Pound (http://www.apsis.ch/pound/) as well.  I've
> been using it for years to load balance HTTP services.
>
> On Fri, Apr 24, 2009 at 1:49 PM, Brian Hammond <brian@brianhammond.com> wrote:
>
>> Hi,
>>
>> HAProxy looks good. The problem I have with it is that it doesn't support
>> SSL. Some of my thrift requests must go over SSL (e.g.
>> login/logout/update-profile). Thus, if I used HAProxy I'd need to
>> incorporate a "token server" or "auth server" that uses SSL, and at that
>> point I may as well stick to nginx.
>>
>> Thanks,
>> Brian
>>
>>
>> On Apr 24, 2009, at 3:46 AM, David Balatero wrote:
>>
>>> I don't see why not -- having a fast proxy seems like the best thing to do
>>> given 8 slower instances behind it. Also, you might look into HAProxy, as I
>>> hear it does arbitrary TCP load-balancing as well as specific HTTP
>>> balancing.
>>>
>>> On Thu, Apr 23, 2009 at 9:01 PM, Brian Hammond <brian@brianhammond.com> wrote:
>>>
>>>> Hi David,
>>>>
>>>> I've been working on a completely different project for the past few weeks.
>>>> I'm now getting back into this.
>>>>
>>>>> The Python THttpServer implementation might be a good starting point
>>>>> for you in terms of the nuts and bolts of connecting your server to
>>>>> Thrift.  I would *not* recommend using it for production use (I use
>>>>> it as a mock backend for some integration tests) for performance
>>>>> reasons.
>>>>
>>>> Right, I wouldn't expect *one* of the THttpServer instances to perform well
>>>> -- too much of a funnel.  However, this made me think that it might be
>>>> worthwhile to load-balance a number of them.
>>>>
>>>> I set up nginx with 4 worker processes (one per core) as a load balancer to
>>>> 8 (arbitrary) python processes.  These upstream processes are -- at first
>>>> stab (no Thrift yet) -- just running a BaseHTTPServer do_GET that returns
>>>> "hello world".  Nginx simply does round-robin between the 8 upstream
>>>> processes.
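
A minimal sketch of one such "hello world" upstream process, for readers who want to repeat the test. The original used Python 2's BaseHTTPServer; this assumes Python 3, where the same classes live in http.server. The names HelloHandler and make_server and the port number in the note below are illustrative, not taken from the thread.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed plain-text body."""

    def do_GET(self):
        body = b"hello world"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress per-request logging to stderr so that log output does
        # not dominate the cost of each request during a benchmark.
        pass


def make_server(port, host="127.0.0.1"):
    """Build (but do not start) one upstream server bound to host:port.

    Passing port 0 lets the OS pick a free port, which is handy in tests.
    """
    return HTTPServer((host, port), HelloHandler)
```

Each upstream would then be started with something like make_server(8001).serve_forever(), one process per port, with the proxy configured to round-robin across those ports.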
>>>>
>>>> I figured this would be a good way to test if THttpServer would perform
>>>> well enough for my purposes since THttpServer.RequestHandler is based on
>>>> BaseHTTPServer.
>>>>
>>>> Over loopback:
>>>>
>>>> $ ab -n 20000 -c 1000 127.0.0.1/index.html
>>>>
>>>> ...
>>>> Requests per second:    11644.32 [#/sec] (mean)
>>>> ...
>>>>
>>>> From my laptop here in NY to my server in The Planet (Dallas, TX):
>>>>
>>>> $ ab -n 20000 -c 1000 MY-HOSTNAME/index.html
>>>>
>>>> ...
>>>> Requests per second:    788.20 [#/sec] (mean)
>>>> ...
>>>>
>>>> I'm pretty happy with these numbers but of course the upstream processes do
>>>> nothing interesting.  My data-store is redis [1], however, which is
>>>> extremely efficient given its nature (an in-memory key-value "database").
>>>> Thus, I don't expect much overhead from thrift or redis.  But, I'll test
>>>> this assumption of course.
>>>>
>>>> Sorry if this is obvious to a lot of you on this list.  This might be
>>>> useful to others getting started.
>>>>
>>>> Does anyone see any huge glaring problem with the idea of putting fast
>>>> nginx in front of a number of "slow" THttpServer-based processes?
>>>>
>>>> Thanks,
>>>> Brian
>>>>
>>>> On Apr 3, 2009, at 12:59 AM, David Reiss wrote:
>>>>
>>>>
>>>>
>>>>> http://gitweb.thrift-rpc.org/?p=thrift.git;a=blob;f=lib/py/src/server/THttpServer.py;h=21fc314;hb=7534e71
>>>>>
>>>>> The Python THttpServer implementation might be a good starting point
>>>>> for you in terms of the nuts and bolts of connecting your server to
>>>>> Thrift.  I would *not* recommend using it for production use (I use
>>>>> it as a mock backend for some integration tests) for performance
>>>>> reasons.
>>>>> In order to avoid having a Thrift thread blocked on
>>>>> over-the-net-to-a-poorly-connected-client I/O, I would suggest using
>>>>> a server that will buffer up the whole request, then hand it to Thrift,
>>>>> then buffer up the Thrift response, then send the response to the
>>>>> client.
>>>>> You probably want to put the POST data in a TMemoryBuffer (not a
>>>>> TBufferedTransport, which uses a fixed-size buffer).
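
The buffer-everything approach described above can be sketched as a small WSGI application. This is only an illustration under stated assumptions: make_buffering_app and handle_bytes are hypothetical names, and handle_bytes stands in for the step that would wrap the request bytes in a TMemoryBuffer and run the Thrift processor. No Thrift calls are made here.

```python
def make_buffering_app(handle_bytes):
    """Wrap handle_bytes(request_bytes) -> response_bytes as a WSGI app.

    handle_bytes is a hypothetical callable. In a real service it would
    feed the bytes to a Thrift processor through an in-memory transport
    and return the serialized reply.
    """

    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed", [("Allow", "POST")])
            return [b""]

        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0

        # Step 1: read the whole POST body before doing any RPC work, so
        # the processing step never waits on a slow client.
        request_bytes = environ["wsgi.input"].read(length)

        # Step 2: run the RPC entirely against in-memory data.
        response_bytes = handle_bytes(request_bytes)

        # Step 3: only now start sending; the full reply is already built.
        start_response("200 OK", [
            # Assumed content type for Thrift-over-HTTP payloads.
            ("Content-Type", "application/x-thrift"),
            ("Content-Length", str(len(response_bytes))),
        ])
        return [response_bytes]

    return app
```

The application itself only guarantees that processing happens on complete in-memory buffers. Shielding the worker from a poorly connected client still depends on whatever server or proxy sits in front of it buffering the network I/O.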
>>>>>
>>>>> --David
>>>>>
>>>>> Brian Hammond wrote:
>>>>>
>>>>>> Hi Garrett,
>>>>>>
>>>>>> On Apr 2, 2009, at 11:26 PM, Garrett Smith wrote:
>>>>>>
>>>>>>> ----- "Brian Hammond" <brian@brianhammond.com> wrote:
>>>>>>>
>>>>>>>> What I'm curious about is how I can do all of the following:
>>>>>>>>
>>>>>>>> 1) use SSL to encrypt user credentials
>>>>>>>> 2) write my service implementation in python
>>>>>>>>
>>>>>>>> I guess there's a few options for python but none completely solve
>>>>>>>> both of these requirements.
>>>>>>>>
>>>>>>>> 1) use the Twisted python generator and run a daemon with twistd
>>>>>>>> 2) deploy to nginx/apache with mod_wsgi and somehow hook-in support
>>>>>>>> for decoding HTTP / HTTPS requests as Thrift RPCs.
>>>>>>>>
>>>>>>> Unless you need an asynchronous server side framework for high
>>>>>>> concurrency and low memory footprint, I would stay clear of Twisted.
>>>>>>>
>>>>>>>
>>>>>> It turns out that I need a highly efficient server.  I'm a one-man
>>>>>> shop and am limited in the number of servers I can afford to deploy.
>>>>>> I plan on starting with a bare minimum of two load-balanced VPS
>>>>>> instances so memory is tight.  I do also need high concurrency.  I'm
>>>>>> developing a turn-based game server and have a very large user base
>>>>>> already (iPhone app) and would like to license my solution to other
>>>>>> similar iPhone developers ... of course I can enlarge my cluster of
>>>>>> servers linearly with the number of licensees.  I digress...
>>>>>>
>>>>>>> I think a standard threaded wsgi server would work fine.
>>>>>>>
>>>>>> Suggestions?  CherryPy?
>>>>>>
>>>>>>> If you're inclined to use mod_wsgi, I recommend Graham Dumpleton's
>>>>>>> outstanding wsgi implementation for Apache. The Nginx wsgi interface
>>>>>>> is good as well, but beware if your app needs to block -- you'll be
>>>>>>> serializing your requests.
>>>>>>>
>>>>>>>
>>>>>> True.  Nginx is indeed single-threaded.  I'm not leaning in any way to
>>>>>> any particular serving tech. at this point actually.  I just want to
>>>>>> ensure that whatever tech. I choose is as efficient as possible.
>>>>>>
>>>>>> I don't have any points of blocking in the front-end actually, not on
>>>>>> disk I/O at least.  My datastore is a file-backed key-value database
>>>>>> that runs in a separate process and writes to disk on every Nth
>>>>>> database modification.
>>>>>>
>>>>>>> Both options would let you run SSL as well as handle basic or digest
>>>>>>> auth.
>>>>>>>
>>>>>>>
>>>>>> True.
>>>>>>
>>>>>>> As far as tying in Thrift, I haven't done this myself and
>>>>>>> unfortunately can't offer much. Hopefully there are others here who
>>>>>>> can. As you've already suggested, taking a look at the RPC layer and
>>>>>>> seeing how you can tie it into the backend from wsgi is a start.
>>>>>>>
>>>>>>>
>>>>>> Yeah, that's what I gather.  I'll play with it over the weekend.
>>>>>>
>>>>>>> IMO, the lack of a security story for Thrift is a weakness. I'm not
>>>>>>> sure what discussions there have been to address this. I started to
>>>>>>> implement SSL support for Java and Python, but found I had to modify
>>>>>>> a fair amount of Thrift code and ended up punting by using stunnel to
>>>>>>> set up a secure connection between client and server. You might find
>>>>>>> this the path of least resistance as well, in particular if you can add
>>>>>>> the authentication layer to your Thrift IDL.
>>>>>>>
>>>>>>>
>>>>>> Yeah, built-in SSL support would be nice.
>>>>>>
>>>>>> My client will be running on an iPhone -- no stunnel.  Oh, yeah, I
>>>>>> should mention that it seems most people use Thrift for talking from
>>>>>> say their web server to *internal* web services but I'm planning on
>>>>>> using it as a public-facing web service, like the EverNote folks are.
>>>>>> It was actually good to see another instance of someone planning on
>>>>>> using Thrift this way.
>>>>>>
>>>>>>> As one other approach, you can use a symmetric key to sign a request
>>>>>>> and send the signature in the clear with the rest of your thrift data.
>>>>>>> As long as you keep the signing key secret, this would let you validate
>>>>>>> the origin and integrity of the request. If there's anything sensitive
>>>>>>> in the request itself, though, this is no good.
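
A minimal sketch of the symmetric-key signing idea above, using Python's standard hmac module. The thread does not name an algorithm; HMAC-SHA256 is an assumption chosen for illustration, and sign_request and verify_request are hypothetical helper names. As the message says, this protects origin and integrity only. The payload itself still travels in the clear.

```python
import hashlib
import hmac


def sign_request(key, payload):
    """Return a hex signature of payload (bytes) under the shared key (bytes)."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_request(key, payload, signature):
    """Recompute the signature and compare it in constant time."""
    expected = sign_request(key, payload)
    return hmac.compare_digest(expected, signature)
```

The client would send the serialized request together with sign_request(key, payload); the service recomputes the signature with its own copy of the key and rejects the call if verify_request returns False.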
>>>>>>>
>>>>>>>
>>>>>> Right.  I cannot really trust the client -- iPhone apps are getting
>>>>>> cracked left and right.  Once cracked, someone will poke around enough
>>>>>> in the binary to find out my secret symmetric key even if not stored
>>>>>> as a literal string.
>>>>>>
>>>>>> Thus, I want to use SSL for anything sensitive.
>>>>>>
>>>>>> I'll create the equivalent of an auth token (same idea as login
>>>>>> cookies) with opaque data encrypted using a symmetric key only
>>>>>> available on the service-side.  The client will send back the auth
>>>>>> token with each Thrift RPC.  There's a lot more to this to fight
>>>>>> replay attacks, client spoofing, etc. but that isn't relevant here.
>>>>>>
>>>>>> I need to be able to register a user account from the client (I know,
>>>>>> spammers will try to automate that but I have countermeasures) and
>>>>>> log in the user as well.  This requires sending the sensitive user
>>>>>> information which, while essentially obfuscated to eavesdroppers by
>>>>>> virtue of using a binary protocol, can be reverse engineered easily
>>>>>> enough I bet.
>>>>>>
>>>>>>> Alas, message signing is another application layer measure -- it would
>>>>>>> be sweet to see auth work its way into the Thrift spec.
>>>>>>>
>>>>>>>
>>>>>> Yeah, I'm planning on requiring signatures à la Amazon Web Services.
>>>>>> Some data used in the request signature calculation will only be
>>>>>> available to the client and the service and never transmitted between
>>>>>> them in the clear -- it would be transmitted to the client during a
>>>>>> login over HTTPS.
>>>>>>
>>>>>> Auth in Thrift would be wonderful but I wonder if that's feature creep?
>>>>>>
>>>>>>> Good luck!
>>>>>>>
>>>>>>> Garrett
>>>>>>>
>>>>>>>
>>>>>> Thanks!
>>>>>> Brian
>>>>>>
>>>>>>
>>>>>>
>>>>
>>

