thrift-user mailing list archives

From Brian Hammond <br...@brianhammond.com>
Subject Re: Python server over HTTP, HTTPS -- How?
Date Fri, 24 Apr 2009 17:49:57 GMT
Hi,

HAProxy looks good. The problem I have with it is that it doesn't  
support SSL. Some of my thrift requests must go over SSL (e.g. login/ 
logout/update-profile). Thus, if I used HAProxy I'd need to  
incorporate a "token server" or "auth server" that uses SSL, and at  
that point I may as well stick to nginx.

Thanks,
Brian

On Apr 24, 2009, at 3:46 AM, David Balatero wrote:

> I don't see why not -- having a fast proxy seems like the best thing to
> do given 8 slower instances behind it. Also, you might look into HAProxy,
> as I hear it does arbitrary TCP load-balancing as well as specific HTTP
> balancing.
>
> On Thu, Apr 23, 2009 at 9:01 PM, Brian Hammond
> <brian@brianhammond.com> wrote:
>
>> Hi David,
>>
>> I've been working on a completely different project for the past few
>> weeks. I'm now getting back into this.
>>
>>> The Python THttpServer implementation might be a good starting point
>>> for you in terms of the nuts and bolts of connecting your server to
>>> Thrift.  I would *not* recommend using it for production use (I use
>>> it as a mock backend for some integration tests) for performance
>>> reasons.
>>>
>>
>>
>> Right, I wouldn't expect *one* of the THttpServer instances to perform
>> well -- too much of a funnel.  However, this made me think that it might
>> be worthwhile to load-balance a number of them.
>>
>> I set up nginx with 4 worker processes (one per core) as a load balancer
>> for 8 (an arbitrarily chosen number of) Python processes.  These upstream
>> processes are -- at first stab (no Thrift yet) -- just running a
>> BaseHTTPServer do_GET that returns "hello world".  Nginx simply does
>> round-robin between the 8 upstream processes.
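For readers following along today: BaseHTTPServer became http.server in Python 3. A minimal sketch of one such "hello world" upstream, assuming Python 3; the helper name make_upstream and the port numbers mentioned below are hypothetical, not taken from the post.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed "hello world" body."""

    def do_GET(self):
        body = b"hello world"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress the default per-request logging to stderr during load tests.
        pass


def make_upstream(port, host="127.0.0.1"):
    """Build one single-threaded upstream bound to host:port.

    Port 0 asks the OS for any free port (handy for testing).
    """
    return HTTPServer((host, port), HelloHandler)
```

Each of the eight upstreams would run in its own OS process, e.g. `make_upstream(8001).serve_forever()` through port 8008 (hypothetical ports), with nginx's upstream block listing the same addresses; round-robin is nginx's default upstream balancing method.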
>>
>> I figured this would be a good way to test whether THttpServer would
>> perform well enough for my purposes, since THttpServer.RequestHandler is
>> based on BaseHTTPServer.
>>
>> Over loopback:
>>
>> $ ab -n 20000 -c 1000 127.0.0.1/index.html
>>
>> ...
>> Requests per second:    11644.32 [#/sec] (mean)
>> ...
>>
>> From my laptop here in NY to my server at The Planet (Dallas, TX):
>>
>> $ ab -n 20000 -c 1000 MY-HOSTNAME/index.html
>>
>> ...
>> Requests per second:    788.20 [#/sec] (mean)
>> ...
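The two ab runs can be turned into wall-clock and per-request figures with simple arithmetic. This sketch uses only the requests/sec values and the -n/-c settings quoted above; the mean latency is concurrency divided by throughput, which is how ab computes its "Time per request (mean)" line. The function name derived_timings is hypothetical.

```python
# Inputs quoted in the post: ab -n 20000 -c 1000, plus the two measured
# requests/sec figures. Everything below is arithmetic on those numbers.
TOTAL_REQUESTS = 20000
CONCURRENCY = 1000


def derived_timings(requests_per_sec, total=TOTAL_REQUESTS, concurrency=CONCURRENCY):
    """Return (wall-clock seconds for the whole run, mean ms per request).

    The mean is the average time each request spent in flight while
    `concurrency` requests were kept outstanding.
    """
    wall_clock_s = total / requests_per_sec
    mean_latency_ms = concurrency / requests_per_sec * 1000.0
    return wall_clock_s, mean_latency_ms


for label, rps in [("loopback", 11644.32), ("NY -> Dallas", 788.20)]:
    wall, latency = derived_timings(rps)
    print(f"{label}: {wall:.2f} s total, {latency:.1f} ms per request")
```

For the loopback run this works out to roughly 1.7 s for all 20000 requests and about 86 ms per request; for the NY-to-Dallas run, roughly 25.4 s and about 1.27 s per request.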
>>
>> I'm pretty happy with these numbers, but of course the upstream
>> processes do nothing interesting.  My data store, however, is redis [1],
>> which is extremely efficient given its nature (an in-memory key-value
>> "database").  Thus, I don't expect much overhead from Thrift or redis.
>> But I'll test this assumption, of course.
>>
>> Sorry if this is obvious to a lot of you on this list.  This might be
>> useful to others getting started.
>>
>> Does anyone see any huge, glaring problem with the idea of putting a
>> fast nginx in front of a number of "slow" THttpServer-based processes?
>>
>> Thanks,
>> Brian
>>
>> On Apr 3, 2009, at 12:59 AM, David Reiss wrote:
>>
>>
>>> http://gitweb.thrift-rpc.org/?p=thrift.git;a=blob;f=lib/py/src/server/THttpServer.py;h=21fc314;hb=7534e71
>>>
>>> The Python THttpServer implementation might be a good starting point
>>> for you in terms of the nuts and bolts of connecting your server to
>>> Thrift.  I would *not* recommend using it for production use (I use
>>> it as a mock backend for some integration tests) for performance  
>>> reasons.
>>> In order to avoid having a Thrift thread blocked on
>>> over-the-net-to-a-poorly-connected-client I/O, I would suggest using
>>> a server that will buffer up the whole request, then hand it to Thrift,
>>> then buffer up the Thrift response, then send the response to the
>>> client.  You probably want to put the POST data in a TMemoryBuffer (not
>>> a TBufferedTransport, which uses a fixed-size buffer).
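A minimal sketch of the buffer-everything pattern David describes, built on the same BaseHTTPRequestHandler base that THttpServer uses. process_payload is a hypothetical stand-in for the Thrift processor (its docstring shows roughly where the TMemoryBuffer and TBinaryProtocol calls would slot in); it simply echoes the bytes so the sketch runs without Thrift installed. The handler name BufferedRPCHandler is also hypothetical.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def process_payload(request_bytes):
    """Hypothetical stand-in for the Thrift processor.

    With the Thrift Python library this would roughly be:
        itrans = TTransport.TMemoryBuffer(request_bytes)
        otrans = TTransport.TMemoryBuffer()
        processor.process(TBinaryProtocol.TBinaryProtocol(itrans),
                          TBinaryProtocol.TBinaryProtocol(otrans))
        return otrans.getvalue()
    Here it just echoes the bytes back so the sketch runs on its own.
    """
    return request_bytes


class BufferedRPCHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # 1. Read the whole body up front so the processor only ever sees a
        #    complete, in-memory message.
        length = int(self.headers.get("Content-Length", 0))
        request_bytes = self.rfile.read(length)

        # 2. Run the RPC entirely in memory.
        response_bytes = process_payload(request_bytes)

        # 3. Send the fully buffered response with an exact Content-Length.
        self.send_response(200)
        self.send_header("Content-Type", "application/x-thrift")
        self.send_header("Content-Length", str(len(response_bytes)))
        self.end_headers()
        self.wfile.write(response_bytes)

    def log_message(self, format, *args):
        pass  # keep load tests quiet
```

Placed behind nginx, which buffers proxied request bodies by default, the body read in step 1 happens over the local link rather than over a slow client connection.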
>>>
>>> --David
>>>
>>> Brian Hammond wrote:
>>>
>>>> Hi Garrett,
>>>>
>>>> On Apr 2, 2009, at 11:26 PM, Garrett Smith wrote:
>>>>
>>>>> ----- "Brian Hammond" <brian@brianhammond.com> wrote:
>>>>>
>>>>>> What I'm curious about is how I can do all of the following:
>>>>>>
>>>>>> 1) use SSL to encrypt user credentials
>>>>>> 2) write my service implementation in Python
>>>>>>
>>>>>> I guess there are a few options for Python, but none completely
>>>>>> solves both of these requirements.
>>>>>>
>>>>>> 1) use the Twisted Python generator and run a daemon with twistd
>>>>>> 2) deploy to nginx/apache with mod_wsgi and somehow hook in support
>>>>>> for decoding HTTP / HTTPS requests as Thrift RPCs.
>>>>>>
>>>>> Unless you need an asynchronous server-side framework for high
>>>>> concurrency and a low memory footprint, I would stay clear of Twisted.
>>>>>
>>>>
>>>> It turns out that I need a highly efficient server.  I'm a one-man shop
>>>> and am limited in the number of servers I can afford to deploy.  I plan
>>>> on starting with a bare minimum of two load-balanced VPS instances, so
>>>> memory is tight.  I also need high concurrency.  I'm developing a
>>>> turn-based game server, already have a very large user base (iPhone
>>>> app), and would like to license my solution to other, similar iPhone
>>>> developers ... of course I can enlarge my cluster of servers linearly
>>>> with the number of licensees.  I digress...
>>>>
>>>>> I think a standard threaded wsgi server would work fine.
>>>>>
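Garrett's suggestion can be sketched with the standard library alone: wsgiref's reference WSGIServer combined with socketserver.ThreadingMixIn handles each request on its own thread. wsgiref is a reference implementation rather than a production server (CherryPy or Apache mod_wsgi would fill that role); the names ThreadingWSGIServer, hello_app and make_threaded_server are hypothetical.

```python
import socketserver
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer, make_server


class ThreadingWSGIServer(socketserver.ThreadingMixIn, WSGIServer):
    """wsgiref's reference server, but with one thread per request."""

    daemon_threads = True  # don't block interpreter exit on in-flight requests


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # suppress per-request logging


def hello_app(environ, start_response):
    """Trivial WSGI application used to exercise the server."""
    body = b"hello world"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def make_threaded_server(app, host="127.0.0.1", port=8000):
    """Bind a threaded WSGI server for `app`; port 0 picks any free port."""
    return make_server(host, port, app,
                       server_class=ThreadingWSGIServer,
                       handler_class=QuietHandler)
```

Calling `make_threaded_server(hello_app).serve_forever()` would then serve the app with thread-per-request concurrency.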
>>>>
>>>> Suggestions?  CherryPy?
>>>>
>>>>> If you're inclined to use mod_wsgi, I recommend Graham Dumpleton's
>>>>> outstanding wsgi implementation for Apache.  The Nginx wsgi interface
>>>>> is good as well, but beware if your app needs to block -- you'll be
>>>>> serializing your requests.
>>>>>
>>>>
>>>> True.  Nginx workers are indeed single-threaded.  I'm not leaning
>>>> toward any particular serving technology at this point, actually.  I
>>>> just want to ensure that whatever technology I choose is as efficient
>>>> as possible.
>>>>
>>>> I don't actually have any points of blocking in the front end, not on
>>>> disk I/O at least.  My data store is a file-backed key-value database
>>>> that runs in a separate process and writes to disk on every Nth
>>>> database modification.
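A toy version of the "write to disk on every Nth modification" idea, assuming a JSON snapshot file and a hypothetical SnapshotStore class; it is not Brian's actual data store. Redis's `save <seconds> <changes>` snapshot rule makes a similar trade-off.

```python
import json
import os
import tempfile


class SnapshotStore:
    """In-memory key-value store that snapshots to disk every `every_n` writes."""

    def __init__(self, path, every_n=100):
        self.path = path
        self.every_n = every_n
        self.data = {}
        self.dirty = 0  # writes since the last snapshot
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value):
        self.data[key] = value
        self.dirty += 1
        if self.dirty >= self.every_n:
            self.flush()

    def flush(self):
        # Write to a temp file in the same directory, then rename it over the
        # old snapshot so a crash mid-write never leaves a half-written file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)
        self.dirty = 0
```

Writes made since the last flush are lost if the process dies; that is the price of keeping ordinary writes off the disk.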
>>>>
>>>>> Both options would let you run SSL as well as handle basic or digest
>>>>> auth.
>>>>>
>>>>
>>>> True.
>>>>
>>>>> As far as tying in Thrift, I haven't done this myself and unfortunately
>>>>> can't offer much.  Hopefully there are others here who can.  As you've
>>>>> already suggested, taking a look at the RPC layer and seeing how you
>>>>> can tie it into the backend from wsgi is a start.
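One way to tie Thrift into a WSGI backend is to treat each POST body as a single serialized Thrift message. This is a minimal sketch under that assumption; handle_thrift_payload is a hypothetical hook (stubbed as an echo) where the Thrift processor would run, and application/x-thrift is simply the content type the sketch chooses.

```python
def handle_thrift_payload(request_bytes):
    """Hypothetical hook: decode a Thrift call, run it, return the encoded reply.

    Stubbed as an echo so the sketch runs without the Thrift library.
    """
    return request_bytes


def thrift_wsgi_app(environ, start_response):
    """WSGI entry point: one POST body in, one Thrift reply out."""
    if environ["REQUEST_METHOD"] != "POST":
        start_response("405 Method Not Allowed",
                       [("Allow", "POST"), ("Content-Length", "0")])
        return [b""]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    # Read the whole body so the Thrift layer sees one complete message.
    request_bytes = environ["wsgi.input"].read(length)
    response_bytes = handle_thrift_payload(request_bytes)
    start_response("200 OK", [("Content-Type", "application/x-thrift"),
                              ("Content-Length", str(len(response_bytes)))])
    return [response_bytes]
```

Any WSGI container (mod_wsgi, CherryPy, wsgiref) could host `thrift_wsgi_app`, and SSL would then be terminated by that container or the web server in front of it.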
>>>>>
>>>>
>>>> Yeah, that's what I gather.  I'll play with it over the weekend.
>>>>
>>>>> IMO, the lack of a security story for Thrift is a weakness.  I'm not
>>>>> sure what discussions there have been to address this.  I started to
>>>>> implement SSL support for Java and Python, but found I had to modify a
>>>>> fair amount of Thrift code and ended up punting by using stunnel to set
>>>>> up a secure connection between client and server.  You might find this
>>>>> the path of least resistance as well, in particular if you can add the
>>>>> authentication layer to your Thrift IDL.
>>>>>
>>>>
>>>> Yeah, built-in SSL support would be nice.
>>>>
>>>> My client will be running on an iPhone -- no stunnel.  Oh, yeah, I
>>>> should mention that it seems most people use Thrift for talking from,
>>>> say, their web server to *internal* web services, but I'm planning on
>>>> using it as a public-facing web service, like the EverNote folks are.
>>>> It was actually good to see another instance of someone planning on
>>>> using Thrift this way.
>>>>
>>>>> As one other approach, you can use a symmetric key to sign a request
>>>>> and send the signature in the clear with the rest of your Thrift data.
>>>>> As long as you keep the signing key secret, this would let you validate
>>>>> the origin and integrity of the request.  If there's anything sensitive
>>>>> in the request itself, though, this is no good.
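Garrett's signing approach maps directly onto an HMAC. This sketch assumes HMAC-SHA256 over the serialized request bytes (the thread does not name an algorithm); sign and verify are hypothetical helper names.

```python
import hashlib
import hmac


def sign(key, payload):
    """Return a hex HMAC-SHA256 signature of the serialized request bytes."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(key, payload, signature):
    """Check a received signature against one recomputed from the payload."""
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(sign(key, payload), signature)
```

The payload itself still travels in the clear, so this protects origin and integrity only, exactly the limitation noted above.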
>>>>>
>>>>
>>>> Right.  I cannot really trust the client -- iPhone apps are getting
>>>> cracked left and right.  Once an app is cracked, someone will poke
>>>> around in the binary enough to find my secret symmetric key, even if it
>>>> is not stored as a literal string.
>>>>
>>>> Thus, I want to use SSL for anything sensitive.
>>>>
>>>> I'll create the equivalent of an auth token (same idea as login
>>>> cookies) with opaque data encrypted using a symmetric key only available
>>>> on the service side.  The client will send back the auth token with each
>>>> Thrift RPC.  There's a lot more to this to fight replay attacks, client
>>>> spoofing, etc., but that isn't relevant here.
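A minimal sketch of a server-issued auth token. Python's standard library has no symmetric cipher, so this version swaps encryption for an HMAC signature: the token is tamper-proof, but its contents (a user id and an expiry time) are readable rather than opaque as described above; an encrypted variant would need a third-party library. SERVER_KEY, issue_token and check_token are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical server-only secret; it is never shipped inside the client binary.
SERVER_KEY = b"replace-with-a-random-server-side-secret"


def issue_token(user_id, ttl_seconds=3600, now=None):
    """Return "<claims>.<mac>" with both parts URL-safe base64 encoded."""
    now = time.time() if now is None else now
    claims = json.dumps({"uid": user_id, "exp": int(now + ttl_seconds)},
                        separators=(",", ":")).encode()
    mac = hmac.new(SERVER_KEY, claims, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(claims).decode() + "." +
            base64.urlsafe_b64encode(mac).decode())


def check_token(token, now=None):
    """Return the user id if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        claims_b64, mac_b64 = token.split(".")
        claims = base64.urlsafe_b64decode(claims_b64)
        mac = base64.urlsafe_b64decode(mac_b64)
    except ValueError:  # wrong shape or invalid base64 (binascii.Error)
        return None
    expected = hmac.new(SERVER_KEY, claims, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, mac):
        return None
    data = json.loads(claims)
    if data["exp"] < now:
        return None
    return data["uid"]
```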
>>>>
>>>> I need to be able to register a user account from the client (I know,
>>>> spammers will try to automate that, but I have countermeasures) and log
>>>> the user in as well.  This requires sending sensitive user information,
>>>> which, while essentially obfuscated to eavesdroppers by virtue of using
>>>> a binary protocol, can be reverse-engineered easily enough, I bet.
>>>>
>>>>> Alas, message signing is another application-layer measure -- it would
>>>>> be sweet to see auth work its way into the Thrift spec.
>>>>>
>>>>
>>>> Yeah, I'm planning on requiring signatures à la Amazon Web Services.
>>>> Some data used in the request signature calculation will only be
>>>> available to the client and the service and never transmitted between
>>>> them in the clear -- it would be transmitted to the client during a
>>>> login over HTTPS.
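A sketch of signing requests in the spirit of AWS, assuming a per-session secret handed to the client during the HTTPS login. The canonical string layout (method, path, timestamp, body hash), the 300-second freshness window, and the function names are illustrative assumptions, not AWS's exact format.

```python
import hashlib
import hmac
import time

MAX_SKEW_SECONDS = 300  # hypothetical freshness window to limit replays


def string_to_sign(method, path, timestamp, body):
    """Canonical form that client and service must build identically."""
    body_hash = hashlib.sha256(body).hexdigest()
    return "\n".join([method.upper(), path, str(int(timestamp)), body_hash]).encode()


def sign_request(session_secret, method, path, timestamp, body):
    return hmac.new(session_secret, string_to_sign(method, path, timestamp, body),
                    hashlib.sha256).hexdigest()


def verify_request(session_secret, method, path, timestamp, body, signature, now=None):
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False  # stale or far-future request: treat as a possible replay
    expected = sign_request(session_secret, method, path, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

The timestamp check only narrows the replay window; rejecting exact repeats inside that window would additionally need a nonce cache on the server.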
>>>>
>>>> Auth in Thrift would be wonderful, but I wonder if that's feature creep?
>>>>
>>>>> Good luck!
>>>>>
>>>>> Garrett
>>>>>
>>>>
>>>> Thanks!
>>>> Brian
>>>>
>>>>
>>

