thrift-user mailing list archives

From Chad Maine <chad.ma...@gmail.com>
Subject Re: Python server over HTTP, HTTPS -- How?
Date Fri, 24 Apr 2009 12:06:01 GMT
Brian, for the past few days I've been looking at serving Thrift services
with Python over HTTP.  I blogged about dropping an example Thrift
service into a Django app here:
http://www.redmoxie.net/blog/2009/apr/15/django-and-thrift-part-1/.  I
also experimented with running a Thrift service as a very thin WSGI app,
with SQLAlchemy doing the data access, under Apache and mod_wsgi
(http://code.google.com/p/modwsgi/).  That performed extremely well for
me and might be a lot closer to what you are looking for.  You can then
tune your preferred Apache MPM (threaded, prefork, or whatever
works best for you).

Chad

Brian Hammond wrote:
> Hi David,
>
> I've been working on a completely different project for the past few 
> weeks.  I'm now getting back into this.
>
>> The Python THttpServer implementation might be a good starting point
>> for you in terms of the nuts and bolts of connecting your server to
>> Thrift.  I would *not* recommend using it for production use (I use
>> it as a mock backend for some integration tests) for performance 
>> reasons.
>
>
> Right, I wouldn't expect *one* of the THttpServer instances to perform 
> well -- too much of a funnel.  However, this made me think that it 
> might be worthwhile to load-balance a number of them.
>
> I set up nginx with 4 worker processes (one per core) as a load
> balancer to 8 (arbitrary) python processes.  These upstream processes 
> are -- at first stab (no Thrift yet) -- just running a BaseHTTPServer 
> do_GET that returns "hello world".  Nginx simply does round-robin 
> between the 8 upstream processes.
>
> I figured this would be a good way to test if THttpServer would 
> perform well enough for my purposes since THttpServer.RequestHandler 
> is based on BaseHTTPServer.
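
For readers on Python 3, where BaseHTTPServer became http.server, one such "hello world" upstream process might look roughly like the sketch below. The helper names (make_server, serve_in_background, fetch) and the port handling are assumptions for illustration; none of them come from this thread.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed plain-text body."""

    def do_GET(self):
        body = b"hello world"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence per-request logging so it does not skew a benchmark.
        pass


def make_server(port=0, host="127.0.0.1"):
    """Hypothetical helper: bind one upstream; port 0 lets the OS pick a free port."""
    return HTTPServer((host, port), HelloHandler)


def serve_in_background(server):
    """Run the server loop in a daemon thread and return the thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread


def fetch(host, port, path="/index.html"):
    """Issue one GET directly to the upstream and return (status, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

Each of the 8 upstream processes would call something like make_server(8001).serve_forever() on its own port (the port number is only an example), with the load balancer pointed at all of them.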
>
> Over loopback:
>
> $ ab -n 20000 -c 1000 127.0.0.1/index.html
>
> ...
> Requests per second:    11644.32 [#/sec] (mean)
> ...
>
> From my laptop here in NY to my server in The Planet (Dallas, TX):
>
> $ ab -n 20000 -c 1000 MY-HOSTNAME/index.html
>
> ...
> Requests per second:    788.20 [#/sec] (mean)
> ...
>
> I'm pretty happy with these numbers, but of course the upstream
> processes do nothing interesting.  However, my data store is redis [1],
> which is extremely efficient given its nature (an in-memory key-value
> "database").  Thus, I don't expect much overhead from Thrift or
> redis.  But I'll test this assumption, of course.
>
> Sorry if this is obvious to a lot of you on this list.  This might be 
> useful to others getting started.
>
> Does anyone see any huge glaring problem with the idea of putting fast 
> nginx in front of a number of "slow" THttpServer-based processes?
>
> Thanks,
> Brian
>
> On Apr 3, 2009, at 12:59 AM, David Reiss wrote:
>
>> http://gitweb.thrift-rpc.org/?p=thrift.git;a=blob;f=lib/py/src/server/THttpServer.py;h=21fc314;hb=7534e71
>>
>> The Python THttpServer implementation might be a good starting point
>> for you in terms of the nuts and bolts of connecting your server to
>> Thrift.  I would *not* recommend using it for production use (I use
>> it as a mock backend for some integration tests) for performance 
>> reasons.
>> In order to avoid having a Thrift thread blocked on
>> over-the-net-to-a-poorly-connected-client I/O, I would suggest using
>> a server that will buffer up the whole request, then hand it to Thrift,
>> then buffer up the Thrift response, then, send the response to the 
>> client.
>> You probably want to put the POST data in a TMemoryBuffer (not a
>> TBufferedTransport, which uses a fixed-size buffer).
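
The buffer-then-process flow described above can be sketched as a small WSGI app. This is only an illustration under stated assumptions: `process` is a hypothetical bytes-in, bytes-out callable standing in for the Thrift step (for example, wrapping the POST data in a TMemoryBuffer and running the processor), and the "application/x-thrift" content type and the `make_app` / `call` names are likewise assumptions rather than anything confirmed in this thread.

```python
import io
from wsgiref.util import setup_testing_defaults


def make_app(process):
    """Wrap `process` (bytes -> bytes, a hypothetical Thrift step) as a WSGI app."""

    def app(environ, start_response):
        if environ.get("REQUEST_METHOD") != "POST":
            start_response("405 Method Not Allowed",
                           [("Allow", "POST"), ("Content-Length", "0")])
            return [b""]
        # 1. Buffer the whole request body before doing any RPC work.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        request_bytes = environ["wsgi.input"].read(length)
        # 2. Hand the complete request to the processor; it never sees the socket.
        response_bytes = process(request_bytes)
        # 3. Return the complete response in one piece.
        start_response("200 OK", [
            ("Content-Type", "application/x-thrift"),  # assumed content type
            ("Content-Length", str(len(response_bytes))),
        ])
        return [response_bytes]

    return app


def call(app, body, method="POST"):
    """Drive the app in-process, without a network, for a quick check."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = method
    environ["CONTENT_LENGTH"] = str(len(body))
    environ["wsgi.input"] = io.BytesIO(body)
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    chunks = app(environ, start_response)
    return captured["status"], captured["headers"], b"".join(chunks)
```

Because the processor only ever sees complete byte strings, a slow client can stall the front-end web server but not the code doing the RPC work, which is the point of the suggestion.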
>>
>> --David
>>
>> Brian Hammond wrote:
>>> Hi Garrett,
>>>
>>> On Apr 2, 2009, at 11:26 PM, Garrett Smith wrote:
>>>
>>>> ----- "Brian Hammond" <brian@brianhammond.com> wrote:
>>>>> What I'm curious about is how I can do all of the following:
>>>>>
>>>>> 1) use SSL to encrypt user credentials
>>>>> 2) write my service implementation in python
>>>>>
>>>>> I guess there are a few options for Python, but none completely
>>>>> solves both of these requirements.
>>>>>
>>>>> 1) use the Twisted python generator and run a daemon with twistd
>>>>> 2) deploy to nginx/apache with mod_wsgi and somehow hook-in support
>>>>> for decoding HTTP / HTTPS requests as Thrift RPCs.
>>>> Unless you need an asynchronous server side framework for high
>>>> concurrency and low memory footprint, I would stay clear of Twisted.
>>>
>>> It turns out that I need a highly efficient server.  I'm a one-man
>>> shop and am limited in the number of servers I can afford to deploy.
>>> I plan on starting with a bare minimum of two load-balanced VPS
>>> instances so memory is tight.  I do also need high concurrency.  I'm
>>> developing a turn-based game server and have a very large user base
>>> already (iPhone app) and would like to license my solution to other
>>> similar iPhone developers ... of course I can enlarge my cluster of
>>> servers linearly with the number of licensees.  I digress...
>>>
>>>> I think a standard threaded wsgi server would work fine.
>>>
>>> Suggestions?  CherryPy?
>>>
>>>> If you're inclined to use a mod_wsgi, I recommend Graham Dumpleton's
>>>> outstanding wsgi implementation for Apache. The Nginx wsgi interface
>>>> is good as well, but beware if your app needs to block -- you'll be
>>>> serializing your requests.
>>>
>>> True.  Nginx is indeed single-threaded.  I'm not leaning in any way to
>>> any particular serving tech. at this point actually.  I just want to
>>> ensure that whatever tech. I choose is as efficient as possible.
>>>
>>> I don't actually have any points of blocking in the front-end,
>>> not on disk I/O at least.  My datastore is a file-backed key-value
>>> database that runs in a separate process and writes to disk on
>>> every Nth database modification.
>>>
>>>> Both options would let you run SSL as well as handle basic or digest
>>>> auth.
>>>
>>> True.
>>>
>>>> As far as tying in Thrift, I haven't done this myself and
>>>> unfortunately can't offer much. Hopefully there are others here who
>>>> can. As you've already suggested, taking a look at the RPC layer and
>>>> seeing how you can tie it into the backend from wsgi is a start.
>>>
>>> Yeah, that's what I gather.  I'll play with it over the weekend.
>>>
>>>> IMO, the lack of a security story for Thrift is a weakness. I'm not
>>>> sure what discussions there have been to address this. I started to
>>>> implement SSL support for Java and Python, but found I had to modify
>>>> a fair amount of Thrift code and ended up punting by using stunnel to
>>>> set up a secure connection between client and server. You might find
>>>> this the path of least resistance as well, in particular if you can
>>>> add the authentication layer to your Thrift IDL.
>>>
>>> Yeah, built-in SSL support would be nice.
>>>
>>> My client will be running on an iPhone -- no stunnel.  Oh, yeah, I
>>> should mention that it seems most people use Thrift for talking from,
>>> say, their web server to *internal* web services, but I'm planning on
>>> using it as a public-facing web service, like the Evernote folks are.
>>> It was actually good to see another instance of someone planning on
>>> using Thrift this way.
>>>
>>>> As one other approach, you can use a symmetric key to sign a request
>>>> and send the signature in the clear with the rest of your thrift data.
>>>> As long as you keep the signing key secret, this would let you
>>>> validate the origin and integrity of the request. If there's anything
>>>> sensitive in the request itself, though, this is no good.
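
A minimal sketch of the symmetric-key signing idea, using Python's standard hmac module. HMAC-SHA256 is an assumption here, since the thread does not name an algorithm, and the function names `sign` and `verify` are hypothetical.

```python
import hashlib
import hmac


def sign(key, payload):
    """Return a hex HMAC-SHA256 signature of payload under the shared key."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(key, payload, signature):
    """Recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(key, payload), signature)
```

The client would send the serialized request bytes together with sign(key, payload), and the server would reject any request for which verify returns False. As noted above, this only covers origin and integrity: the payload still travels in the clear, and on its own the scheme does nothing against replayed requests.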
>>>
>>> Right.  I cannot really trust the client -- iPhone apps are getting
>>> cracked left and right.  Once cracked, someone will poke around enough
>>> in the binary to find out my secret symmetric key even if not stored
>>> as a literal string.
>>>
>>> Thus, I want to use SSL for anything sensitive.
>>>
>>> I'll create the equivalent of an auth token (same idea as login
>>> cookies) with opaque data encrypted using a symmetric key only
>>> available on the service-side.  The client will send back the auth
>>> token with each Thrift RPC.  There's a lot more to this to fight
>>> replay attacks, client spoofing, etc. but that isn't relevant here.
>>>
>>> I need to be able to register a user account from the client (I know,
>>> spammers will try to automate that but I have countermeasures) and
>>> log in the user as well.  This requires sending the sensitive user
>>> information which, while essentially obfuscated to eavesdroppers by
>>> virtue of using a binary protocol, can be reverse engineered easily
>>> enough I bet.
>>>
>>>> Alas, message signing is another application layer measure -- it would
>>>> be sweet to see auth work its way into the Thrift spec.
>>>
>>> Yeah, I'm planning on requiring signatures à la Amazon Web Services.
>>> Some data used in the request signature calculation will only be
>>> available to the client and the service and never transmitted between
>>> them in the clear -- it would be transmitted to the client during a
>>> login over HTTPS.
>>>
>>> Auth in Thrift would be wonderful but I wonder if that's feature creep?
>>>
>>>> Good luck!
>>>>
>>>> Garrett
>>>
>>> Thanks!
>>> Brian
>>>
>

