thrift-user mailing list archives

From Brian Hammond <br...@brianhammond.com>
Subject Re: Python server over HTTP, HTTPS -- How?
Date Fri, 24 Apr 2009 04:01:42 GMT
Hi David,

I've been working on a completely different project for the past few  
weeks.  I'm now getting back into this.

> The Python THttpServer implementation might be a good starting point
> for you in terms of the nuts and bolts of connecting your server to
> Thrift.  I would *not* recommend using it for production use (I use
> it as a mock backend for some integration tests) for performance  
> reasons.


Right, I wouldn't expect *one* of the THttpServer instances to perform  
well -- too much of a funnel.  However, this made me think that it  
might be worthwhile to load-balance a number of them.

I set up nginx with 4 worker processes (one per core) as a load
balancer in front of 8 (an arbitrary number) Python processes.  As a
first stab (no Thrift yet), these upstream processes just run a
BaseHTTPServer do_GET that returns "hello world".  Nginx simply
round-robins between the 8 upstream processes.

I figured this would be a good way to test if THttpServer would  
perform well enough for my purposes since THttpServer.RequestHandler  
is based on BaseHTTPServer.
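
For anyone who wants to reproduce the setup, here is a minimal sketch of
one such upstream process. It assumes Python 3, where BaseHTTPServer
lives in http.server; the names HelloHandler and make_server are
illustrative only and are not part of Thrift.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answers every GET with a fixed plain-text body."""

    def do_GET(self):
        body = b"hello world"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress per-request logging so it does not skew a benchmark.
        pass


def make_server(port):
    """Build one upstream server bound to loopback on the given port."""
    return HTTPServer(("127.0.0.1", port), HelloHandler)
```

Each upstream process would call make_server(port).serve_forever() on
its own port, and the nginx upstream block would list those ports.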

Over loopback:

$ ab -n 20000 -c 1000 127.0.0.1/index.html

...
Requests per second:    11644.32 [#/sec] (mean)
...

From my laptop here in NY to my server at The Planet (Dallas, TX):

$ ab -n 20000 -c 1000 MY-HOSTNAME/index.html

...
Requests per second:    788.20 [#/sec] (mean)
...

I'm pretty happy with these numbers, but of course the upstream
processes do nothing interesting.  My data store, however, is redis [1],
which is extremely efficient given its nature (an in-memory key-value
"database").  Thus, I don't expect much overhead from Thrift or
redis, but I'll test this assumption, of course.

Sorry if this is obvious to a lot of you on this list.  This might be  
useful to others getting started.

Does anyone see any huge glaring problem with the idea of putting fast  
nginx in front of a number of "slow" THttpServer-based processes?

Thanks,
Brian

On Apr 3, 2009, at 12:59 AM, David Reiss wrote:

> http://gitweb.thrift-rpc.org/?p=thrift.git;a=blob;f=lib/py/src/server/THttpServer.py;h=21fc314;hb=7534e71
>
> The Python THttpServer implementation might be a good starting point
> for you in terms of the nuts and bolts of connecting your server to
> Thrift.  I would *not* recommend using it for production use (I use
> it as a mock backend for some integration tests) for performance  
> reasons.
> In order to avoid having a Thrift thread blocked on
> over-the-net-to-a-poorly-connected-client I/O, I would suggest using
> a server that will buffer up the whole request, then hand it to Thrift,
> then buffer up the Thrift response, then send the response to the
> client.
> You probably want to put the POST data in a TMemoryBuffer (not a
> TBufferedTransport, which uses a fixed-size buffer).
>
> --David
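
The buffer-then-hand-off flow David describes can be sketched as a small
WSGI application. This is only an illustration under stated assumptions:
handle_rpc is a hypothetical bytes-in, bytes-out callable (with Thrift it
would presumably wrap the request bytes in a TMemoryBuffer and run the
processor against it), make_buffered_app is a made-up name, and the
"application/x-thrift" content type is assumed.

```python
def make_buffered_app(handle_rpc):
    """Wrap handle_rpc (hypothetical: bytes -> bytes) as a WSGI app."""

    def app(environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0
        # Read the complete POST body before any RPC work starts.
        request_bytes = environ["wsgi.input"].read(length)
        # The handler works on in-memory bytes only, never on the socket.
        response_bytes = handle_rpc(request_bytes)
        start_response("200 OK", [
            ("Content-Type", "application/x-thrift"),  # assumed type
            ("Content-Length", str(len(response_bytes))),
        ])
        # Hand the whole response back in a single chunk.
        return [response_bytes]

    return app
```

Whether slow clients are actually kept away from the RPC code still
depends on the front-end server buffering requests and responses.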
>
> Brian Hammond wrote:
>> Hi Garrett,
>>
>> On Apr 2, 2009, at 11:26 PM, Garrett Smith wrote:
>>
>>> ----- "Brian Hammond" <brian@brianhammond.com> wrote:
>>>> What I'm curious about is how I can do all of the following:
>>>>
>>>> 1) use SSL to encrypt user credentials
>>>> 2) write my service implementation in python
>>>>
>>>> I guess there's a few options for python but none completely solve
>>>> both of these requirements.
>>>>
>>>> 1) use the Twisted python generator and run a daemon with twistd
>>>> 2) deploy to nginx/apache with mod_wsgi and somehow hook-in support
>>>> for decoding HTTP / HTTPS requests as Thrift RPCs.
>>> Unless you need an asynchronous server side framework for high
>>> concurrency and low memory footprint, I would stay clear of Twisted.
>>
>> It turns out that I need a highly efficient server.  I'm a one-man
>> shop and am limited in the number of servers I can afford to deploy.
>> I plan on starting with a bare minimum of two load-balanced VPS
>> instances so memory is tight.  I do also need high concurrency.  I'm
>> developing a turn-based game server and have a very large user base
>> already (iPhone app) and would like to license my solution to other
>> similar iPhone developers ... of course I can enlarge my cluster of
>> servers linearly with the number of licensees.  I digress...
>>
>>> I think a standard threaded wsgi server would work fine.
>>
>> Suggestions?  CherryPy?
>>
>>> If you're inclined to use a mod_wsgi, I recommend Graham Dumpleton's
>>> outstanding wsgi implementation for Apache. The Nginx wsgi interface
>>> is good as well, but beware if your app needs to block -- you'll be
>>> serializing your requests.
>>
>> True.  Nginx is indeed single-threaded.  I'm not leaning toward any
>> particular serving tech. at this point, actually.  I just want to
>> ensure that whatever tech. I choose is as efficient as possible.
>>
>> I don't have any points of blocking in the front-end, actually, not on
>> disk I/O at least.  My datastore is a file-backed key-value database
>> that runs in a separate process and writes to disk on every Nth
>> database modification.
>>
>>> Both options would let you run SSL as well as handle basic or digest
>>> auth.
>>
>> True.
>>
>>> As far as tying in Thrift, I haven't done this myself and
>>> unfortunately can't offer much. Hopefully there are others here who
>>> can. As you've already suggested, taking a look at the RPC layer and
>>> seeing how you can tie it into the backend from wsgi is a start.
>>
>> Yeah, that's what I gather.  I'll play with it over the weekend.
>>
>>> IMO, the lack of a security story for Thrift is a weakness. I'm not
>>> sure what discussions there have been to address this. I started to
>>> implement SSL support for Java and Python, but found I had to modify
>>> a fair amount of Thrift code and ended up punting by using stunnel to
>>> set up a secure connection between client and server. You might find
>>> this the path of least resistance as well, in particular if you can
>>> add the authentication layer to your Thrift IDL.
>>
>> Yeah, built-in SSL support would be nice.
>>
>> My client will be running on an iPhone -- no stunnel.  Oh, yeah, I
>> should mention that it seems most people use Thrift for talking from
>> say their web server to *internal* web services but I'm planning on
>> using it as a public-facing web service, like the EverNote folks are.
>> It was actually good to see another instance of someone planning on
>> using Thrift this way.
>>
>>> As one other approach, you can use a symmetric key to sign a request
>>> and send the signature in the clear with the rest of your thrift data.
>>> As long as you keep the signing key secret, this would let you
>>> validate the origin and integrity of the request. If there's anything
>>> sensitive in the request itself, though, this is no good.
>>
>> Right.  I cannot really trust the client -- iPhone apps are getting
>> cracked left and right.  Once cracked, someone will poke around enough
>> in the binary to find out my secret symmetric key even if not stored
>> as a literal string.
>>
>> Thus, I want to use SSL for anything sensitive.
>>
>> I'll create the equivalent of an auth token (same idea as login
>> cookies) with opaque data encrypted using a symmetric key only
>> available on the service-side.  The client will send back the auth
>> token with each Thrift RPC.  There's a lot more to this to fight
>> replay attacks, client spoofing, etc. but that isn't relevant here.
>>
>> I need to be able to register a user account from the client (I know,
>> spammers will try to automate that but I have countermeasures) and
>> log the user in as well.  This requires sending the sensitive user
>> information which, while essentially obfuscated to eavesdroppers by
>> virtue of using a binary protocol, can be reverse engineered easily
>> enough I bet.
>>
>>> Alas, message signing is another application layer measure -- it would
>>> be sweet to see auth work its way into the Thrift spec.
>>
>> Yeah, I'm planning on requiring signatures a la Amazon Web Services.
>> Some data used in the request signature calculation will only be
>> available to the client and the service and never transmitted between
>> them in the clear -- it would be transmitted to the client during a
>> login over HTTPS.
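
A minimal sketch of that kind of request signing, assuming an
HMAC-SHA256 over a few request fields with a per-session secret handed
out at login. The field choice and the names sign_request and
verify_request are hypothetical; this is not Amazon's actual scheme, and
replay protection (for example, rejecting stale timestamps) is left out.

```python
import hashlib
import hmac


def sign_request(secret, method, path, timestamp, body):
    """Return a hex HMAC-SHA256 signature over the request fields."""
    message = b"\n".join([
        method.encode("utf-8"),
        path.encode("utf-8"),
        str(timestamp).encode("utf-8"),
        # Hash the body so the signed message stays small.
        hashlib.sha256(body).hexdigest().encode("ascii"),
    ])
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret, method, path, timestamp, body, signature):
    """Recompute the signature server-side and compare in constant time."""
    expected = sign_request(secret, method, path, timestamp, body)
    return hmac.compare_digest(expected, signature)
```

The client would send the signature alongside the RPC payload; the
secret itself never travels with the signed request.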
>>
>> Auth in Thrift would be wonderful but I wonder if that's feature creep?
>>
>>> Good luck!
>>>
>>> Garrett
>>
>> Thanks!
>> Brian
>>

