drill-user mailing list archives

From John Omernik <j...@omernik.com>
Subject Re: Drill Session ID between Nodes
Date Fri, 23 Jun 2017 14:22:45 GMT
The wildcard certificate isn't a problem on its own; the problem is using it
in a manner that lets me keep all of the various features I want.  Let me
lay this out.

In Marathon I have a task that runs a Drillbit.  Since that task is located
at the node prod/drillprod (in my env that's role/instanceid), the domain
name is set up to be


I can run X number of instances of that task.  I tell Marathon to make them
"host unique" so no two Drillbits end up on the same node.  This gives me
a few things:

1. If I choose to have 3 Drillbits running, they go and run, and I don't
have to worry about them.  If I have to reboot the node one of them is on,
Marathon says "oh look, I am only running 2, let's spin up another," and
then I get my required 3 bits running automatically.

2. They use a common config directory located in MapR-FS.  This is really
nice because I don't have to maintain separate configurations for each
Drillbit.

3. Looking up the name above, drillprod-prod.marathon.mesos, with nslookup
returns

Name: drillprod-prod.marathon.mesos


Name: drillprod-prod.marathon.mesos


Name: drillprod-prod.marathon.mesos


That is what I want.  When I have a client connect, I can program a single
name (drillprod-prod.marathon.mesos) into my script and never have to worry
about where the bits run on the cluster.  It looks it up and works great.
This has been my standard MO for scripts that do short-lived things, and I
haven't had an issue until this new use case came up.  (Long-running
sessions for use in analytic notebooks is the use case, BTW; it's just not
super relevant to go into details on that here.)

Because of the DNS naming, my scripts get tossed around to different bits
depending on which IP the DNS round robin hands out, which is what I want
for most scripts.  The issue comes into play when I make a session
connection and, for some reason (maybe after a cache timeout or something),
Python's requests object does a fresh DNS lookup before the next request.
The IP changes and the session is invalidated.  Not awesome when working in
a notebook.
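
A minimal sketch of the "resolve once, then stay put" idea behind that
problem, assuming the client is free to choose its own address.  The class
name PinnedEndpoint and the helper resolve_all are hypothetical names made up
for this illustration; only socket.getaddrinfo is a real standard-library
call.  The resolver is injectable so the behaviour can be exercised without
a live DNS server.

```python
import socket
from typing import Callable, List, Optional


def resolve_all(hostname: str, port: int) -> List[str]:
    """Return every IPv4 address the round-robin name currently resolves to."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


class PinnedEndpoint:
    """Hypothetical helper: resolve a round-robin name once, then reuse one address."""

    def __init__(
        self,
        hostname: str,
        port: int,
        resolver: Callable[[str, int], List[str]] = resolve_all,
    ) -> None:
        self.hostname = hostname
        self.port = port
        self._resolver = resolver
        self._pinned: Optional[str] = None

    def address(self) -> str:
        """Return the pinned address, resolving only on the first call."""
        if self._pinned is None:
            addresses = self._resolver(self.hostname, self.port)
            if not addresses:
                raise LookupError("no addresses returned for %s" % self.hostname)
            self._pinned = addresses[0]
        return self._pinned

    def reset(self) -> None:
        """Drop the pin, e.g. after the pinned Drillbit goes away."""
        self._pinned = None
```

This only shows the pinning.  Connecting to the pinned IP over HTTPS runs
straight into the certificate mismatch warnings mentioned earlier in the
thread, so it does not solve the whole problem on its own.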

The wildcard DNS "could" work, but there are some gotchas... I could create
an application folder in marathon with the same name, prod/drillprod, and
then in there I could create a task with the hostname for each host.

However, this would make me lose the HA-ness of my setup.  Say I am
trying to run 3 instances of bits, on nodes node104, node103, and node105,
and I need to reboot node105.  In my current setup, node102 could get the
new bit, the DNS name auto-updates, and I maintain HA with simplicity.  With
a wildcard cert and a task per host, I would still need to manually spin up
a new instance to maintain three instances.

In addition, I would have to get a list of the three nodes running to pick
one to connect to.  Lots of complex orchestration to use wild card certs to
maintain HA.

The reverse proxy will work for me: I can program nginx to pin connections,
choosing the backend based on the JSESSIONID.  That should work, but I don't
like it because it requires another component running in my network.  That's
not bad for me (I can easily run it on Zeta, so it won't be an issue at
all), but as a whole it's not ideal for Drill users.
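
To make the pinning idea concrete, here is a rough sketch of the routing
logic a proxy would need.  It is written in Python rather than as nginx
configuration and is not how nginx implements it; StickyRouter and its
methods are hypothetical names for illustration.  It assumes the session ID
is issued by whichever Drillbit handles the first request, so the proxy has
to learn the mapping from the response instead of hashing the ID up front.

```python
import itertools
from typing import Dict, List, Optional


class StickyRouter:
    """Hypothetical sticky-session router keyed on a session ID such as JSESSIONID."""

    def __init__(self, backends: List[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = list(backends)
        # Requests without a known session are spread round robin.
        self._round_robin = itertools.cycle(self._backends)
        # session ID -> backend that issued it
        self._pinned: Dict[str, str] = {}

    def route(self, session_id: Optional[str]) -> str:
        """Pick the backend for a request carrying session_id (or None)."""
        backend = self._pinned.get(session_id) if session_id else None
        if backend is None:
            backend = next(self._round_robin)
        return backend

    def learn(self, session_id: str, backend: str) -> None:
        """Record that backend issued session_id (seen in its response cookie)."""
        self._pinned[session_id] = backend

    def forget_backend(self, backend: str) -> None:
        """Drop all pins to a backend that has gone away, so clients re-login elsewhere."""
        self._pinned = {
            sid: pinned for sid, pinned in self._pinned.items() if pinned != backend
        }
```

The learned table is exactly the extra state that makes the proxy "another
component" to run: it has to stay in step with whatever Marathon does to the
set of bits.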

Thus I am back to the idea of Drill somehow maintaining a global state.
This is also important for Drill on Yarn setups (unless there is some sort
of application container proxy back to the bits). If you want to have
security (SSL) with hostnames, the session maintenance must be addressed.

So that's why I toss it out here... I would imagine this is a desirable
feature, even if people are not asking for it now.  That may be because they
don't need it yet: in their testing of Drill, and in how they are using it
now, it may not come up.  But when they have multiple people and services
hitting Drill endpoints, pointing them at individual nodes for SSL
management etc. becomes a nightmare.  Thus, as a thought exercise: could we
securely maintain valid session IDs in ZooKeeper for nodes to check?  What
would an ideal setup for something like that be?
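
As a starting point for that thought exercise, a sketch of what a shared
registry might look like.  Nothing here is Drill's design or ZooKeeper's API:
SharedSessionRegistry is a hypothetical name, and a plain dict stands in for
the ZooKeeper store so the example runs on its own.  The assumptions are that
sessions expire after a fixed TTL and that only a SHA-256 digest of the
session ID is stored, so reading the registry does not reveal usable IDs.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class SharedSessionRegistry:
    """Hypothetical cluster-wide session registry; a dict stands in for ZooKeeper."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # digest of session ID -> (user name, expiry timestamp)
        self._store: Dict[str, Tuple[str, float]] = {}

    @staticmethod
    def _key(session_id: str) -> str:
        # Store only a digest so the raw session ID never sits in the registry.
        return hashlib.sha256(session_id.encode("utf-8")).hexdigest()

    def register(self, session_id: str, user: str) -> None:
        """Called by the Drillbit that authenticated the user."""
        self._store[self._key(session_id)] = (user, self._clock() + self._ttl)

    def validate(self, session_id: str) -> Optional[str]:
        """Called by any Drillbit; returns the user name, or None if unknown or expired."""
        key = self._key(session_id)
        entry = self._store.get(key)
        if entry is None:
            return None
        user, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]
            return None
        return user
```

With something like this, a request that lands on a different bit after a
DNS change could still be recognised, at the cost of a registry lookup per
request and a decision about who is allowed to write to the registry.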

On Fri, Jun 23, 2017 at 7:07 AM, Keys Botzum <kbotzum@mapr.com> wrote:

> Why is a wildcard certificate a problem? They are quite common. One just
> needs all of the Drillbits to share a common domain for the wildcard to be
> easy and thus avoid having to list individual hosts.
> Are you saying that you can't use hostnames and must use IPs?
> In case I'm not clear, here's an example of what I'm saying.
> this is good with wildcards: drill1.mydrill.corp.com,
> drill2.mydrill.corp.com, drill3.mydrill.corp.com, drill4.mydrill.corp.com
> this is bad with wildcards: drill1, drill2, drill3, drill4
> Keys
> _______________________________
> Keys Botzum
> MapR Technologies
> On Jun 22, 2017, at 8:24 PM, John Omernik <john@omernik.com> wrote:
> Would there be interest in finding a way to globalize this? This is
> challenging for me and others that may run Drill with multi-tenant
> orchestrators.  In my particular setup, each node running Drill gets added
> to an A record automatically, giving me HA and distribution of REST API
> queries.  It also allows me to have a single certificate for my cluster
> rather than managing certificates on an individual basis.  I set things up
> to connect via IP but then I had certificate mismatch warnings. My goal is
> to find a way to connect to the REST API while maintaining a session to a
> single node, without sacrificing HA and balancing and without compromising
> SSL security.  I know it's a tall order, but if there are ideas outside of
> global state management I am all ears.
> Note some ideas I've also considered:
> 1.  using a load balancer that would allow me to pin connections.  Not
> ideal because it's another service to manage but it would work.
> 2. There may be a way to hack things with a wildcard cert but it seems
> complicated and fragile.
> On Jun 22, 2017 5:47 PM, "Sorabh Hamirwasia" <shamirwasia@mapr.com> wrote:
> Hi John,
> As Paul mentioned, session IDs are not global. Each session is part of the
> BitToUserConnection instance created for a connection between Drillbit and
> client. Hence it's local to that Drillbit only and the lifetime of the
> session is tied to the lifetime of the connection. You can find the code here:
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java#L102
> Thanks,
> Sorabh
> ________________________________
> From: Paul Rogers <progers@mapr.com>
> Sent: Thursday, June 22, 2017 2:19:50 PM
> To: user@drill.apache.org
> Subject: Re: Drill Session ID between Nodes
> Hi John,
> I do not believe that session IDs are global. Each Drillbit maintains its
> own concept of sessions. A global session would require some centralized
> registry of sessions, which Drill does not have.
> Would be great if someone can confirm…
> - Paul
> On Jun 22, 2017, at 12:14 PM, John Omernik <john@omernik.com> wrote:
> When I log onto a drill node, and get Session Id, if I connect to another
> drill node in the cluster will the session id be valid?
> I am guessing not, but want to validate.
> My conundrum: I have my Drill cluster running in such a way that the
> connections to the nodes are load balanced via DNS. However, if I get a
> new IP while in session, the session appears to invalidate, and thus
> forces me to log on...
