drill-user mailing list archives

From Keys Botzum <kbot...@mapr.com>
Subject Re: Drill Session ID between Nodes
Date Fri, 23 Jun 2017 14:52:56 GMT
There is something here I'm not understanding. In the setup described below, the hostname is
always the same, so there should be no problem as long as all Drillbits share a common signer.

I'm also just not following how certificate authentication issues are even linked to the Drill
session issues. Whether or not there is a Drill session, the SSL handshake rules still apply.
Or there is something here I just don't understand - quite possibly of course. I'm just focused
on the SSL issue as this I understand very well.

Incidentally, regarding hostname verification, I'm not familiar with what controls you have
but many libraries (including Java) give you the ability to write your own SSL verifier which
is called only when the default hostname verification fails. In that code you can implement
different rules. Perhaps you can find a rule that meets your needs (such as a common signer
for all Drillbits). Remember that certificate hostname validation is just a convention. There
is nothing about SSL that makes this necessary. Here's the Java version:
https://docs.oracle.com/javase/7/docs/api/javax/net/ssl/HostnameVerifier.html
In case you are curious, this is how MapR's maprlogin works with HTTPS even though we use
IP addresses by default.
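
For a Python client, a rough analogue of the "common signer" rule can be sketched with the standard ssl module. This is a minimal sketch under stated assumptions, not a port of Java's HostnameVerifier: it does not run as a fallback after the default check fails, it simply keeps certificate-chain validation on and turns hostname matching off. The function name make_signer_only_context and the ca_file parameter are hypothetical. It is only reasonable when ca_file points at a private CA that signs nothing but the Drillbits; with the system CA bundle, any publicly signed certificate would be accepted.

```python
import ssl


def make_signer_only_context(ca_file=None):
    """Build a client TLS context that validates the signer but not the hostname.

    ca_file is a hypothetical path to the private CA that signs every Drillbit.
    If it is None, the system CA bundle is used, which is NOT safe with this rule.
    """
    # create_default_context() turns on CERT_REQUIRED and hostname checking.
    context = ssl.create_default_context(cafile=ca_file)
    # Disable only the hostname comparison; the chain must still validate.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_REQUIRED
    return context
```

The resulting context can be passed to anything that accepts an ssl.SSLContext, for example http.client.HTTPSConnection(host, context=...).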

Keys
_______________________________
Keys Botzum
Distinguished Engineer, Field Engineering
kbotzum@maprtech.com
443-718-0098
MapR Technologies
http://www.mapr.com



On Jun 23, 2017, at 10:22 AM, John Omernik <john@omernik.com> wrote:

The wildcard certificate isn't a problem on its own; the problem is using it
in a manner that allows me to maintain all of the various features I want.
Let me lay this out.

In Marathon I have a task that runs a drillbit.  Since that task is located
at the node prod/drillprod (for my env it's role/instanceid), the domain
name is set up to be

drillprod-prod.marathon.mesos.

I can run X number of instances of that task. I tell Marathon to make them
"host unique" so no two drill bits end up on the same node.  This gives me
a few things

1. If I choose to have 3 drillbits running, they go and run, and I don't
have to worry about them.  If I have to reboot the node one of them is on,
Marathon says "oh look, I am only running 2, let's spin up another", and then
I get my required 3 bits running automatically.

2. They use a common config directory located in MapR-FS. This is really
nice because I don't have to maintain separate configurations for each
drillbit.

3. The name above, drillprod-prod.marathon.mesos, using nslookup returns

Name: drillprod-prod.marathon.mesos
Address: 192.168.0.105
Name: drillprod-prod.marathon.mesos
Address: 192.168.0.103
Name: drillprod-prod.marathon.mesos
Address: 192.168.0.104


Which is desired. When I have a client connect, I can program in a single
name (drillprod-prod.marathon.mesos) into my script and never have to worry
about where the bits run on the cluster.  It looks it up and works great.
This has been my standard MO for scripts that do short lived things... I
haven't had an issue until this new use case came up. (long running
sessions for use in analytic notebooks is the use case BTW, just not super
relevant to go into details on that here)


Because of the DNS naming, my scripts get tossed around to different bits
depending on which IP the DNS round robin provides, which is desired for
various scripts.  The issue comes into play when I make a session
connection and, for some reason (maybe after a cache timeout or
something), Python's requests object does a DNS lookup before the next
request, causing the IP to change and the session to be invalidated. Not
awesome when working in a notebook.
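
One client-side way to picture the problem is to resolve the round-robin name once and keep using that address for the life of the session. The following is a minimal sketch, assuming the client controls its own name resolution; PinnedAddress and its resolve parameter are hypothetical names, and nothing here is part of requests or Drill. Connecting by IP still leaves the certificate question open, since the certificate would have to be verified against the original name separately.

```python
import socket


class PinnedAddress:
    """Resolve a round-robin DNS name once and reuse the first answer."""

    def __init__(self, hostname, port, resolve=socket.getaddrinfo):
        self.hostname = hostname
        self.port = port
        self._resolve = resolve  # injectable so the lookup can be stubbed
        self._pinned = None

    def address(self):
        """Return the pinned IP, performing the DNS lookup only the first time."""
        if self._pinned is None:
            infos = self._resolve(self.hostname, self.port,
                                  type=socket.SOCK_STREAM)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address string.
            self._pinned = infos[0][4][0]
        return self._pinned

    def reset(self):
        """Forget the pinned IP, e.g. after the node behind it went away."""
        self._pinned = None
```

Calling reset() after a connection failure lets the next address() call pick up whichever node DNS hands out, at the cost of a new login.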

The wildcard DNS "could" work, but there are some gotchas... I could create
an application folder in marathon with the same name, prod/drillprod, and
then in there I could create a task with the hostname for each host.

However, this would then make me lose the HA-ness of my setup. Say I am
trying to run 3 instances of bits, on nodes node104, node103, and node105,
and I need to reboot node105. In my setup, node102 could get the new bit,
the DNS name auto-updates, and I maintain HA with simplicity. With a
wildcard cert, however, I would still need to manually spin up a new
instance to maintain three instances.

In addition, I would have to get a list of the three nodes running to pick
one to connect to.  Lots of complex orchestration to use wild card certs to
maintain HA.

The reverse proxy will work for me: I can program nginx to pin
connections, choosing which backend a request goes to based on the
JSESSIONID. That should work, but I don't like it because it requires
another component running in my network. That's not bad for me, since I can
easily run it on Zeta and it won't be an issue at all, but as a whole it's
not ideal for Drill users.
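
The pinning idea can be sketched independently of nginx. This is a minimal illustration of session-sticky routing, assuming the proxy can read the JSESSIONID cookie from requests and responses; StickyRouter, route, and remember are hypothetical names, and this is not nginx configuration. New requests are spread round robin, and once a backend has issued a session ID, later requests carrying that ID go back to the same backend.

```python
import itertools


class StickyRouter:
    """Round-robin for new clients, fixed backend for known session IDs."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._round_robin = itertools.cycle(list(backends))
        self._by_session = {}

    def route(self, session_id=None):
        """Pick the backend for a request, honoring an existing pin."""
        backend = self._by_session.get(session_id)
        if backend is None:
            # No pin yet (or no session cookie): take the next backend in turn.
            backend = next(self._round_robin)
        return backend

    def remember(self, session_id, backend):
        """Record which backend issued a session ID, seen in its response."""
        self._by_session[session_id] = backend
```

A real proxy would also need to expire pins and drop them when a backend disappears, which this sketch leaves out.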

Thus I am back to the idea of Drill somehow maintaining a global state.
This is also important for Drill on Yarn setups (unless there is some sort
of application container proxy back to the bits). If you want to have
security (SSL) with hostnames, the session maintenance must be addressed.

So that's why I toss it out here... this is a desirable feature, I would
imagine. Even if people are not asking for it now, it may not be because they
don't need it; in their testing of Drill, and how they are using it now, it
may simply not come up. When they have multiple people and services hitting
Drill endpoints, pointing them at individual nodes for SSL management etc.
becomes a nightmare. Thus, as a thought exercise: could we securely
maintain valid session IDs in ZooKeeper for nodes to check on? What would
an ideal setup for something like that be?
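
As a starting point for the thought exercise, here is a minimal sketch of a registry that several nodes could consult. It is purely illustrative and not Drill's design: a plain dict stands in for ZooKeeper, and SharedSessionRegistry, register, and lookup are hypothetical names. The two assumptions it makes are that only a SHA-256 hash of the session ID is stored, so reading the store does not reveal usable IDs, and that every entry carries an expiry time. It shares only the identity behind a session, not any per-connection state a Drillbit holds.

```python
import hashlib
import time


class SharedSessionRegistry:
    """Session lookup backed by a store that all nodes can reach."""

    def __init__(self, store=None, ttl_seconds=3600, clock=time.time):
        # 'store' is a dict here; a real setup would use a shared service.
        self._store = {} if store is None else store
        self._ttl = ttl_seconds
        self._clock = clock

    @staticmethod
    def _key(session_id):
        # Store a hash so the raw session ID never sits in the shared store.
        return hashlib.sha256(session_id.encode("utf-8")).hexdigest()

    def register(self, session_id, user):
        """Record a freshly issued session with an expiry timestamp."""
        self._store[self._key(session_id)] = (user, self._clock() + self._ttl)

    def lookup(self, session_id):
        """Return the user for a live session, or None if unknown or expired."""
        key = self._key(session_id)
        entry = self._store.get(key)
        if entry is None:
            return None
        user, expires_at = entry
        if self._clock() >= expires_at:
            self._store.pop(key, None)
            return None
        return user
```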




On Fri, Jun 23, 2017 at 7:07 AM, Keys Botzum <kbotzum@mapr.com> wrote:

Why is a wildcard certificate a problem? They are quite common. One just
needs all of the Drillbits to share a common domain for the wildcard to be
easy and thus avoid having to list individual hosts.

Are you saying that you can't use hostnames and must use IPs?

In case I'm not clear, here's an example of what I'm saying.

this is good with wildcards: drill1.mydrill.corp.com, drill2.mydrill.corp.com,
drill3.mydrill.corp.com, drill4.mydrill.corp.com
this is bad with wildcards: drill1, drill2, drill3, drill4
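
The difference between the two lists can be sketched as a simplified wildcard match. This is a minimal illustration, assuming the common rule that a leading "*" stands for exactly one leftmost label and that very broad patterns are refused; wildcard_matches is a hypothetical helper, and real TLS clients apply more rules than this.

```python
def wildcard_matches(pattern, hostname):
    """Simplified check of a certificate name pattern against a hostname."""
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")
    # The wildcard covers one label only, so the label counts must agree.
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # Assumption of this sketch: refuse patterns as broad as "*" or "*.com".
        return (len(pattern_labels) >= 3
                and host_labels[0] != ""
                and pattern_labels[1:] == host_labels[1:])
    return pattern_labels == host_labels
```

With a certificate for *.mydrill.corp.com, drill1.mydrill.corp.com matches, while a bare drill1 has no shared domain for the wildcard to attach to.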


Keys
_______________________________
Keys Botzum
MapR Technologies



On Jun 22, 2017, at 8:24 PM, John Omernik <john@omernik.com> wrote:

Would there be interest in finding a way to globalize this? This is
challenging for me and others that may run Drill with multi-tenant
orchestrators.  In my particular setup, each node running Drill gets added
to an A record automatically, giving me HA and distribution of REST API
queries.  It also allows me to have a single certificate for my cluster
rather than managing certificates on an individual basis.  I set things up
to connect via IP, but then I had certificate mismatch warnings. My goal is
to find a way to connect to the REST API while maintaining a session to a
single node, without sacrificing HA and balancing and without compromising
SSL security.  I know it's a tall order, but if there are ideas outside of
global state management I am all ears.

Note some ideas I've also considered:

1.  using a load balancer that would allow me to pin connections.  Not
ideal because it's another service to manage but it would work.

2. There may be a way to hack things with a wildcard cert, but it seems
complicated and fragile.

On Jun 22, 2017 5:47 PM, "Sorabh Hamirwasia" <shamirwasia@mapr.com> wrote:

Hi John,
As Paul mentioned, session IDs are not global. Each session is part of the
BitToUserConnection instance created for a connection between Drillbit and
client. Hence it's local to that Drillbit only, and the lifetime of the
session is tied to the lifetime of the connection. You can find the code here:
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/rpc/user/UserServer.java#L102

Thanks,
Sorabh

________________________________
From: Paul Rogers <progers@mapr.com>
Sent: Thursday, June 22, 2017 2:19:50 PM
To: user@drill.apache.org
Subject: Re: Drill Session ID between Nodes

Hi John,

I do not believe that session IDs are global. Each Drillbit maintains its
own concept of sessions. A global session would require some centralized
registry of sessions, which Drill does not have.

Would be great if someone can confirm…

- Paul

On Jun 22, 2017, at 12:14 PM, John Omernik <john@omernik.com> wrote:

When I log on to a Drill node and get a session ID, will that session ID
be valid if I connect to another Drill node in the cluster?

I am guessing not, but want to validate.

My conundrum: I have my Drill cluster running in such a way that the
connections to the nodes are load balanced via DNS. However, if I get a
different IP from DNS while in a session, the session appears to be
invalidated, which forces me to log on again...




