Hey Matteo,

Looks like you did quite a bit of digging in the code! Responses inline below.

On Wed, Oct 11, 2017 at 1:24 PM, Matteo Durighetto <m.durighetto@miriade.it> wrote:
Hello,
           I have a strange behaviour with Kudu 1.4 and kerberos.
I enabled kerberos on kudu, I have the principal correctly in the OU of an AD, but
at startup I got a lot of errors on method TSHeartbeat between tablet server and 
master server as unauthorized. There's no firewall between nodes.

Right, "unauthorized" indicates that the connection itself was established fine, but the individual RPC call was determined not to be allowed for the identity presented on the other side of the connection.
 

W1011 <time>   server_base.cc:316] Unauthorized access attempt 
to method kudu.master.MasterService.TSHeartbeat 
from {username='abcdefgh1234', principal='kudu/HOSTNAME@DOMAIN.XYZ'} 
at <IP>:37360

The "abcdefgh1234" is an example of the string created by Cloudera Manager when enabling Kerberos.

This output indicates that it successfully authenticated via Kerberos as the principal listed above. That's good news and means you don't need to worry about rdns, etc. (if you had issues with that, it would have had trouble finding a service ticket or authenticating the connection). In other words, you got past the "authentication" step and are having problems at the "authorization" step.
 

The other services (HDFS and so on) are under Kerberos without problems, and rdns is set to true in /etc/krb5.conf (KUDU-2032).
As I understand it, the problem is something about the third level of authorization between master servers and tablet servers.

Right.
 
... <snipped>
So I think the problem, as I say before, could be in  ContainsKey(users_, username);  :

bool SimpleAcl::UserAllowed(const string& username) {
  return ContainsKey(users_, "*") || ContainsKey(users_, username);
}

  
At this point it's not clear to me how Kudu builds the list of users for the daemon service ACL (unlike the superuser or user ACLs, it isn't set by an external parameter).

Exactly. The users here for the 'service' ACL are set in ServerBase::InitAcls():

  boost::optional<string> keytab_user = security::GetLoggedInUsernameFromKeytab();
  if (keytab_user) {
    // If we're logged in from a keytab, then everyone should be, and we expect them
    // to use the same mapped username.
    service_user = *keytab_user;
  } else {
    // If we aren't logged in from a keytab, then just assume that the services
    // will be running as the same Unix user as we are.
    RETURN_NOT_OK_PREPEND(GetLoggedInUser(&service_user),
                          "could not determine local username");
  }

Since you're using Kerberos, the top branch here would apply -- it's calling GetLoggedInUsernameFromKeytab() from init.cc.

You can see what username the server is getting by looking for a log message at startup like "Logged in from keytab as kudu/<host>@REALM (short username <XYZ>)". Here 'XYZ' is the username that ends up in the service ACL.

So, basically, it's critical that the username that the master determines for itself (from this function) matches the username that it has determined for the tablet servers when they authenticate (what you pasted as 'abcdefgh1234' above).
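
To make that concrete, here is a minimal sketch of the check, not the actual Kudu code. MiniAcl and HeartbeatAllowed are hypothetical names I made up for illustration; only the membership test mirrors the UserAllowed() snippet you quoted. It assumes the service ACL holds exactly one entry, the short name the master mapped for itself.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for Kudu's SimpleAcl. Only the membership test
// mirrors the UserAllowed() snippet quoted earlier in this thread.
class MiniAcl {
 public:
  void AddUser(const std::string& username) { users_.insert(username); }

  // Allowed if the ACL holds the wildcard "*" or this exact username.
  bool UserAllowed(const std::string& username) const {
    return users_.count("*") > 0 || users_.count(username) > 0;
  }

 private:
  std::set<std::string> users_;
};

// Hypothetical helper: builds a service ACL containing only the short name
// the master derived for its own principal, then checks the short name the
// master derived for the connecting tablet server against it.
bool HeartbeatAllowed(const std::string& master_short_name,
                      const std::string& tserver_short_name) {
  MiniAcl service_acl;
  service_acl.AddUser(master_short_name);
  return service_acl.UserAllowed(tserver_short_name);
}
```

If, purely as an assumption for the example, the master mapped its own principal to "kudu", then HeartbeatAllowed("kudu", "abcdefgh1234") returns false, which would correspond to the "Unauthorized access attempt" warning you pasted. HeartbeatAllowed("kudu", "kudu") returns true.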

That brings us to the next question: how do we convert from a principal like kudu/<HOST>@REALM to a short "username"? The answer there is the function 'MapPrincipalToLocalName' again from security/init.cc. This function delegates the mapping to the krb5 library itself using the krb5_aname_to_localname() API. The results of this API can vary depending on the kerberos configuration, but in typical configurations it's determined by the 'auth_to_local' configuration in your krb5.conf. See the corresponding section in the docs here:

https://web.mit.edu/kerberos/krb5-1.12/doc/admin/conf_files/krb5_conf.html

My guess is that your host has been configured such that when the master maps its own principal, it's getting a different result than when it maps the principal being used by the tservers.

Hope that gets you on the right track.

Thanks
-Todd
--
Todd Lipcon
Software Engineer, Cloudera