flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Niels Basjes <Ni...@basjes.nl>
Subject Re: Running continuously on yarn with kerberos
Date Sat, 07 Nov 2015 20:35:41 GMT
How long should this take if you have HDD and about 8GB of RAM?
Is that 10 minutes? 20?

Niels

On Sat, Nov 7, 2015 at 2:51 PM, Stephan Ewen <sewen@apache.org> wrote:

> Hi Niels!
>
> Usually, you simply build the binaries by invoking "mvn -DskipTests clean
> package" in the root flink directory. The resulting program should be in
> the "build-target" directory.
>
> If the program gets stuck, let us know where and what the last message on
> the command line is.
>
> Please be aware that the final step of building the "flink-dist" project
> may take a while, especially on systems with hard disks (as opposed to
> SSDs) and a comparatively low amount of memory. The reason is that the
> building of the final JAR file is quite expensive, because the system
> re-packages certain libraries in order to avoid conflicts between different
> versions.
>
> Stephan
>
>
> On Sat, Nov 7, 2015 at 2:40 PM, Niels Basjes <niels@basj.es> wrote:
>
>> Hi,
>>
>> Excellent.
>> What you can help me with are the commands to build the binary
>> distribution from source.
>> I tried it last Thursday and the build seemed to get stuck at some point
>> (at the end of/just after building the dist module).
>> I haven't been able to figure out why yet.
>>
>> Niels
>> On 5 Nov 2015 14:57, "Maximilian Michels" <mxm@apache.org> wrote:
>>
>>> Thank you for looking into the problem, Niels. Let us know if you need
>>> anything. We would be happy to merge a pull request once you have verified
>>> the fix.
>>>
>>> On Thu, Nov 5, 2015 at 1:38 PM, Niels Basjes <Niels@basjes.nl> wrote:
>>>
>>>> I created https://issues.apache.org/jira/browse/FLINK-2977
>>>>
>>>> On Thu, Nov 5, 2015 at 12:25 PM, Robert Metzger <rmetzger@apache.org>
>>>> wrote:
>>>>
>>>>> Hi Niels,
>>>>> thank you for analyzing the issue so properly. I agree with you. It
>>>>> seems that HDFS and HBase are using their own tokes which we need to
>>>>> transfer from the client to the YARN containers. We should be able to
port
>>>>> the fix from Spark (which they got from Storm) into our YARN client.
>>>>> I think we would add this in org.apache.flink.yarn.Utils#
>>>>> setTokensFor().
>>>>>
>>>>> Do you want to implement and verify the fix yourself? If you are to
>>>>> busy at the moment, we can also discuss how we share the work (I'm
>>>>> implementing it, you test the fix)
>>>>>
>>>>>
>>>>> Robert
>>>>>
>>>>> On Tue, Nov 3, 2015 at 5:26 PM, Niels Basjes <Niels@basjes.nl>
wrote:
>>>>>
>>>>>> Update on the status so far.... I suspect I found a problem in a
>>>>>> secure setup.
>>>>>>
>>>>>> I have created a very simple Flink topology consisting of a streaming
>>>>>> Source (the outputs the timestamp a few times per second) and a Sink
(that
>>>>>> puts that timestamp into a single record in HBase).
>>>>>> Running this on a non-secure Yarn cluster works fine.
>>>>>>
>>>>>> To run it on a secured Yarn cluster my main routine now looks like
>>>>>> this:
>>>>>>
>>>>>> public static void main(String[] args) throws Exception {
>>>>>>     System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
>>>>>>     UserGroupInformation.loginUserFromKeytab("nbasjes@xxxxxx.NET",
"/home/nbasjes/.krb/nbasjes.keytab");
>>>>>>
>>>>>>     final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
>>>>>>     env.setParallelism(1);
>>>>>>
>>>>>>     DataStream<String> stream = env.addSource(new TimerTicksSource());
>>>>>>     stream.addSink(new SetHBaseRowSink());
>>>>>>     env.execute("Long running Flink application");
>>>>>> }
>>>>>>
>>>>>> When I run this
>>>>>>      flink run -m yarn-cluster -yn 1 -yjm 1024 -ytm 4096
>>>>>> ./kerberos-1.0-SNAPSHOT.jar
>>>>>>
>>>>>> I see after the startup messages:
>>>>>>
>>>>>> 17:13:24,466 INFO  org.apache.hadoop.security.UserGroupInformation
>>>>>>             - Login successful for user nbasjes@xxxxxx.NET using
>>>>>> keytab file /home/nbasjes/.krb/nbasjes.keytab
>>>>>> 11/03/2015 17:13:25 Job execution switched to status RUNNING.
>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched
to
>>>>>> SCHEDULED
>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched
to
>>>>>> DEPLOYING
>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched
to
>>>>>> RUNNING
>>>>>>
>>>>>> Which looks good.
>>>>>>
>>>>>> However ... no data goes into HBase.
>>>>>> After some digging I found this error in the task managers log:
>>>>>>
>>>>>> 17:13:42,677 WARN  org.apache.hadoop.hbase.ipc.RpcClient        
                - Exception encountered while connecting to the server : javax.security.sasl.SaslException:
GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level:
Failed to find any Kerberos tgt)]
>>>>>> 17:13:42,677 FATAL org.apache.hadoop.hbase.ipc.RpcClient        
                - SASL authentication failed. The most likely cause is missing or invalid
credentials. Consider 'kinit'.
>>>>>> javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos
tgt)]
>>>>>> 	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>>>>>> 	at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
>>>>>> 	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
>>>>>> 	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>>>>>>
>>>>>>
>>>>>> First starting a yarn-session and then loading my job gives the same
>>>>>> error.
>>>>>>
>>>>>> My best guess at this point is that Flink needs the same fix as
>>>>>> described here:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/SPARK-6918   (
>>>>>> https://github.com/apache/spark/pull/5586 )
>>>>>>
>>>>>> What do you guys think?
>>>>>>
>>>>>> Niels Basjes
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Oct 27, 2015 at 6:12 PM, Maximilian Michels <mxm@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Niels,
>>>>>>>
>>>>>>> You're welcome. Some more information on how this would be
>>>>>>> configured:
>>>>>>>
>>>>>>> In the kdc.conf, there are two variables:
>>>>>>>
>>>>>>>         max_life = 2h 0m 0s
>>>>>>>         max_renewable_life = 7d 0h 0m 0s
>>>>>>>
>>>>>>> max_life is the maximum life of the current ticket. However,
it may
>>>>>>> be renewed up to a time span of max_renewable_life from the first
ticket
>>>>>>> issue on. This means that from the first ticket issue, new tickets
may be
>>>>>>> requested for one week. Each renewed ticket has a life time of
max_life (2
>>>>>>> hours in this case).
>>>>>>>
>>>>>>> Please let us know about any difficulties with long-running
>>>>>>> streaming application and Kerberos.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Max
>>>>>>>
>>>>>>> On Tue, Oct 27, 2015 at 2:46 PM, Niels Basjes <Niels@basjes.nl>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Thanks for your feedback.
>>>>>>>> So I guess I'll have to talk to the security guys about having
>>>>>>>> special
>>>>>>>> kerberos ticket expiry times for these types of jobs.
>>>>>>>>
>>>>>>>> Niels Basjes
>>>>>>>>
>>>>>>>> On Fri, Oct 23, 2015 at 11:45 AM, Maximilian Michels <
>>>>>>>> mxm@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Niels,
>>>>>>>>>
>>>>>>>>> Thank you for your question. Flink relies entirely on
the Kerberos
>>>>>>>>> support of Hadoop. So your question could also be rephrased
to
>>>>>>>>> "Does
>>>>>>>>> Hadoop support long-term authentication using Kerberos?".
And the
>>>>>>>>> answer is: Yes!
>>>>>>>>>
>>>>>>>>> While Hadoop uses Kerberos tickets to authenticate users
with
>>>>>>>>> services
>>>>>>>>> initially, the authentication process continues differently
>>>>>>>>> afterwards. Instead of saving the ticket to authenticate
on a later
>>>>>>>>> access, Hadoop creates its own security tockens (DelegationToken)
>>>>>>>>> that
>>>>>>>>> it passes around. These are authenticated to Kerberos
>>>>>>>>> periodically. To
>>>>>>>>> my knowledge, the tokens have a life span identical to
the Kerberos
>>>>>>>>> ticket maximum life span. So be sure to set the maximum
life span
>>>>>>>>> very
>>>>>>>>> high for long streaming jobs. The renewal time, on the
other hand,
>>>>>>>>> is
>>>>>>>>> not important because Hadoop abstracts this away using
its own
>>>>>>>>> security tockens.
>>>>>>>>>
>>>>>>>>> I'm afraid there is not Kerberos how-to yet. If you are
on Yarn,
>>>>>>>>> then
>>>>>>>>> it is sufficient to authenticate the client with Kerberos.
On a
>>>>>>>>> Flink
>>>>>>>>> standalone cluster you need to ensure that, initially,
all nodes
>>>>>>>>> are
>>>>>>>>> authenticated with Kerberos using the kinit tool.
>>>>>>>>>
>>>>>>>>> Feel free to ask if you have more questions and let us
know about
>>>>>>>>> any
>>>>>>>>> difficulties.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Max
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Oct 22, 2015 at 2:06 PM, Niels Basjes <Niels@basjes.nl>
>>>>>>>>> wrote:
>>>>>>>>> > Hi,
>>>>>>>>> >
>>>>>>>>> > I want to write a long running (i.e. never stop
it) streaming
>>>>>>>>> flink
>>>>>>>>> > application on a kerberos secured Hadoop/Yarn cluster.
My
>>>>>>>>> application needs
>>>>>>>>> > to do things with files on HDFS and HBase tables
on that cluster
>>>>>>>>> so having
>>>>>>>>> > the correct kerberos tickets is very important.
The stream is to
>>>>>>>>> be ingested
>>>>>>>>> > from Kafka.
>>>>>>>>> >
>>>>>>>>> > One of the things with Kerberos is that the tickets
expire after
>>>>>>>>> a
>>>>>>>>> > predetermined time. My knowledge about kerberos
is very limited
>>>>>>>>> so I hope
>>>>>>>>> > you guys can help me.
>>>>>>>>> >
>>>>>>>>> > My question is actually quite simple: Is there an
howto
>>>>>>>>> somewhere on how to
>>>>>>>>> > correctly run a long running flink application with
kerberos
>>>>>>>>> that includes a
>>>>>>>>> > solution for the kerberos ticket timeout  ?
>>>>>>>>> >
>>>>>>>>> > Thanks
>>>>>>>>> >
>>>>>>>>> > Niels Basjes
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>>>
>>>>>>>> Niels Basjes
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>
>>>>>> Niels Basjes
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards / Met vriendelijke groeten,
>>>>
>>>> Niels Basjes
>>>>
>>>
>>>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes

Mime
View raw message