flink-user mailing list archives

From Cory Monty <cory.mo...@getbraintree.com>
Subject Re: Running on a firewalled Yarn cluster?
Date Tue, 10 Nov 2015 20:27:24 GMT
Thanks, Stephan.

I'll give those two workarounds a try!

On Tue, Nov 10, 2015 at 2:18 PM, Stephan Ewen <sewen@apache.org> wrote:

> Hi Cory!
>
> There is no flag to define the BlobServer port right now, but we should
> definitely add this: https://issues.apache.org/jira/browse/FLINK-2996
>
> If your setup is such that the firewall problem is only between client and
> master node (and the workers can reach the master on all ports), then you
> can try two workarounds:
>
> 1) Start the program in the cluster (or on the master node, via ssh).
>
> 2) Add the program jar to the lib directory of Flink, and start your
> program with the RemoteExecutor, without a jar attachment. Then it only
> needs to communicate to the actor system (RPC) port, which is not random in
> standalone mode (6123 by default).
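Either workaround assumes the fixed RPC port is actually reachable through the firewall. A minimal reachability probe using only JDK sockets can check that; the host name, port, and timeout below are placeholders, not Flink configuration:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 6123 is Flink's default standalone RPC port; "localhost" is a placeholder.
        System.out.println(isReachable("localhost", 6123, 500)
            ? "RPC port reachable" : "RPC port not reachable");
    }
}
```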
>
> Stephan
>
>
>
>
> On Tue, Nov 10, 2015 at 8:46 PM, Cory Monty <cory.monty@getbraintree.com>
> wrote:
>
>> I'm also running into an issue with a non-YARN cluster. When submitting a
>> JAR to Flink, we'll need to have an arbitrary port open on all of the
>> hosts, which we don't know about until the socket attempts to bind; a bit
>> of a problem for us.
>>
>> Are there ways to submit a JAR to Flink that bypass the need for the
>> BlobServer's random port binding? Or a way to control the port the
>> BlobServer binds to?
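For context, the "arbitrary port" behaviour described above is what happens when a server binds to port 0: the OS picks any free ephemeral port, so no firewall rule can be written for it in advance. A minimal JDK sketch of that allocation (not Flink code, and only an assumption about how the BlobServer picks its port):

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortDemo {
    /** Bind to port 0: the OS picks any free ephemeral port. */
    static int grabEphemeralPort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        // Each run typically prints a different port, which is why the
        // firewall cannot whitelist it ahead of time.
        System.out.println("ephemeral port: " + grabEphemeralPort());
    }
}
```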
>>
>> Cheers,
>>
>> Cory
>>
>> On Thu, Nov 5, 2015 at 8:07 AM, Niels Basjes <Niels@basjes.nl> wrote:
>>
>>> That is what I tried. Couldn't find that port though.
>>>
>>> On Thu, Nov 5, 2015 at 3:06 PM, Robert Metzger <rmetzger@apache.org>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> cool, that's good news.
>>>>
>>>> The RM proxy is only for the web interface of the AM.
>>>>
>>>> I'm pretty sure that the MapReduce AM has at least two ports:
>>>> - one for the web interface (accessible through the RM proxy, so behind
>>>> the firewall)
>>>> - one for the AM RPC (and that port is allocated within the configured
>>>> range, open through the firewall).
>>>>
>>>> You can probably find the RPC port in the log file of the running
>>>> MapReduce AM (to find that, identify the NodeManager running the AM, access
>>>> the NM web interface and retrieve the logs of the container running the AM).
>>>>
>>>> Maybe the mapreduce client also logs the AM RPC port when querying the
>>>> status of a running job.
>>>>
>>>>
>>>> On Thu, Nov 5, 2015 at 2:59 PM, Niels Basjes <Niels@basjes.nl> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I checked and this setting has been set to a limited port range of
>>>>> only 100 port numbers.
>>>>>
>>>>> I tried to find the actual port an AM is running on and couldn't find
>>>>> it (I'm not the admin on that cluster)
>>>>>
>>>>> The url to the AM that I use to access it always looks like this:
>>>>>
>>>>> http://master-001.xxxxxx.net:8088/proxy/application_1443166961758_85492/index.html
>>>>>
>>>>> As you can see I never connect directly; always via the proxy that
>>>>> runs over the master on a single fixed port.
>>>>>
>>>>> Niels
>>>>>
>>>>> On Thu, Nov 5, 2015 at 2:46 PM, Robert Metzger <rmetzger@apache.org>
>>>>> wrote:
>>>>>
>>>>>> While discussing with my colleagues about the issue today, we came
>>>>>> up with another approach to resolve the issue:
>>>>>>
>>>>>> d) Upload the job jar to HDFS (or another FS) and trigger the
>>>>>> execution of the jar using an HTTP request to the web interface.
>>>>>>
>>>>>> We could add some tooling into the /bin/flink client to submit a
>>>>>> job like this transparently, so users would not need to bother with
>>>>>> the file upload and request sending.
>>>>>> Also, Sachin started a discussion on the dev@ list to add support
>>>>>> for submitting jobs over the web interface, so maybe we can base
>>>>>> the fix for FLINK-2960 on that.
>>>>>>
>>>>>> I've also looked into the Hadoop MapReduce code, and it seems they
>>>>>> do the following: when submitting a job, they upload the job jar
>>>>>> file to HDFS. They also upload a configuration file that contains
>>>>>> all the config options of the job. Then they submit this altogether
>>>>>> as an application to YARN. So far, no firewall is involved. They
>>>>>> establish a connection between the JobClient and the
>>>>>> ApplicationMaster when the user queries the current job status, but
>>>>>> I could not find any special code getting the status over HTTP.
>>>>>>
>>>>>> But I found the following configuration parameter:
>>>>>> "yarn.app.mapreduce.am.job.client.port-range", so it seems they try
>>>>>> to allocate the AM port within that range (if specified).
>>>>>> Niels, can you check whether this configuration parameter is set in
>>>>>> your environment? I assume your firewall allows outside connections
>>>>>> from that port range.
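For reference, that MapReduce parameter goes into mapred-site.xml; a sketch, with the port range purely an example value:

```xml
<!-- mapred-site.xml: restrict the MapReduce AM's client RPC port to a
     range the firewall can whitelist (the range here is just an example) -->
<property>
  <name>yarn.app.mapreduce.am.job.client.port-range</name>
  <value>50100-50200</value>
</property>
```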
>>>>>> So we also have a new approach:
>>>>>>
>>>>>> f) Allocate the YARN application master (and blob manager) within a
>>>>>> user-specified port range.
>>>>>>
>>>>>> This would be really easy to implement, because we would just need
>>>>>> to go through the range until we find an available port.
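Such a range scan can be sketched in a few lines of plain Java; the class name and range below are hypothetical, not Flink's actual implementation:

```java
import java.io.IOException;
import java.net.ServerSocket;

public class PortRangeAllocator {
    /** Bind a ServerSocket to the first free port in [start, end], inclusive. */
    public static ServerSocket bindInRange(int start, int end) throws IOException {
        for (int port = start; port <= end; port++) {
            try {
                return new ServerSocket(port);
            } catch (IOException e) {
                // Port already taken; try the next one in the range.
            }
        }
        throw new IOException("No free port in range " + start + "-" + end);
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket socket = bindInRange(50100, 50200)) {
            System.out.println("bound to port " + socket.getLocalPort());
        }
    }
}
```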
>>>>>>
>>>>>>
>>>>>> On Tue, Nov 3, 2015 at 1:06 PM, Niels Basjes <Niels@basjes.nl> wrote:
>>>>>>
>>>>>>> Great!
>>>>>>>
>>>>>>> I'll watch the issue and give it a test once I see a working patch.
>>>>>>>
>>>>>>> Niels Basjes
>>>>>>>
>>>>>>> On Tue, Nov 3, 2015 at 1:03 PM, Maximilian Michels <mxm@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Niels,
>>>>>>>>
>>>>>>>> Thanks a lot for reporting this issue. Restrictive firewall
>>>>>>>> settings are a very common setup in corporate infrastructure.
>>>>>>>> For Flink 1.0 (and probably in a minor 0.10.x release) we will
>>>>>>>> have to address this issue to ensure proper integration of Flink.
>>>>>>>>
>>>>>>>> I've created a JIRA to keep track:
>>>>>>>> https://issues.apache.org/jira/browse/FLINK-2960
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Max
>>>>>>>>
>>>>>>>> On Tue, Nov 3, 2015 at 11:02 AM, Niels Basjes <Niels@basjes.nl>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I forgot to answer your other question:
>>>>>>>>>
>>>>>>>>> On Mon, Nov 2, 2015 at 4:34 PM, Robert Metzger
>>>>>>>>> <rmetzger@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> so the problem is that you cannot submit a job to Flink using
>>>>>>>>>> the "/bin/flink" tool, right?
>>>>>>>>>> I assume Flink and its TaskManagers start properly and connect
>>>>>>>>>> to each other (the number of TaskManagers is shown correctly in
>>>>>>>>>> the web interface).
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Correct. Flink starts (I see the JobManager UI) but the actual
>>>>>>>>> job is not started.
>>>>>>>>>
>>>>>>>>> Niels Basjes
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>>
>>>>>>> Niels Basjes
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Best regards / Met vriendelijke groeten,
>>>>>
>>>>> Niels Basjes
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards / Met vriendelijke groeten,
>>>
>>> Niels Basjes
>>>
>>
>>
>
