flink-issues mailing list archives

From "Felix seibert (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-12550) hostnames with a dot never receive local input splits
Date Sun, 19 May 2019 11:23:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-12550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16843397#comment-16843397 ]

Felix seibert commented on FLINK-12550:

After opening PR #8478 yesterday, I have some additional considerations.

The status quo is the following:
 * To check if an input split is locally available for a taskmanager, the hostname of the
taskmanager is compared to the hostname of the input split. This happens in [this line|https://github.com/apache/flink/blob/4fa387164cea44f8e0bac1aadab11433c0f0ff2b/flink-core/src/main/java/org/apache/flink/api/common/io/LocatableInputSplitAssigner.java#L223]:

{code:java}
if (h != null && NetUtils.getHostnameFromFQDN(h.toLowerCase()).equals(flinkHost))
{code}
h is the hostname of a machine hosting the input split, and flinkHost is the hostname of the
taskmanager that is looking for an input split. NetUtils.getHostnameFromFQDN() truncates at the
first occurrence of a ".". So, if a split is present on "host.domain", and the hostname of the
taskmanager is "host.domain" too, we actually check whether "host".equals("host.domain"), which
is not true. PR #8478 applies getHostnameFromFQDN() to the taskmanager hostname as well, so it
seems that this problem is fixed.
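
To make the mismatch concrete, here is a minimal, self-contained sketch of the two comparisons described above. It is an illustration under stated assumptions, not Flink code: firstLabel() is a hypothetical stand-in that only mimics the "truncate at the first dot" behaviour attributed to NetUtils.getHostnameFromFQDN(), and the two isLocal... helpers and the class name are invented for this example.

```java
import java.util.Locale;

// Illustration only: none of these names exist in Flink.
class HostnameTruncationSketch {

    // Hypothetical stand-in for the truncation described above:
    // returns everything before the first '.', or the whole string if there is none.
    static String firstLabel(String fqdn) {
        int dot = fqdn.indexOf('.');
        return dot == -1 ? fqdn : fqdn.substring(0, dot);
    }

    // Status quo as described: only the split's hostname is truncated.
    static boolean isLocalStatusQuo(String splitHost, String flinkHost) {
        return splitHost != null
                && firstLabel(splitHost.toLowerCase(Locale.ROOT)).equals(flinkHost);
    }

    // Behaviour described for PR #8478: the taskmanager hostname is truncated as well.
    static boolean isLocalBothTruncated(String splitHost, String flinkHost) {
        return splitHost != null
                && firstLabel(splitHost.toLowerCase(Locale.ROOT)).equals(firstLabel(flinkHost));
    }

    public static void main(String[] args) {
        // "host".equals("host.domain") -> prints false
        System.out.println(isLocalStatusQuo("host.domain", "host.domain"));
        // "host".equals("host") -> prints true
        System.out.println(isLocalBothTruncated("host.domain", "host.domain"));
    }
}
```

With identical dotted hostnames on both sides, the one-sided truncation can never match, while the symmetric variant does.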


BUT. What if there is a taskmanager on host "host.cluster1.domain", and an input split on
host "host.cluster2.domain"? isLocal() would recognize this split as being on the same host
as the taskmanager, which is clearly not the case.

So to me it looks like getHostnameFromFQDN() should not be applied to either of the two compared hostnames.

Or is there any reason why it should be applied?
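
The concern about "host.cluster1.domain" versus "host.cluster2.domain" can be sketched in a few lines. Again, this is only an illustration under assumptions: firstLabel() is a hypothetical stand-in for the first-dot truncation, and isLocalFullName() is one possible way to compare untruncated names, not something taken from Flink or from the PR.

```java
import java.util.Locale;

// Illustration only: none of these names exist in Flink.
class HostnameCollisionSketch {

    // Hypothetical stand-in for truncating a hostname at the first '.'.
    static String firstLabel(String fqdn) {
        int dot = fqdn.indexOf('.');
        return dot == -1 ? fqdn : fqdn.substring(0, dot);
    }

    // Symmetric truncation: both hostnames are reduced to their first label.
    static boolean isLocalBothTruncated(String splitHost, String flinkHost) {
        return splitHost != null
                && firstLabel(splitHost.toLowerCase(Locale.ROOT)).equals(firstLabel(flinkHost));
    }

    // Assumed alternative: compare the full hostnames, ignoring case only.
    static boolean isLocalFullName(String splitHost, String flinkHost) {
        return splitHost != null && splitHost.equalsIgnoreCase(flinkHost);
    }

    public static void main(String[] args) {
        String splitHost = "host.cluster2.domain";
        String flinkHost = "host.cluster1.domain";
        // Both reduce to "host", so different machines look identical -> prints true
        System.out.println(isLocalBothTruncated(splitHost, flinkHost));
        // Full names differ -> prints false
        System.out.println(isLocalFullName(splitHost, flinkHost));
        // Identical dotted names still match -> prints true
        System.out.println(isLocalFullName("host.domain", "host.domain"));
    }
}
```

One trade-off of the full-name variant: isLocalFullName("host", "host.domain") is false, so it only works if both sides report their hostname in the same form.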


> hostnames with a dot never receive local input splits
> -----------------------------------------------------
>                 Key: FLINK-12550
>                 URL: https://issues.apache.org/jira/browse/FLINK-12550
>             Project: Flink
>          Issue Type: Bug
>          Components: API / DataSet
>    Affects Versions: 1.8.0
>            Reporter: Felix seibert
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
> LocatableInputSplitAssigner (in package api.common.io) fails to assign local input splits
> to hosts whose hostname contains a dot ("."). To reproduce, add the following test to LocatableSplitAssignerTest
> and execute it. It will always fail. In my mind, this is contrary to the expected behaviour,
> which is that the host should obtain the one split that is stored on the very same machine.
> {code:java}
> @Test
> public void testLocalSplitAssignmentForHostWithDomainName() {
>    try {
>       String hostNameWithDot = "testhost.testdomain";
>       // load one split
>       Set<LocatableInputSplit> splits = new HashSet<LocatableInputSplit>();
>       splits.add(new LocatableInputSplit(0, hostNameWithDot));
>       // get next split for the host
>       LocatableInputSplitAssigner ia = new LocatableInputSplitAssigner(splits);
>       InputSplit is = null;
>       ia.getNextInputSplit(hostNameWithDot, 0);
>       // there should be exactly zero remote and one local assignment
>       assertEquals(0, ia.getNumberOfRemoteAssignments());
>       assertEquals(1, ia.getNumberOfLocalAssignments());
>    }
>    catch (Exception e) {
>       e.printStackTrace();
>       fail(e.getMessage());
>    }
> }
> {code}
> I also experienced this error in practice, and will later today open a pull request to
> fix it.
> Note: I'm not sure if I selected the correct component category.
