flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Kostas Kloudas (Jira)" <j...@apache.org>
Subject [jira] [Closed] (FLINK-14843) Streaming bucketing end-to-end test can fail with Output hash mismatch
Date Thu, 16 Jan 2020 09:42:00 GMT

     [ https://issues.apache.org/jira/browse/FLINK-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Kostas Kloudas closed FLINK-14843.
----------------------------------
    Resolution: Fixed

Merged on master with 9a58dedebef57cddd6e325ae40edc74aff6c6248
and on release-1.10 with 6ddcacd0abf9eac1fc5b6e2bc9dda94e453ec969

> Streaming bucketing end-to-end test can fail with Output hash mismatch
> ----------------------------------------------------------------------
>
>                 Key: FLINK-14843
>                 URL: https://issues.apache.org/jira/browse/FLINK-14843
>             Project: Flink
>          Issue Type: Bug
>          Components: Connectors / FileSystem, Tests
>    Affects Versions: 1.10.0
>         Environment: rev: dcc1330375826b779e4902176bb2473704dabb11
>            Reporter: Gary Yao
>            Assignee: PengFei Li
>            Priority: Major
>              Labels: pull-request-available, test-stability
>             Fix For: 1.10.0
>
>         Attachments: complete_result, flink-gary-standalonesession-0-gyao-desktop.log,
flink-gary-taskexecutor-0-gyao-desktop.log, flink-gary-taskexecutor-1-gyao-desktop.log, flink-gary-taskexecutor-2-gyao-desktop.log,
flink-gary-taskexecutor-3-gyao-desktop.log, flink-gary-taskexecutor-4-gyao-desktop.log, flink-gary-taskexecutor-5-gyao-desktop.log,
flink-gary-taskexecutor-6-gyao-desktop.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> *Description*
> Streaming bucketing end-to-end test ({{test_streaming_bucketing.sh}}) can fail with Output
hash mismatch.
> {noformat}
> Number of running task managers has reached 4.
> Job (e0b7a86e4d4111f3947baa3d004e083a) is running.
> Waiting until all values have been produced
> Truncating buckets
> Number of produced values 26930/60000
> Truncating buckets
> Number of produced values 30890/60000
> Truncating buckets
> Number of produced values 37340/60000
> Truncating buckets
> Number of produced values 41290/60000
> Truncating buckets
> Number of produced values 46710/60000
> Truncating buckets
> Number of produced values 52120/60000
> Truncating buckets
> Number of produced values 57110/60000
> Truncating buckets
> Number of produced values 62530/60000
> Cancelling job e0b7a86e4d4111f3947baa3d004e083a.
> Cancelled job e0b7a86e4d4111f3947baa3d004e083a.
> Waiting for job (e0b7a86e4d4111f3947baa3d004e083a) to reach terminal state CANCELED ...
> Job (e0b7a86e4d4111f3947baa3d004e083a) reached terminal state CANCELED
> Job e0b7a86e4d4111f3947baa3d004e083a was cancelled, time to verify
> FAIL Bucketing Sink: Output hash mismatch.  Got 9e00429abfb30eea4f459eb812b470ad, expected
01aba5ff77a0ef5e5cf6a727c248bdc3.
> head hexdump of actual:
> 0000000   (   2   ,   1   0   ,   0   ,   S   o   m   e       p   a   y
> 0000010   l   o   a   d   .   .   .   )  \n   (   2   ,   1   0   ,   1
> 0000020   ,   S   o   m   e       p   a   y   l   o   a   d   .   .   .
> 0000030   )  \n   (   2   ,   1   0   ,   2   ,   S   o   m   e       p
> 0000040   a   y   l   o   a   d   .   .   .   )  \n   (   2   ,   1   0
> 0000050   ,   3   ,   S   o   m   e       p   a   y   l   o   a   d   .
> 0000060   .   .   )  \n   (   2   ,   1   0   ,   4   ,   S   o   m   e
> 0000070       p   a   y   l   o   a   d   .   .   .   )  \n   (   2   ,
> 0000080   1   0   ,   5   ,   S   o   m   e       p   a   y   l   o   a
> 0000090   d   .   .   .   )  \n   (   2   ,   1   0   ,   6   ,   S   o
> 00000a0   m   e       p   a   y   l   o   a   d   .   .   .   )  \n   (
> 00000b0   2   ,   1   0   ,   7   ,   S   o   m   e       p   a   y   l
> 00000c0   o   a   d   .   .   .   )  \n   (   2   ,   1   0   ,   8   ,
> 00000d0   S   o   m   e       p   a   y   l   o   a   d   .   .   .   )
> 00000e0  \n   (   2   ,   1   0   ,   9   ,   S   o   m   e       p   a
> 00000f0   y   l   o   a   d   .   .   .   )  \n                        
> 00000fa
> Stopping taskexecutor daemon (pid: 55164) on host gyao-desktop.
> Stopping standalonesession daemon (pid: 51073) on host gyao-desktop.
> Stopping taskexecutor daemon (pid: 51504) on host gyao-desktop.
> Skipping taskexecutor daemon (pid: 52034), because it is not running anymore on gyao-desktop.
> Skipping taskexecutor daemon (pid: 52472), because it is not running anymore on gyao-desktop.
> Skipping taskexecutor daemon (pid: 52916), because it is not running anymore on gyao-desktop.
> Stopping taskexecutor daemon (pid: 54121) on host gyao-desktop.
> Stopping taskexecutor daemon (pid: 54726) on host gyao-desktop.
> [FAIL] Test script contains errors.
> Checking of logs skipped.
> [FAIL] 'flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh' failed after
2 minutes and 3 seconds! Test exited with exit code 1
> {noformat}
> *How to reproduce*
> Comment out the delay of 10s after the 1st TM is restarted to provoke the issue:
> {code:bash}
> echo "Restarting 1 TM"
> $FLINK_DIR/bin/taskmanager.sh start
> wait_for_number_of_running_tms 4
> #sleep 10
> echo "Killing 2 TMs"
> kill_random_taskmanager
> kill_random_taskmanager
> wait_for_number_of_running_tms 2
> {code}
> Command to run the test:
> {noformat}
> FLINK_DIR=build-target/ flink-end-to-end-tests/run-single-test.sh skip flink-end-to-end-tests/test-scripts/test_streaming_bucketing.sh
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Mime
View raw message