beam-commits mailing list archives

From "Johan Brodin (JIRA)" <>
Subject [jira] [Updated] (BEAM-1386) Job hangs without warnings after reading ~20GB of gz csv
Date Fri, 03 Feb 2017 15:29:51 GMT


Johan Brodin updated BEAM-1386:
    Priority: Critical  (was: Major)

> Job hangs without warnings after reading ~20GB of gz csv
> --------------------------------------------------------
>                 Key: BEAM-1386
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>    Affects Versions: 0.5.0
>         Environment: Running on Google Dataflow with 'n1-standard-8' machines.
>            Reporter: Johan Brodin
>            Assignee: Ahmet Altay
>            Priority: Critical
> When running the job, it works fine up until 20 GB, or around 23 million rows, of a gzipped
> CSV file (43 million rows in total). I halted the job, so its statistics seem to have disappeared,
> but its id was "2017-02-03_04_25_41-15296331815975218867". Is there any built-in limitation
> on file size? Should I try to break the file up into several smaller files? Could the issue
> be related to the workers' disk size?
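Splitting the input into smaller files is worth trying: gzip is not a splittable compression format, so a single large .gz file can only be consumed sequentially by one reader, while many smaller shards can be processed in parallel. A minimal standard-library sketch of such a pre-splitting step (the paths, shard naming, and shard size below are illustrative, not part of the reported job):

```python
import gzip
import os

def split_gzip_csv(src_path, out_dir, rows_per_shard=1_000_000):
    """Split one large gzipped CSV into smaller gzip shards.

    Repeats the header row in every shard so each file is
    independently readable. Paths and shard size are illustrative.
    """
    os.makedirs(out_dir, exist_ok=True)
    shard_paths = []
    with gzip.open(src_path, "rt") as src:
        header = src.readline()
        shard, out, rows = 0, None, 0
        for line in src:
            if out is None:
                # Start a new shard, beginning with the header row.
                path = os.path.join(out_dir, f"part-{shard:05d}.csv.gz")
                out = gzip.open(path, "wt")
                out.write(header)
                shard_paths.append(path)
            out.write(line)
            rows += 1
            if rows >= rows_per_shard:
                out.close()
                out, rows = None, 0
                shard += 1
        if out is not None:
            out.close()
    return shard_paths
```

The resulting shards can then be read with a single glob pattern, letting the runner assign files to workers independently instead of funneling everything through one stream.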

This message was sent by Atlassian JIRA
