beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-383) BigQueryIO: update sink to shard into multiple write jobs
Date Sun, 04 Sep 2016 22:12:21 GMT

    [ https://issues.apache.org/jira/browse/BEAM-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15463597#comment-15463597 ]

ASF GitHub Bot commented on BEAM-383:
-------------------------------------

GitHub user dhalperi opened a pull request:

    https://github.com/apache/incubator-beam/pull/917

    [BEAM-383] BigQuery: limit max job polling time to 1 minute

    Previously, the backoff would grow without bound, so we could in principle
    wait 1.5x to 2x the actual job time. For long-running jobs, this is hours.
    Now, we back off at most 1 minute between checks of the job state.
    Note that there should be no danger of QPS overload here, because we should
    have very few concurrent outstanding jobs.
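
To illustrate the change described above, here is a minimal sketch of an
exponential backoff whose delay is capped at a maximum. The class name
CappedBackoff, the 1-second initial delay, and the 2x multiplier are
assumptions chosen for illustration; only the 1-minute cap comes from the
pull request description, and this is not the code in the pull request.

```java
/**
 * Hypothetical helper (not Beam's actual backoff class): exponential
 * backoff whose per-poll delay never exceeds a fixed maximum.
 */
final class CappedBackoff {
    private final double multiplier;
    private final long maxMillis;
    private long nextMillis;

    CappedBackoff(long initialMillis, double multiplier, long maxMillis) {
        if (initialMillis <= 0 || multiplier < 1.0 || maxMillis < initialMillis) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        this.multiplier = multiplier;
        this.maxMillis = maxMillis;
        this.nextMillis = initialMillis;
    }

    /** Returns the delay to wait before the next poll, then grows it up to the cap. */
    long nextDelayMillis() {
        long current = nextMillis;
        // Without the Math.min cap, the delay would keep growing with job duration.
        nextMillis = (long) Math.min((double) maxMillis, current * multiplier);
        return current;
    }
}
```

With an assumed initial delay of 1000 ms, a multiplier of 2.0, and a cap of
60000 ms, successive calls to nextDelayMillis() grow geometrically until
they reach the cap and then stay there, so the wait after a job finishes is
bounded by the cap rather than by a fraction of the job's running time.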

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dhalperi/incubator-beam bigquery-write-backoff

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-beam/pull/917.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #917
    
----
commit 0cc46575c32fc5f96b9ec0a488b639d8ea105b99
Author: Dan Halperin <dhalperi@google.com>
Date:   2016-09-04T21:54:42Z

    BigQuery: limit max job polling time to 1 minute
    
    Previously, the backoff would grow without bound, so we could in principle
    wait 1.5x to 2x the actual job time. For long-running jobs, this is hours.
    Now, we back off at most 1 minute between checks of the job state.
    Note that there should be no danger of QPS overload here, because we should
    have very few concurrent outstanding jobs.

----


> BigQueryIO: update sink to shard into multiple write jobs
> ---------------------------------------------------------
>
>                 Key: BEAM-383
>                 URL: https://issues.apache.org/jira/browse/BEAM-383
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-gcp
>            Reporter: Daniel Halperin
>            Assignee: Ian Zhou
>             Fix For: 0.3.0-incubating
>
>
> BigQuery has global limits on both the number of files that can be written in a single job
> and the total bytes in those files. We should be able to modify BigQueryIO.Write to chunk
> into multiple smaller jobs that meet these limits, write to temp tables, and atomically copy
> into the destination table.
> This functionality will let us safely stay within BQ's load job limits.
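
The chunking step in the issue description could be sketched roughly as
follows: a greedy pass that starts a new job whenever adding the next file
would exceed either limit. The class name LoadJobPartitioner and the
parameters maxFilesPerJob and maxBytesPerJob are hypothetical, the limit
values are left to the caller, and this is not the BigQueryIO.Write
implementation; the temp-table write and atomic copy are not shown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: group file sizes into jobs that respect two limits. */
final class LoadJobPartitioner {
    static List<List<Long>> partition(
            List<Long> fileSizes, int maxFilesPerJob, long maxBytesPerJob) {
        if (maxFilesPerJob < 1 || maxBytesPerJob < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        List<List<Long>> jobs = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            if (size > maxBytesPerJob) {
                // A single file over the byte limit cannot fit in any job.
                throw new IllegalArgumentException("file exceeds per-job byte limit");
            }
            // Close the current job if this file would break either limit.
            if (current.size() >= maxFilesPerJob || currentBytes + size > maxBytesPerJob) {
                jobs.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            jobs.add(current);
        }
        return jobs;
    }
}
```

Each returned inner list stands for the files of one load job. A greedy
pass keeps files in their original order and is simple to reason about,
at the cost of possibly producing more jobs than an optimal packing would.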



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
