beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)"
Subject [jira] [Work logged] (BEAM-4824) Get BigQueryIO batch loads to return something actionable
Date Tue, 18 Sep 2018 19:44:03 GMT


ASF GitHub Bot logged work on BEAM-4824:

                Author: ASF GitHub Bot
            Created on: 18/Sep/18 19:43
            Start Date: 18/Sep/18 19:43
    Worklog Time Spent: 10m 
      Work Description: reuvenlax commented on issue #6055: [BEAM-4824] Batch BigQueryIO returns
job results
   BTW, I think my second comment still stands. BigQueryIO currently uses load jobs as an implementation
detail. It might end up creating one load job per table, or it might end up creating multiple
load jobs per table (if the table is very large). Collapsing the multiple jobs together might
be very confusing. I think making information about these jobs part of the public API is also
confusing, given that the actual logical model is per record.
   Another thing: there will be upcoming changes to the BigQuery API, and we plan on getting
rid of load jobs entirely from BigQueryIO. If we make information about load jobs part of
the public API, it might be problematic when we remove the load jobs.
   Is this something that could be accomplished with better logging, or are there concrete
use cases for wanting the output in a PCollection?

This is an automated message from the Apache Git Service.

Issue Time Tracking

    Worklog Id:     (was: 145452)
    Time Spent: 1h 10m  (was: 1h)

> Get BigQueryIO batch loads to return something actionable
> ---------------------------------------------------------
>                 Key: BEAM-4824
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Carlos Alonso
>            Assignee: Carlos Alonso
>            Priority: Minor
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
> Currently, BigQueryIO batch loads return an empty collection that carries no information about
how the load job finished. The collection is even returned before the job finishes.
> Change it so that:
>  # The returned PCollection only appears when the job has actually finished
>  # The returned PCollection contains information about the job result

This message was sent by Atlassian JIRA
