beam-commits mailing list archives

From "Chamikara Jayalath (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-5426) Use both destination and TableDestination for BQ load job IDs
Date Tue, 18 Sep 2018 22:39:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619823#comment-16619823 ]

Chamikara Jayalath commented on BEAM-5426:
------------------------------------------

In that case, how about keeping track of the load jobs issued for each destination, and failing
the job if we detect two load jobs for the same destination? We should find a way to fail
actively in this case, since it currently results in silent data loss.
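
To make the suggestion above concrete, here is a minimal sketch of such a check. It is not Beam code: the class LoadJobTracker, the method register, and the use of plain strings for the destination key and the table spec are all assumptions made for illustration. It also assumes the check runs somewhere that sees every (destination, table) pair, which in a distributed pipeline would need its own design.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Beam): remembers which destination first claimed each table. */
class LoadJobTracker {
  // Maps a table spec such as "project:dataset.table" to the destination key that first used it.
  private final Map<String, String> ownerByTable = new HashMap<>();

  /**
   * Records that a load job for destinationKey writes to tableSpec.
   * Throws if a different destination has already claimed the same table,
   * so the conflict surfaces as a failure instead of silently lost data.
   */
  void register(String destinationKey, String tableSpec) {
    String previous = ownerByTable.putIfAbsent(tableSpec, destinationKey);
    if (previous != null && !previous.equals(destinationKey)) {
      throw new IllegalStateException(
          "Destinations '" + previous + "' and '" + destinationKey
              + "' both resolve to table '" + tableSpec + "'");
    }
  }
}
```

Registering the same destination twice for the same table is treated as harmless here (for example a retry); only two distinct destinations resolving to one table cause a failure.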

> Use both destination and TableDestination for BQ load job IDs
> -------------------------------------------------------------
>
>                 Key: BEAM-5426
>                 URL: https://issues.apache.org/jira/browse/BEAM-5426
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Chamikara Jayalath
>            Priority: Major
>
> Currently we use TableDestination when creating a unique load job ID for a destination:
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java#L359]
>  
> This can result in a data loss issue if a user returns the same TableDestination for
> different destination IDs. I think we can prevent this if we include both IDs in the BQ load
> job ID.
>  
> CC: [~reuvenlax]
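
The idea in the issue description, including both IDs in the load job ID, could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in BigQueryHelpers: the class LoadJobIds, the method jobId, its parameters, and the choice of SHA-256 are all hypothetical. The hash is hex-encoded because BigQuery job IDs may contain only letters, digits, underscores, and dashes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not part of Beam): derives a load job ID from both identifiers. */
class LoadJobIds {
  /** Builds an ID that differs whenever the destination key or the table spec differs. */
  static String jobId(String prefix, String destinationKey, String tableSpec, int partition) {
    // Length-prefix the first field so ("a|b", "c") and ("a", "b|c") cannot produce the same input.
    String combined = destinationKey.length() + ":" + destinationKey + "|" + tableSpec;
    return prefix + "_" + sha256Hex(combined) + "_" + String.format("%05d", partition);
  }

  private static String sha256Hex(String input) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (byte b : hash) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this should not happen.
      throw new IllegalStateException(e);
    }
  }
}
```

With this scheme, two destinations that return the same table still get distinct job IDs, while the same destination, table, and partition always map to the same ID.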



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
