beam-commits mailing list archives

From "Reuven Lax (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-5426) Use both destination and TableDestination for BQ load job IDs
Date Tue, 18 Sep 2018 22:45:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619830#comment-16619830 ]

Reuven Lax commented on BEAM-5426:
----------------------------------

Two issues:
 # I'm not sure how to do this easily, since the destinations are sharded across all the workers.
 # We don't have a way to fail a job from within the SDK. The best we can do is throw an exception,
but that doesn't necessarily fail the job (in Dataflow streaming, it simply results
in an infinite exception loop and a stuck job).

> Use both destination and TableDestination for BQ load job IDs
> -------------------------------------------------------------
>
>                 Key: BEAM-5426
>                 URL: https://issues.apache.org/jira/browse/BEAM-5426
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Chamikara Jayalath
>            Priority: Major
>
> Currently we use TableDestination when creating a unique load job ID for a destination:
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java#L359]
>  
> This can result in a data loss issue if a user returns the same TableDestination for
> different destination IDs. I think we can prevent this if we include both IDs in the BQ load
> job ID.
>  
> CC: [~reuvenlax]
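
To illustrate the proposal in the issue description, here is a minimal sketch of a load job ID
that incorporates both the user-level destination and the resolved table. This is not the code
in BigQueryHelpers: the class LoadJobIds, the method jobId, and all of its parameters are
hypothetical names chosen for illustration, and the sketch assumes the destination has a stable
string form.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not part of the Beam SDK.
final class LoadJobIds {

  // Short hex digest, so arbitrary strings become safe job ID components.
  static String hash(String value) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      byte[] digest = md.digest(value.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (int i = 0; i < 8; i++) {
        hex.append(String.format("%02x", digest[i] & 0xff));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this is not expected.
      throw new IllegalStateException(e);
    }
  }

  // Assumption: destinationId is a stable string form of the user's destination,
  // and tableSpec is the table it resolves to (e.g. "project:dataset.table").
  // Hashing the two separately keeps them from running together in the ID.
  static String jobId(String prefix, String destinationId, String tableSpec, int partition) {
    return String.format(
        "%s_%s_%s_%05d", prefix, hash(destinationId), hash(tableSpec), partition);
  }
}
```

With this scheme, two destinations that map to the same table still get different job IDs,
while the same destination, table, and partition always produce the same ID.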



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)