beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-3133) Staging files on Dataflow should not block forever without producing updates
Date Tue, 28 Nov 2017 22:20:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-3133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269581#comment-16269581
] 

ASF GitHub Bot commented on BEAM-3133:
--------------------------------------

tgroh closed pull request #3641: [BEAM-3133] Add logging when Staging takes a long time
URL: https://github.com/apache/beam/pull/3641
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
index 565e965d4eb..2d2c63a6f64 100644
--- a/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
+++ b/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/util/PackageUtil.java
@@ -52,6 +52,8 @@
 import java.util.concurrent.Callable;
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
 import java.util.concurrent.atomic.AtomicInteger;
 import javax.annotation.Nullable;
 import org.apache.beam.sdk.annotations.Internal;
@@ -389,7 +391,19 @@ public DataflowPackage apply(StagingResult stagingResult) {
     }
 
     try {
-      List<DataflowPackage> stagedPackages = Futures.allAsList(destinationPackages).get();
+      ListenableFuture<List<DataflowPackage>> stagingFutures =
+          Futures.allAsList(destinationPackages);
+      boolean finished = false;
+      do {
+        try {
+          stagingFutures.get(3L, TimeUnit.MINUTES);
+          finished = true;
+        } catch (TimeoutException e) {
+          // finished will still be false
+          LOG.info("Still staging {} files", classpathElements.size());
+        }
+      } while (!finished);
+      List<DataflowPackage> stagedPackages = stagingFutures.get();
       LOG.info(
           "Staging files complete: {} files cached, {} files newly uploaded",
           numCached.get(), numUploaded.get());


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Staging files on Dataflow should not block forever without producing updates
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-3133
>                 URL: https://issues.apache.org/jira/browse/BEAM-3133
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-dataflow
>            Reporter: Thomas Groh
>            Assignee: Thomas Groh
>            Priority: Trivial
>
> If staging does not make progress for whatever reason, or finish quickly, it should produce
updates to make it obviously identifiable.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message