kudu-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ale...@apache.org
Subject [kudu] 03/04: [java] Attempt to deflake SecureKuduSinkTest
Date Thu, 30 May 2019 05:05:47 GMT
This is an automated email from the ASF dual-hosted git repository.

alexey pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/kudu.git

commit 3cbc025869494be2a1fa91582dfabc95d7bc88f9
Author: Will Berkeley <wdberkeley@gmail.com>
AuthorDate: Tue May 28 14:09:30 2019 -0700

    [java] Attempt to deflake SecureKuduSinkTest
    
    Occasionally, SecureKuduSinkTest fails when running with TSAN binaries
    because of the following sequence of operations:
    
    1. The Kerberos ticket lifetime is set to 10s.
    2. The test sets up a mini kudu cluster. This first sets up the KDC,
       which creates credentials for all of the Kudu servers and kinits
       using test user credentials for the test process.
    3. The setup of the cluster takes > 10s.
    4. At the end of the cluster setup, the test checks that setup succeeded
       in part by issuing a ListTabletServers RPC. This fails because the test
       user ticket has expired.
    5. The test fails because it can't set up the cluster.
    
    The failure looks like
    
    21:50:06.500 [ERROR - main] (RetryRule.java:217) org.apache.kudu.flume.sink.SecureKuduSinkTest.testEventsWithShortTickets:
failed attempt 1
    java.io.IOException: ListTabletServers RPC failed: Client connection negotiation failed:
client connection to 127.12.111.60:36425: server requires authentication, but client does
not have Kerberos credentials available
    	at org.apache.kudu.test.cluster.MiniKuduCluster.sendRequestToCluster(MiniKuduCluster.java:169)
~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
    	at org.apache.kudu.test.cluster.MiniKuduCluster.start(MiniKuduCluster.java:234) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
    	at org.apache.kudu.test.cluster.MiniKuduCluster.access$300(MiniKuduCluster.java:71) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
    	at org.apache.kudu.test.cluster.MiniKuduCluster$MiniKuduClusterBuilder.build(MiniKuduCluster.java:658)
~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
    	at org.apache.kudu.test.KuduTestHarness.before(KuduTestHarness.java:140) ~[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
    	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46) ~[junit-4.12.jar:4.12]
    	at org.apache.kudu.test.junit.RetryRule$RetryStatement.doOneAttempt(RetryRule.java:215)
[kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
    	at org.apache.kudu.test.junit.RetryRule$RetryStatement.evaluate(RetryRule.java:232) [kudu-test-utils-1.10.0-SNAPSHOT.jar:1.10.0-SNAPSHOT]
            ...
    
    This patch attempts to deflake the test a bit by doubling the ticket
    lifetime to 20s. It also raises the renewal lifetime to 35s from 30s,
    to provide a bit of extra time between the ticket expiring and when
    Flume needs to renew the ticket. Before, the test waited 2x the
    renewable ticket lifetime. I made it so the test waits until the
    renewable ticket lifetime plus one second has passed, including the
    time spent in the test so far.
    
    I tried to test this on dist-test using TSAN binaries. With the new
    patch I saw 0/1000 failures, but without it I saw 830/1000 failures.
    That's *way* flakier than any previous indication, so I don't trust
    those results. The failures I sampled did seem to be related to the same
    issue, but it was ConnectToCluster RPCs failing instead.
    
    Change-Id: Icc936878d7f1496905e83ddaf93b9b049f417f72
    Reviewed-on: http://gerrit.cloudera.org:8080/13454
    Reviewed-by: Adar Dembo <adar@cloudera.com>
    Tested-by: Kudu Jenkins
    Reviewed-by: Alexey Serbin <aserbin@cloudera.com>
---
 .../java/org/apache/kudu/flume/sink/SecureKuduSinkTest.java  | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/SecureKuduSinkTest.java
b/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/SecureKuduSinkTest.java
index 187cf68..11aa07b 100644
--- a/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/SecureKuduSinkTest.java
+++ b/java/kudu-flume-sink/src/test/java/org/apache/kudu/flume/sink/SecureKuduSinkTest.java
@@ -25,6 +25,8 @@ import static org.apache.kudu.util.SecurityUtil.KUDU_TICKETCACHE_PROPERTY;
 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertTrue;
 
+import java.time.Duration;
+import java.time.Instant;
 import java.util.ArrayList;
 import java.util.List;
 import java.util.concurrent.TimeUnit;
@@ -49,8 +51,8 @@ import org.apache.kudu.test.cluster.MiniKuduCluster.MiniKuduClusterBuilder;
 
 public class SecureKuduSinkTest {
   private static final Logger LOG = LoggerFactory.getLogger(SecureKuduSinkTest.class);
-  private static final int TICKET_LIFETIME_SECONDS = 10;
-  private static final int RENEWABLE_LIFETIME_SECONDS = 30;
+  private static final int TICKET_LIFETIME_SECONDS = 20;
+  private static final int RENEWABLE_LIFETIME_SECONDS = 35;
 
   private static final MiniKuduClusterBuilder clusterBuilder = KuduTestHarness.getBaseClusterBuilder()
       .kdcTicketLifetime(TICKET_LIFETIME_SECONDS + "s")
@@ -68,6 +70,7 @@ public class SecureKuduSinkTest {
 
   @Test
   public void testEventsWithShortTickets() throws Exception {
+    Instant start = Instant.now();
     LOG.info("Creating new table...");
     ArrayList<ColumnSchema> columns = new ArrayList<>(1);
     columns.add(new ColumnSchema.ColumnSchemaBuilder("payload", Type.BINARY).key(true).build());
@@ -88,7 +91,10 @@ public class SecureKuduSinkTest {
     processEvents(sink, 0, eventCount / 2);
 
     LOG.info("Waiting for tickets to expire");
-    TimeUnit.SECONDS.sleep(RENEWABLE_LIFETIME_SECONDS * 2);
+    Duration elapsedSoFar = Duration.between(Instant.now(), start);
+    TimeUnit.MILLISECONDS.sleep(1000 * (RENEWABLE_LIFETIME_SECONDS + 1) - elapsedSoFar.toMillis());
+    // At this point, the ticket will have been outstanding for at least
+    // (RENEWABLE_LIFETIME_SECONDS + 1) seconds-- so the sink will need to reacquire a ticket.
 
     LOG.info("Testing events after ticket renewal.");
     processEvents(sink, eventCount / 2, eventCount);


Mime
View raw message