hadoop-common-issues mailing list archives

From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14946) S3Guard testPruneCommandCLI can fail
Date Thu, 24 May 2018 05:15:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488430#comment-16488430 ]

Aaron Fabbri commented on HADOOP-14946:
---------------------------------------

Looking at this again, since I was able to reproduce an earlier failure even with the patch applied.

My last comment above matches the earlier stack trace, but not Steve's most recent one (edited
for clarity):
{quote}testPruneCommand:201->AbstractS3GuardToolTestBase.assertMetastoreListingCount:214->Assert.assertEquals:555->Assert.assertEquals:118->Assert.failNotEquals:743->Assert.fail:88
*Pruned children count* [
   /test/testPruneCommandCLI/fresh; isDirectory=false; modification_time=152649906*9374*;,
   /test/testPruneCommandCLI/*stale*; isDirectory=false; modification_time=152649906*6615*;]

 *expected:<1> but was:<2>*
{quote}
This is a "not enough items pruned" error (the "Pruned children count" wording is confusing),
and I don't have an explanation.

In this case, either (A) {{sleep( x )}} slept < x seconds, or (B) there is a timekeeping
error somewhere. (A) looks unlikely, though: the time delta between the stale and fresh files
is 2759 msec (~2.8 sec), which implies the existing sleep(2 sec) did sleep long enough. We know
the testPruneCommandCLI/*stale* file is at least 2.8 sec old, so prune should have caught it.
This points towards either an issue with the CLI interpreting "-seconds 1", or a different
clock source, or something similar.
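
To make the arithmetic above concrete, here is a minimal sketch in Java. The class and method names are hypothetical, and the prune rule it uses (drop entries whose modification time is earlier than "now" minus the requested age) is an assumption about the intended semantics, not the actual S3Guard code. It takes the fresh file's timestamp as the earliest moment prune could have run.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hadoop: reproduces the timestamp reasoning only.
final class PruneTimingCheck {

    /** Gap in milliseconds between the stale and fresh modification times. */
    static long deltaMillis(long staleModTime, long freshModTime) {
        return freshModTime - staleModTime;
    }

    /**
     * Assumed prune rule: an entry is pruned when its modification time is
     * earlier than (now - ageSeconds).
     */
    static boolean shouldPrune(long modTime, long nowMillis, long ageSeconds) {
        long cutoff = nowMillis - TimeUnit.SECONDS.toMillis(ageSeconds);
        return modTime < cutoff;
    }

    public static void main(String[] args) {
        long stale = 1526499066615L; // modification_time of .../stale in the trace
        long fresh = 1526499069374L; // modification_time of .../fresh in the trace

        System.out.println("delta msec: " + deltaMillis(stale, fresh));
        // Prune cannot have run before the fresh file existed, so use its
        // timestamp as a lower bound for "now", with "-seconds 1".
        System.out.println("stale pruned: " + shouldPrune(stale, fresh, 1));
        System.out.println("fresh pruned: " + shouldPrune(fresh, fresh, 1));
    }
}
```

Under that assumed rule the stale entry is past the cutoff and the fresh one is not, which is why a count of 2 is unexpected.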

I *was* able to reproduce the earlier case (too many things pruned) by purposely fork-bombing
my system, and added some code that skips the assertion when the test takes too long to reach
the prune command, e.g.:
{quote}AbstractS3GuardToolTestBase.java:testPruneCommand(250)) - Skipping an assertion: Test
running too slowly (2539 msec)
{quote}
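
A rough sketch of that kind of guard, in plain Java so it stands alone. The class name, method names, and the time limit are all hypothetical; the real change lives in AbstractS3GuardToolTestBase and may differ in every detail.

```java
// Hypothetical sketch, not the patch itself: skip a timing-sensitive
// count check when the test reached the prune step too slowly.
final class SlowTestGuard {

    /** True when the elapsed time makes the timing assumption unreliable. */
    static boolean tooSlow(long elapsedMillis, long limitMillis) {
        return elapsedMillis >= limitMillis;
    }

    /**
     * Returns a skip message instead of asserting when the test ran too
     * slowly; otherwise compares the counts and fails on a mismatch.
     */
    static String checkCount(int expected, int actual,
                             long elapsedMillis, long limitMillis) {
        if (tooSlow(elapsedMillis, limitMillis)) {
            return "Skipping an assertion: Test running too slowly ("
                + elapsedMillis + " msec)";
        }
        if (expected != actual) {
            throw new AssertionError("Pruned children count expected:<"
                + expected + "> but was:<" + actual + ">");
        }
        return "ok";
    }

    public static void main(String[] args) {
        // limitMillis = 1000 is an assumed value chosen for illustration.
        System.out.println(checkCount(1, 0, 2539, 1000));
        System.out.println(checkCount(1, 1, 300, 1000));
    }
}
```

In a JUnit test the skip branch could instead use an assumption so the run is reported as skipped rather than passed; that choice is left open here.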

> S3Guard testPruneCommandCLI can fail
> ------------------------------------
>
>                 Key: HADOOP-14946
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14946
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>            Assignee: Gabor Bota
>            Priority: Major
>         Attachments: HADOOP-14946.001.patch
>
>
> The test of the S3Guard CLI prune can sometimes fail on parallel test runs. Assumption: it is the parallelism which is causing the problem
> {code}
> org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB
> testPruneCommandCLI(org.apache.hadoop.fs.s3a.s3guard.ITestS3GuardToolDynamoDB)  Time elapsed: 10.765 sec  <<< FAILURE!
> java.lang.AssertionError: Pruned children count [] expected:<1> but was:<0>
> 	at org.junit.Assert.fail(Assert.java:88)
> {code}


