hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13171) Add StorageStatistics to S3A; instrument some more operations
Date Fri, 27 May 2016 00:12:13 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303230#comment-15303230
] 

Chris Nauroth commented on HADOOP-13171:
----------------------------------------

The new stats are very thorough.  The refactoring of {{ProgressListener}} and pulling encryption
logic out of the output streams is nice too.  Thank you, Steve!

{{TestS3ADirectoryPerformance#testListOperations}} is timing out for me consistently in my
runs against US-west-2.  It appears to be making progress, but not quickly enough.

{code}
    int scale = (int)getOperationCount();
    int width = 2 * scale;
    int depth = 2 * scale;
    int files = 2 * scale;
{code}

Are you sure this is what you wanted for these factors?  Please check my math here, but I
believe the interaction between width and depth means the total number of directories grows
exponentially with respect to scale, something like (2*scale) ^ (2*scale) + 1 (for the root).
 Then, considering the file count per directory, we have a total of ((2*scale) ^ (2*scale)
+ 1) * (2 * scale) files.  The default operation count is 2005, so these are big numbers.
 Even if I down-tune to scale.test.operation.count=4, it still takes a long time.

Please add an {{@Override}} annotation to {{ProgressableProgressListener#progressChanged}}.

I think {{ProgressableProgressListener#getLastBytesTransferred}} is no longer called anywhere
and can be removed.

Maybe I'm missing something, but it appears that after execution of the base {{FileSystem#initialize}},
it's impossible for {{FileSystem#statistics}} to be null.  Therefore, the null guards in the
increment methods should be unnecessary.  If you prefer to keep them in the spirit of defensive
coding, I'm fine with that too.

{code}
    private Iterator<Map.Entry<Statistic, AtomicLong>> iterator =
        opsCount.entrySet().iterator();
{code}

I suggest changing to the following...

{code}
    private Iterator<Map.Entry<Statistic, AtomicLong>> iterator =
        Collections.unmodifiableSet(opsCount.entrySet()).iterator();
{code}

...so that we block callers from removing entries through the iterator.  That's probably applicable
to the equivalent code up in hadoop-common too, but probably best to keep the scope focused
just on the hadoop-aws code here.

The JavaDocs for {{S3ATestUtils#createSubdirs}} aren't at the right indentation level.

Repeating a comment from HADOOP-13131, {{getS3AFS}} might instead be an override of {{getFileSystem}}
with a covariant return type.

{code}
  @Test
  public void testCostOfCopyFromLocalFile() throws Throwable {
    describe("testCostOfCopyFromLocalFile");
    File tmpFile = File.createTempFile("tests3acost", ".txt");
{code}

I'd prefer to avoid accessing /tmp due to the potential for platform-specific oddities, e.g.
HADOOP-761 and MAPREDUCE-5191.  Can we switch to something based on {{getContract().getTestPath()}}?


> Add StorageStatistics to S3A; instrument some more operations
> -------------------------------------------------------------
>
>                 Key: HADOOP-13171
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13171
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: HADOOP-13171-branch-2-001.patch, HADOOP-13171-branch-2-002.patch,
HADOOP-13171-branch-2-003.patch, HADOOP-13171-branch-2-004.patch, HADOOP-13171-branch-2-005.patch,
HADOOP-13171-branch-2-006.patch, HADOOP-13171-branch-2-007.patch
>
>
> Add {{StorageStatistics}} support to S3A, collecting the same metrics as the instrumentation,
but sharing across all instances.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message