hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15547) WASB: listStatus performance
Date Tue, 19 Jun 2018 01:27:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516518#comment-16516518 ]

Steve Loughran commented on HADOOP-15547:
-----------------------------------------

core sounds good; not in a position to test this right now (travelling).

If you are tuning list, make sure that listStatus() is covered too, to see if you can go from
a directory walk to a flatter scan and incremental results. HDFS uses it to avoid marshalling
massive arrays; for S3A we can switch from a slow treewalk to files/5000 LIST calls, irrespective
of depth. And no, there is no need to guarantee consistency of the list.
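
To make the treewalk vs. flat scan point concrete, here is a rough, self-contained Java sketch. It does not use the Hadoop or Azure APIs: {{FakeStore}}, {{Page}}, {{listPage}}, {{listFlat}} and {{listCallsNeeded}} are all hypothetical names invented for illustration, and the in-memory sorted set only stands in for a store that can list keys by prefix one page at a time. It shows results being handed back incrementally through an iterator, with roughly files/pageSize LIST calls regardless of how deep the "directories" go.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.SortedSet;
import java.util.TreeSet;

public class FlatListingSketch {

    /** One page of listing results; truncated is true when more keys remain. */
    public static final class Page {
        public final List<String> keys;
        public final boolean truncated;
        Page(List<String> keys, boolean truncated) {
            this.keys = keys;
            this.truncated = truncated;
        }
    }

    /** Hypothetical stand-in for an object store with a paged, prefix-based LIST. */
    public static final class FakeStore {
        private final TreeSet<String> keys = new TreeSet<>();
        public int listCalls = 0;

        public void put(String key) {
            keys.add(key);
        }

        /** Returns up to pageSize keys under prefix, sorted, strictly after startAfter. */
        public Page listPage(String prefix, String startAfter, int pageSize) {
            listCalls++;
            SortedSet<String> tail = (startAfter == null)
                    ? keys.tailSet(prefix, true)
                    : keys.tailSet(startAfter, false);
            List<String> out = new ArrayList<>();
            boolean truncated = false;
            for (String k : tail) {
                if (!k.startsWith(prefix)) {
                    break;              // sorted order: past the prefix range
                }
                if (out.size() == pageSize) {
                    truncated = true;   // at least one more matching key exists
                    break;
                }
                out.add(k);
            }
            return new Page(out, truncated);
        }
    }

    /** Flat, incremental listing: pages are fetched only as the caller iterates. */
    public static Iterator<String> listFlat(FakeStore store, String prefix, int pageSize) {
        return new Iterator<String>() {
            private List<String> page = new ArrayList<>();
            private int pos = 0;
            private String last = null;
            private boolean more = true;

            @Override
            public boolean hasNext() {
                while (pos == page.size() && more) {
                    Page p = store.listPage(prefix, last, pageSize);
                    page = p.keys;
                    pos = 0;
                    more = p.truncated;
                    if (!page.isEmpty()) {
                        last = page.get(page.size() - 1);
                    }
                }
                return pos < page.size();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.get(pos++);
            }
        };
    }

    /** Number of LIST calls a flat scan needs for a non-empty listing: ceil(files / pageSize). */
    public static long listCallsNeeded(long files, int pageSize) {
        return (files + pageSize - 1) / pageSize;
    }

    public static void main(String[] args) {
        FakeStore store = new FakeStore();
        for (int d = 0; d < 3; d++) {
            for (int s = 0; s < 2; s++) {
                for (int f = 0; f < 2; f++) {
                    store.put("data/d" + d + "/s" + s + "/f" + f);
                }
            }
        }
        int count = 0;
        for (Iterator<String> it = listFlat(store, "data/", 5); it.hasNext(); it.next()) {
            count++;
        }
        System.out.println(count + " files in " + store.listCalls + " LIST calls");
    }
}
```

In this toy setup a per-directory walk would need at least one LIST per directory, while the flat scan only depends on the file count and page size. With the numbers from this issue, 700,000 files at 5000 per page works out to 140 calls. The caller never holds more than one page in memory, which is the same reason the incremental form avoids marshalling massive arrays.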

I don't think we have any list scale tests for use across stores. There is {{ITestS3ADirectoryPerformance}}
to use as a source, but we can't just pull that up as it's counting internal FS metrics and asserting
on their values.

> WASB: listStatus performance
> ----------------------------
>
>                 Key: HADOOP-15547
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15547
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>    Affects Versions: 2.9.1, 3.0.2
>            Reporter: Thomas Marquardt
>            Assignee: Thomas Marquardt
>            Priority: Major
>         Attachments: HADOOP-15547.001.patch, HADOOP-15547.002.patch, HADOOP-15547.003.patch
>
>
> The WASB implementation of Filesystem.listStatus is very slow due to an O(n!) algorithm
> to remove duplicates, and it uses too much memory due to the extra conversion from BlobListItem
> to FileMetadata to FileStatus.  It takes over 30 minutes to list 700,000 files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

