hadoop-common-issues mailing list archives

From "Thomas Marquardt (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HADOOP-15547) WASB: listStatus performance
Date Mon, 18 Jun 2018 04:08:00 GMT
Thomas Marquardt created HADOOP-15547:

             Summary: WASB: listStatus performance
                 Key: HADOOP-15547
                 URL: https://issues.apache.org/jira/browse/HADOOP-15547
             Project: Hadoop Common
          Issue Type: Bug
          Components: fs/azure
    Affects Versions: 3.0.2, 2.9.1
            Reporter: Thomas Marquardt
            Assignee: Thomas Marquardt

The WASB implementation of FileSystem.listStatus is very slow because it uses an O(n!) algorithm
to remove duplicates, and it uses too much memory because of the extra conversions from
BlobListItem to FileMetadata to FileStatus. It takes over 30 minutes to list 700,000 files.
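
The report does not include the code in question, so the following is only a minimal sketch of the general idea, not the WASB implementation and not the eventual fix. It contrasts removing duplicates by rescanning the accumulated results for every entry with removing them in a single pass through a hash-based set. The class name, the method names, and the use of plain String keys in place of BlobListItem, FileMetadata, or FileStatus objects are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only; none of these names come from hadoop-azure.
public class ListingDedup {

    // Rescans the output list for every input key, so the total work
    // grows with the square of the number of entries.
    public static List<String> dedupeByScan(List<String> keys) {
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            if (!out.contains(key)) { // linear scan of everything kept so far
                out.add(key);
            }
        }
        return out;
    }

    // Uses a LinkedHashSet: each insert is expected constant time and
    // first-seen order is preserved, so the whole pass is roughly linear.
    public static List<String> dedupeLinear(List<String> keys) {
        return new ArrayList<>(new LinkedHashSet<>(keys));
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("dir/a", "dir/b", "dir/a", "dir/c", "dir/b");
        System.out.println(dedupeByScan(keys));  // prints [dir/a, dir/b, dir/c]
        System.out.println(dedupeLinear(keys));  // prints [dir/a, dir/b, dir/c]
    }
}
```

Both methods return the same result; only the cost differs as the listing grows. Deduplicating directly on the listing key, rather than after converting every entry into intermediate objects, is also one way to avoid holding several representations of each file in memory at once.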

