hadoop-common-issues mailing list archives

From "Mahadev konar (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-6467) Performance improvement for liststatus on directories in hadoop archives.
Date Sat, 13 Feb 2010 06:01:29 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-6467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated HADOOP-6467:
----------------------------------

    Attachment: HADOOP-6467.patch

This patch fixes the slow liststatus issue. I ran it on my machine, and liststatus on a
directory with 10K files completed in 3-4 seconds. The patch does not implement the proposal
attached to this jira; instead it takes a simple brute-force approach, reading the whole index
file to find all the children of a directory. I tried the approach I had described in the
proposal, but found that it complicates the code somewhat (in order to maintain backwards
compatibility), so I tried the brute-force way, which turns out to be fast enough for daily
use of the har filesystem by map reduce jobs.
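
To illustrate the brute-force idea only (this is not the code in HADOOP-6467.patch), here is a
minimal sketch that scans every line of an index once and collects the direct children of a
directory. The index format used here ("<absolute path> <dir|file>" per line), the class name
HarIndexScan and the method listChildren are all assumptions made for the example; the real har
index layout is different and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HarIndexScan {

    /**
     * Returns the direct children of dir by scanning every index line once.
     * Assumed (hypothetical) line format: "<absolute path> <dir|file>".
     */
    public static List<String> listChildren(List<String> indexLines, String dir) {
        // Normalize so that "/a" and "/a/" both become the prefix "/a/".
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        List<String> children = new ArrayList<>();
        for (String line : indexLines) {
            String path = line.split(" ", 2)[0];
            // Skip entries outside dir, and the entry for dir itself.
            if (!path.startsWith(prefix) || path.length() == prefix.length()) {
                continue;
            }
            String rest = path.substring(prefix.length());
            // No further separator means the entry sits directly under dir.
            if (rest.indexOf('/') < 0) {
                children.add(path);
            }
        }
        return children;
    }

    public static void main(String[] args) {
        List<String> index = Arrays.asList(
            "/ dir",
            "/a dir",
            "/a/x file",
            "/a/y file",
            "/a/sub dir",
            "/a/sub/z file",
            "/b file");
        System.out.println(listChildren(index, "/a")); // prints [/a/x, /a/y, /a/sub]
    }
}
```

The point of the sketch is that the whole listing is answered from a single pass over
already-read index data, rather than from one or more separate lookups per child.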

Nicholas, can you try this out and post the numbers with the patch?

thanks

> Performance improvement for liststatus on directories in hadoop archives.
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-6467
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6467
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Mahadev konar
>            Assignee: Mahadev konar
>             Fix For: 0.22.0
>
>         Attachments: Archives_performance.docx, Archives_performance.docx, HADOOP-6467.patch
>
>
> A liststatus call on a directory in hadoop archives leads to (2 * number of files in
> directory) open calls to the namenode. This is very suboptimal and needs to be fixed to make
> it performant enough to be used on a daily basis.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

