hadoop-common-dev mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HADOOP-11680) Deduplicate jars in convenience binary distribution
Date Tue, 10 Mar 2015 05:44:39 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-11680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved HADOOP-11680.
    Resolution: Duplicate

I'm going to close this as a dupe of HADOOP-10115, especially since that was just committed.

> Deduplicate jars in convenience binary distribution
> ---------------------------------------------------
>                 Key: HADOOP-11680
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11680
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
> Pulled from discussion on HADOOP-11656 Colin wrote:
> {quote}
> bq. Andrew wrote: One additional note related to this, we can spend a lot of time right
> now distributing 100s of MBs of jar dependencies when launching a YARN job. Maybe this is
> ameliorated by the new shared distributed cache, but I've heard this come up quite a bit as
> a complaint. If we could meaningfully slim down our client, it could lead to a nice win.
> I'm frustrated that nobody responded to my earlier suggestion that we de-duplicate jars.
> This would drastically reduce the size of our install, and without rearchitecting anything.
> In fact I was so frustrated that I decided to write a program to do it myself and measure
> the delta. Here it is:
> Before:
> {code}
> du -h /h
> 249M    /h
> {code}
> After:
> {code}
> du -h /h
> 140M    /h
> {code}
> Seems like deduplicating jars would be a much better project than splitting into a client
> jar, if we really cared about this.
> <snip>
> {quote}
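The de-duplication quoted above can be sketched as a checksum-and-symlink pass over the distribution tree: keep the first copy of each unique jar and replace later identical copies with symlinks. This is a minimal illustration of the idea, not the program Colin wrote for the JIRA; the directory layout and jar names below are invented for the example.

```shell
#!/usr/bin/env bash
# Sketch: deduplicate identical jars by content checksum.
# The paths and jar names are hypothetical, not Hadoop's real layout.
set -e

dist=$(mktemp -d)
mkdir -p "$dist/common/lib" "$dist/hdfs/lib"
# Simulate the same dependency shipped twice by two subprojects.
printf 'jar-bytes' > "$dist/common/lib/guava-11.0.2.jar"
printf 'jar-bytes' > "$dist/hdfs/lib/guava-11.0.2.jar"

# Map checksum -> first path seen; later copies become symlinks to it.
declare -A seen
while IFS= read -r jar; do
  sum=$(md5sum "$jar" | cut -d' ' -f1)
  if [ -n "${seen[$sum]:-}" ]; then
    ln -sf "${seen[$sum]}" "$jar"   # replace duplicate with a symlink
  else
    seen[$sum]=$jar                 # first occurrence: keep as-is
  fi
done < <(find "$dist" -name '*.jar' | sort)

find "$dist" -type l                # lists the deduplicated copies
```

The space savings come for free at install time; the trade-off is that tools which copy the tree without following (or while following) symlinks can silently re-inflate or break it, which is part of why the shaded-client work in HADOOP-10115 supersedes this.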

This message was sent by Atlassian JIRA
