hadoop-mapreduce-dev mailing list archives

From Suresh Kumar <sureshapache...@gmail.com>
Subject Re: Shuffling over the network for local map data.
Date Tue, 22 Jan 2013 22:22:40 GMT
Hi Luke,

I checked the /etc/hosts and it is configured correctly. Looks like the
slow shuffle read speeds we were getting are due to slow disk IO.

I will go through the MAPREDUCE-4049 change and see if I can update my
patch to work with that code on version 3.0.0.

I did not think of EC2; that is a good idea.


On Tue, Jan 22, 2013 at 11:24 AM, Luke Lu <llu@vicaya.com> wrote:

> You can set up the right /etc/hosts to support the loopback. OTOH, saving
> disk I/O would be more important for small clusters with large instances.
> Hadoop historically works on large clusters with relatively small
> instances, so the issue was not as acute. MAPREDUCE-4049 allows the shuffle
> to be pluggable, so you won't have to patch Hadoop framework code itself.
> Are you saying that you don't have access to EC2?
>
> On Tue, Jan 22, 2013 at 11:02 AM, Suresh Kumar <sureshapachedev@gmail.com> wrote:
> > I have a patch that tries to use file links instead of making a copy of
> > the data that is already available locally. I tested it on a
> > single-machine cluster configuration running 48 mappers and reducers.
> > Unfortunately, I do not have access to a cluster, even a small one. Can
> > someone review and test-run my patch?
> >
> > I created the patch using Eclipse against 1.0.3. My knowledge of Java is
> > limited and the code is not well written in some classes. So please let
> > me know if I need to make changes to the code, along with a short
> > explanation of the change. I will happily do so.
> >
> > Thanks,
> > Suresh.
> >
> >
> >
> >
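
For readers following the thread, the idea in the quoted message (link to map
output that is already on the local disk instead of copying it) can be
sketched roughly as below. This is not the patch discussed above and it does
not use any Hadoop API; the class name LocalShuffleLink, the method
linkOrCopy, and the file names are hypothetical. It only assumes standard
java.nio.file calls and falls back to a plain copy when a hard link cannot be
created (for example, when the two paths are on different volumes).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystemException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of Hadoop: illustrates "link instead of copy".
final class LocalShuffleLink {
    private LocalShuffleLink() {}

    /**
     * Makes the local map output visible at target.
     * Returns true if a hard link was created, false if the data was copied.
     */
    static boolean linkOrCopy(Path mapOutput, Path target) throws IOException {
        // Remove any stale file left over from an earlier attempt.
        Files.deleteIfExists(target);
        try {
            // A hard link shares the same blocks on disk, so no data is rewritten.
            Files.createLink(target, mapOutput);
            return true;
        } catch (UnsupportedOperationException | FileSystemException e) {
            // Links unsupported or paths on different volumes: copy instead.
            Files.copy(mapOutput, target, StandardCopyOption.REPLACE_EXISTING);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("shuffle-demo");
        Path mapOutput = dir.resolve("map_0.out");      // hypothetical file name
        Path reduceInput = dir.resolve("reduce_0.in");  // hypothetical file name
        Files.write(mapOutput, "key\tvalue\n".getBytes(StandardCharsets.UTF_8));

        boolean linked = linkOrCopy(mapOutput, reduceInput);
        System.out.println(linked ? "hard link created" : "fell back to copy");
    }
}
```

With a hard link, the data stays on disk until both names are deleted, so
cleanup of the map output and of the reducer's view of it would need to be
handled separately.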
