nutch-dev mailing list archives

From Paul Baclace <...@baclace.net>
Subject Re: why task tracker ports random?
Date Tue, 27 Sep 2005 02:37:33 GMT
Stefan Groschupf wrote:
> Hi Paul,
> my call stack say that actually no other classes using the tasktracker.
> Beside that tasktracker could be implement NutchConfigurable than all  
> problems would be solved since this is IOC pattern.
> Or do I oversee something?

I am thinking about the mapred branch and the case of a mapred
multiprocess run over one or more machines.  In this case,
multiple tasktracker processes are created.

 > why are the taskReportPort and mapOutputPort randomly generated?
 > I can not see any reasons for that and wondering why we not just
 > have  that configurable as well.

There is a reason to bind to a random port in some cases.  I once had
a process fire off every 5 minutes to make an SSH connection so that
Unison could run over it.  When I picked a static port, it should,
in theory, have been available again within 5 minutes, but once
every few days the port would be stuck and new SSH connections would
fail.  I did not determine why it got stuck, but the OS waits about
3 minutes before releasing a socket that was not closed cleanly, just in
case the close ACK packet is still bouncing around through all possible hops.

During JUnit testing, if processes are not shut down gracefully, those
3 minutes can seem like a long time and can limit the amount of
testing that gets done.
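
For tests, one way to sidestep a stuck fixed port is to ask the OS for
any free port by binding to port 0.  This is only a sketch to illustrate
the idea; the class and method names are made up for this example and
are not part of the tasktracker code.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortExample {
    /**
     * Binds to port 0, which asks the OS to pick any free port,
     * and returns the port number that was actually assigned.
     * (Hypothetical helper, not an existing Nutch method.)
     */
    static int openAndReport() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }
}
```

A real server would keep the socket open and report the chosen port to
whoever needs to connect, rather than closing it right away as this
helper does.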

If tasktracker ports are picked randomly without retrying when a port is
already busy, then that is a problem.  If ports are picked randomly
until an open one is found, then that is okay.  An even better solution
is to have each tasktracker process test ports in a different sequence,
so that N tasktrackers on one machine don't all simultaneously
race to listen on port P and then P+1, etc. for N-1 consecutive races.
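
A rough sketch of what I mean by a per-process port sequence with retry.
The class name, method names, port range and seed are all assumptions
made up for illustration; this is not how the tasktracker currently
picks its ports.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PortPicker {
    /**
     * Returns the ports in [base, base + count) in an order determined
     * by seed, so processes with different seeds probe in different orders.
     */
    static List<Integer> candidateOrder(int base, int count, long seed) {
        List<Integer> ports = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ports.add(base + i);
        }
        Collections.shuffle(ports, new Random(seed));
        return ports;
    }

    /** Tries each candidate in turn and returns the first socket that binds. */
    static ServerSocket bindFirstFree(List<Integer> candidates) throws IOException {
        IOException last = null;
        for (int port : candidates) {
            try {
                return new ServerSocket(port);
            } catch (IOException e) {
                last = e; // port busy or otherwise unusable: try the next one
            }
        }
        throw new IOException(
            "no free port among " + candidates.size() + " candidates", last);
    }
}
```

Each process would pass its own seed (for example something derived from
its start time or process id, which is again just an assumption), so two
tasktrackers started together are unlikely to walk the range in the same
order.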

Paul
