hama-user mailing list archives

From Paweł Brach <bras...@gmail.com>
Subject Re: connection loss exception
Date Wed, 16 Feb 2011 06:24:06 GMT
Yes, of course I have. My cluster is configured, and both examples,
PiEstimator and SerializePrinting, work (there is communication between 3
nodes). I modified your PiEstimator example (put everything in a
loop): it works for a few iterations (there is communication), and then
the connection is lost. The connection is later re-established, but some
messages are missing. It looks like the Hama framework is very unstable
under load, when many messages are being sent between nodes.
On the same cluster I have configured Apache Hadoop, and it is very stable.
If you have your own cluster configured, could you run my example on it? Have
you ever run something more complicated than PiEstimator and
SerializePrinting on it?

Cheers,
Pawel

2011/2/16 Chia-Hung Lin <clin4j@googlemail.com>

> Have you configured zookeeper in hama-site.xml? Hama makes use of
> zookeeper to do node communication IIRC.
>
>    Opening socket connection to server cl5/127.0.1.1:2181
>
> suggests that only localhost is up. If this is the case, you
> can set the hama.zookeeper.quorum property to point at your nodes,
> e.g.
>
> <property>
>    <name>hama.zookeeper.quorum</name>
>    <value>node1,node2,node3,node4,node5</value>
> </property>
>
> Hope it helps
>
> 2011/2/15 Paweł Brach <braszek@gmail.com>:
> > Hello,
> >
> > During the last few days I've been testing Hama, and today I found a
> > strange error in the framework. If you run a simple job with more than a
> > few supersteps, the following error occurs:
> >
> > 2011-02-15 15:13:55,934 ERROR org.apache.hama.bsp.BSPPeer:
> > 2011-02-15 15:13:56,525 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> > connection to server cl5/127.0.1.1:2181
> > 2011-02-15 15:13:56,526 WARN org.apache.zookeeper.ClientCnxn: Session 0x0
> > for server null, unexpected error, closing socket connection and attempting
> > reconnect
> > java.net.ConnectException: Connection refused
> >        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
> >        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> > 2011-02-15 15:13:56,626 ERROR org.apache.hama.bsp.BSPPeer:
> > org.apache.zookeeper.KeeperException$ConnectionLossException:
> > KeeperErrorCode = ConnectionLoss for /bsp
> >
> > You can reproduce this by running PiEstimator (the newest source code from
> > svn) with one small change: put the whole body of the bsp() method in a for
> > loop, i.e. wrap it like this:
> >
> > for (int j = 0; j < 100; j++) {
> > // original bsp() code
> > }
> >
> > When I try to run it, the framework hangs and the error shown above
> > occurs.
> >
> > Your help will be appreciated.
> >
> > Cheers,
> >
> > --
> > Pawel Brach
> >
>
>
>
> --
> ChiaHung Lin @ nuk, tw.
>



-- 
Paweł Brach
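
The quoted log line "Opening socket connection to server cl5/127.0.1.1:2181" shows the ZooKeeper client connecting to a loopback address, which is what the hama.zookeeper.quorum suggestion in the reply is aimed at. As a rough diagnostic, the sketch below takes a comma-separated host list in the same form as that property's value and reports which entries resolve to a loopback address on the machine where it runs. The class and method names (QuorumLoopbackCheck, loopbackHosts) are made up for illustration and are not part of Hama or ZooKeeper; only standard java.net calls are used. It does not confirm that this was the cause of the ConnectionLoss error in the thread.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hama or ZooKeeper.
final class QuorumLoopbackCheck {

    // Returns the entries of a comma-separated host list (same form as the
    // hama.zookeeper.quorum value) that resolve to a loopback address.
    static List<String> loopbackHosts(String quorum) {
        List<String> result = new ArrayList<>();
        for (String raw : quorum.split(",")) {
            String host = raw.trim();
            if (host.isEmpty()) {
                continue; // tolerate stray commas
            }
            try {
                // Literal IP addresses are parsed directly; host names go
                // through the local resolver (/etc/hosts, DNS, ...).
                if (InetAddress.getByName(host).isLoopbackAddress()) {
                    result.add(host);
                }
            } catch (UnknownHostException e) {
                throw new UncheckedIOException("cannot resolve " + host, e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Pass the quorum string as the first argument; defaults to localhost.
        String quorum = args.length > 0 ? args[0] : "localhost";
        System.out.println("Loopback entries: " + loopbackHosts(quorum));
    }
}
```

Running it on each node with that node's quorum value as the argument would show whether any quorum host name (such as cl5 in the log) maps to 127.x.x.x locally. If it does, the name resolution on that node (for example its /etc/hosts entries) is worth checking.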
