hama-user mailing list archives

From Paweł Brach <bras...@gmail.com>
Subject Re: connection loss exception
Date Wed, 16 Feb 2011 14:47:59 GMT
I don't know what's going on, but it works! Thread.sleep(100) helps!

Thanks,
Pawel
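A fixed Thread.sleep(100) makes the failure go away here, but nobody in the thread pins down why, so it may only be masking a timing problem. One less timing-dependent option, sketched below as an assumption and not something proposed in the thread, is to retry the failing operation with exponential backoff. TransientConnectionException is a hypothetical stand-in for ZooKeeper's KeeperException.ConnectionLossException, so the example runs without a ZooKeeper dependency.

```java
import java.util.concurrent.Callable;

/**
 * Minimal sketch: retry an operation that can fail transiently (for example
 * because a ZooKeeper client lost its connection) with exponential backoff,
 * instead of relying on one fixed Thread.sleep(100).
 */
public class RetryWithBackoff {

    /** Hypothetical stand-in for a transient failure such as ConnectionLoss. */
    public static class TransientConnectionException extends Exception {
        public TransientConnectionException(String message) {
            super(message);
        }
    }

    /**
     * Runs op, retrying on TransientConnectionException up to maxAttempts
     * times. The delay starts at initialDelayMs and doubles after each failure.
     */
    public static <T> T retry(Callable<T> op, int maxAttempts, long initialDelayMs)
            throws Exception {
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (TransientConnectionException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and surface the last failure to the caller
                }
                Thread.sleep(delayMs); // back off before the next attempt
                delayMs *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated operation: fails twice, then succeeds on the third call.
        String result = retry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new TransientConnectionException("ConnectionLoss for /bsp");
            }
            return "synced";
        }, 5, 100);
        System.out.println(result + " after " + calls[0] + " attempts"); // synced after 3 attempts
    }
}
```

ZooKeeper treats connection loss as recoverable, but the interrupted request may or may not have been applied on the server, so a blind retry like this is only safe for idempotent operations.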

2011/2/16 Edward J. Yoon <edward@udanax.org>

> Looks like a sync problem. Can you try it again after adding a
> Thread.sleep(100); line?
>
> On 2011. 2. 16., at 3:24 PM, Paweł Brach <braszek@gmail.com> wrote:
>
> > Yes, of course I have. My cluster is configured and both examples,
> > PiEstimator and SerializePrinting, work (there is communication between 3
> > nodes). I've modified your PiEstimator example (put everything in a
> > loop), and it works for a few iterations (there is communication), but
> > after that the connection is lost. The connection is then re-established,
> > but some messages are missing. It looks like the Hama framework is very
> > unstable under load, when many messages are being sent between nodes.
> > On the same cluster I've configured Apache Hadoop, and it's very stable.
> > If you have your own cluster configured, could you run my example on it?
> > Have you ever run something more complicated than PiEstimator and
> > SerializePrinting on it?
> >
> > Cheers,
> > Pawel
> >
> > 2011/2/16 Chia-Hung Lin <clin4j@googlemail.com>
> >
> >> Have you configured ZooKeeper in hama-site.xml? Hama uses ZooKeeper
> >> for node communication, IIRC.
> >>
> >>   Opening socket connection to server cl5/127.0.1.1:2181
> >>
> >> seems to indicate that only localhost is up. If this is the case, you
> >> can change the hama.zookeeper.quorum property to list your nodes,
> >> e.g.:
> >>
> >> <property>
> >>   <name>hama.zookeeper.quorum</name>
> >>   <value>node1,node2,node3,node4,node5</value>
> >> </property>
> >>
> >> Hope it helps
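The address in the log line quoted above is 127.0.1.1. Every IPv4 address in 127.0.0.0/8 is a loopback address, so cl5 was trying to reach a ZooKeeper server on its own machine, which would be consistent with the "Connection refused" error if no server is listening there. The hypothetical helpers below (not part of Hama) check an address for loopback and expand a quorum value like the one in the property above into the host:port,host:port connect-string form the ZooKeeper client accepts, assuming bare host names and the default client port 2181 that also appears in the log.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostics for a hama.zookeeper.quorum value (not part of Hama). */
public class QuorumCheck {

    /**
     * Expands a comma-separated host list such as "node1,node2" into a
     * ZooKeeper connect string such as "node1:2181,node2:2181".
     * Assumes the entries are bare host names without ports.
     */
    public static String toConnectString(String quorum, int clientPort) {
        List<String> servers = new ArrayList<>();
        for (String host : quorum.split(",")) {
            String h = host.trim();
            if (!h.isEmpty()) { // skip blanks left by stray commas
                servers.add(h + ":" + clientPort);
            }
        }
        return String.join(",", servers);
    }

    /**
     * True if the host or literal IP resolves to a loopback address. For IPv4
     * that is all of 127.0.0.0/8, including 127.0.1.1, so traffic to it never
     * leaves the local machine.
     */
    public static boolean isLoopback(String hostOrIp) throws UnknownHostException {
        return InetAddress.getByName(hostOrIp).isLoopbackAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        // node1:2181,node2:2181,node3:2181,node4:2181,node5:2181
        System.out.println(toConnectString("node1,node2,node3,node4,node5", 2181));
        // true: the address cl5 resolved to in the log is loopback
        System.out.println(isLoopback("127.0.1.1"));
    }
}
```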
> >>
> >> 2011/2/15 Paweł Brach <braszek@gmail.com>:
> >>> Hello,
> >>>
> >>> Over the last few days I've been testing Hama, and today I found a
> >>> strange error in the Hama framework. If you run a simple job with more
> >>> than a few supersteps, the following error occurs:
> >>>
> >>> 2011-02-15 15:13:55,934 ERROR org.apache.hama.bsp.BSPPeer:
> >>> 2011-02-15 15:13:56,525 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server cl5/127.0.1.1:2181
> >>> 2011-02-15 15:13:56,526 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> >>> java.net.ConnectException: Connection refused
> >>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >>>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
> >>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> >>> 2011-02-15 15:13:56,626 ERROR org.apache.hama.bsp.BSPPeer:
> >>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
> >>>
> >>> You can reproduce this by running PiEstimator (the newest source code
> >>> from svn) with a small change: put the whole body of the bsp() method in
> >>> a for loop, like this:
> >>>
> >>> for (int j = 0; j < 100; j++) {
> >>>     // original bsp() code
> >>> }
> >>>
> >>> When I try to run it, the framework hangs and the error mentioned
> >>> above occurs.
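The thread does not show the bsp() body or the peer interface Hama passes to it, so the sketch below uses a hypothetical SuperstepPeer interface (not Hama's actual API) to show the shape of the reproduction: the original work wrapped in a 100-iteration loop, assuming that work ends with a single barrier sync. Where exactly the Thread.sleep(100) workaround was placed is not stated in the thread; putting it just before sync() is also an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch of the reproduction: one job running many supersteps because the
 * bsp() body is wrapped in a loop. SuperstepPeer is hypothetical, NOT Hama's API.
 */
public class LoopedSupersteps {

    /** Hypothetical minimal peer: send a message, then barrier-synchronize. */
    public interface SuperstepPeer {
        void send(String peerName, String message) throws Exception;
        void sync() throws Exception;
    }

    /**
     * Runs the given number of supersteps. pauseMs models the Thread.sleep(100)
     * workaround; pass 0 to run without it.
     */
    public static void run(SuperstepPeer peer, String receiver, int iterations, long pauseMs)
            throws Exception {
        for (int j = 0; j < iterations; j++) {
            // stand-in for the original bsp() work, e.g. sending a partial estimate
            peer.send(receiver, "estimate-" + j);
            if (pauseMs > 0) {
                Thread.sleep(pauseMs); // workaround from the thread; why it helps is not established
            }
            peer.sync(); // one barrier per iteration
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger sent = new AtomicInteger();
        AtomicInteger syncs = new AtomicInteger();
        // In-memory fake peer that only counts calls.
        SuperstepPeer fake = new SuperstepPeer() {
            public void send(String peerName, String message) { sent.incrementAndGet(); }
            public void sync() { syncs.incrementAndGet(); }
        };
        run(fake, "receiver", 100, 0);
        System.out.println(sent.get() + " messages, " + syncs.get() + " syncs"); // 100 messages, 100 syncs
    }
}
```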
> >>>
> >>> Your help will be appreciated.
> >>>
> >>> Cheers,
> >>>
> >>> --
> >>> Pawel Brach
> >>>
> >>
> >>
> >>
> >> --
> >> ChiaHung Lin @ nuk, tw.
> >>
> >
> >
> >
> > --
> > Paweł Brach
>



-- 
Paweł Brach
