hama-user mailing list archives

From "Edward J. Yoon" <edw...@udanax.org>
Subject Re: connection loss exception
Date Wed, 16 Feb 2011 11:30:42 GMT
Looks like a synchronization problem. Can you try it again after adding a Thread.sleep(100); line?
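
A minimal sketch of what that change might look like, assuming the sleep goes at the end of each pass of the loop from the original report. PausedLoop and runWithPause are hypothetical names used only for illustration and are not part of Hama; the Runnable stands in for the original bsp() code.

```java
class PausedLoop {
    // Hypothetical helper, not part of Hama: runs `body` `iterations` times
    // and sleeps `pauseMillis` after each pass so the peers get time to sync.
    // Returns the number of passes that completed.
    public static int runWithPause(int iterations, long pauseMillis, Runnable body) {
        int completed = 0;
        for (int j = 0; j < iterations; j++) {
            body.run(); // stands in for the original bsp() code
            completed++;
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop looping early.
                Thread.currentThread().interrupt();
                break;
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        // The loop in the report uses 100 iterations; 3 keeps this demo short.
        int done = runWithPause(3, 100, () -> { });
        System.out.println("completed iterations: " + done);
    }
}
```

If the error goes away with the pause in place, that would point to a timing issue between supersteps rather than a configuration problem.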

Sent from my iPhone

On 2011. 2. 16., at 3:24 PM, Paweł Brach <braszek@gmail.com> wrote:

> Yes, of course I have. My cluster is configured, and both examples,
> PiEstimator and SerializePrinting, work (there is communication between 3
> nodes). I've modified your PiEstimator example (put everything in a
> loop). It works for a few iterations (there is communication), and then
> the connection is lost. The connection is later re-established, but some
> messages are missing. It looks like the Hama framework is very unstable
> under load, when many messages are being sent between nodes.
> On the same cluster I've configured Apache Hadoop, and it's very stable.
> If you have your own cluster configured, could you run my example on it? Have
> you ever run anything more complicated than PiEstimator and
> SerializePrinting on it?
> 
> Cheers,
> Pawel
> 
> 2011/2/16 Chia-Hung Lin <clin4j@googlemail.com>
> 
>> Have you configured ZooKeeper in hama-site.xml? Hama uses
>> ZooKeeper for node communication, IIRC.
>> 
>>   Opening socket connection to server cl5/127.0.1.1:2181
>> 
>> suggests that only localhost is up. If that is the case, you
>> can set the hama.zookeeper.quorum property to list your nodes,
>> e.g.
>> 
>> <property>
>>   <name>hama.zookeeper.quorum</name>
>>   <value>node1,node2,node3,node4,node5</value>
>> </property>
>> 
>> Hope it helps
>> 
>> 2011/2/15 Paweł Brach <braszek@gmail.com>:
>>> Hello,
>>> 
>>> Over the last few days I've been testing Hama, and today I found a
>>> strange error in the Hama framework. If you run a simple job with more than
>>> a few supersteps, the following error occurs:
>>> 
>>> 2011-02-15 15:13:55,934 ERROR org.apache.hama.bsp.BSPPeer:
>>> 2011-02-15 15:13:56,525 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server cl5/127.0.1.1:2181
>>> 2011-02-15 15:13:56,526 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>>> java.net.ConnectException: Connection refused
>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
>>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
>>> 2011-02-15 15:13:56,626 ERROR org.apache.hama.bsp.BSPPeer:
>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
>>> 
>>> You can reproduce this by running PiEstimator (the newest source code from
>>> svn) with a small change: put the whole body of the bsp() method in a for
>>> loop, i.e. wrap it like this:
>>> 
>>> for (int j = 0; j < 100; j++) {
>>>   // original bsp() code
>>> }
>>> 
>>> When I try to run it, the framework hangs and the error mentioned above
>>> occurs.
>>> 
>>> Your help will be appreciated.
>>> 
>>> Cheers,
>>> 
>>> --
>>> Pawel Brach
>>> 
>> 
>> 
>> 
>> --
>> ChiaHung Lin @ nuk, tw.
>> 
> 
> 
> 
> -- 
> Paweł Brach
