hama-user mailing list archives

From Paweł Brach <bras...@gmail.com>
Subject Re: connecton loss exception
Date Wed, 16 Feb 2011 15:00:33 GMT
Unfortunately, there are still some communication problems.
I didn't get any errors like the connection loss exception, but I'm sending
a message with the tag:
byte[] tagName = Bytes.toBytes("TEST_TAG");
and once (only!) during my experiment I received something like:
String msgTag = Bytes.toString(received.getTag());
// msgTag = "[B@56c163f"

It looks like messages are sometimes corrupted.
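
A side note on the value above: "[B@56c163f" has the shape Java produces when
toString() is called on a byte[] itself ("[B" is the JVM's name for the byte
array type, followed by "@" and a hex identity hash). One possible reading,
which is only an assumption and has not been checked against Hama's code, is
that the tag was stringified somewhere instead of being decoded. The sketch
below uses only the standard library; the class and method names are
hypothetical and are not part of Hama.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration; not part of Hama.
final class TagStringDemo {

  // Decodes the bytes back into the text they encode.
  static String decode(byte[] tag) {
    return new String(tag, StandardCharsets.UTF_8);
  }

  // Calls Object.toString() on the array itself, which yields the
  // array type name "[B", then '@', then a hex identity hash.
  static String identity(byte[] tag) {
    return tag.toString();
  }

  public static void main(String[] args) {
    byte[] tag = "TEST_TAG".getBytes(StandardCharsets.UTF_8);
    System.out.println(decode(tag));   // the original tag text
    System.out.println(identity(tag)); // "[B@" plus a hash that varies per run
  }
}
```

If the received bytes really decode to a string of this shape, it may be worth
looking for a place where a byte[] is concatenated into a String or passed to
String.valueOf(Object) before being sent.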

Cheers,
Pawel

PS. It would be great to see your benchmark results.

2011/2/16 Edward J. Yoon <edwardyoon@apache.org>

> I decided to add a "random communication benchmark" tool. This week
> (or next week), I'll share my benchmarking experience with you. I have
> 20 servers (160 cores).
>
> Thanks.
>
> 2011/2/16 Edward J. Yoon <edward@udanax.org>:
> > Looks like a sync problem. Can you try it again after adding a
> > Thread.sleep(100); line?
> >
> > Sent from my iPhone
> >
> > On 2011. 2. 16., at 3:24 PM, Paweł Brach <braszek@gmail.com> wrote:
> >
> >> Yes, of course I have. My cluster is configured, and both examples,
> >> PiEstimator and SerializePrinting, work (there is communication between 3
> >> nodes). I've modified your PiEstimator example (put everything in a
> >> loop); it works for a few iterations (there is communication), and after
> >> that the connection is lost. The connection is then re-established, but
> >> some messages are missing. It looks like the Hama framework is very
> >> unstable under load, when many messages are being sent between nodes.
> >> On the same cluster I've configured Apache Hadoop, and it's very stable.
> >> If you have your own cluster configured, could you run my example on it?
> >> Have you ever run something more complicated than PiEstimator and
> >> SerializePrinting on it?
> >>
> >> Cheers,
> >> Pawel
> >>
> >> 2011/2/16 Chia-Hung Lin <clin4j@googlemail.com>
> >>
> >>> Have you configured zookeeper in hama-site.xml? Hama makes use of
> >>> zookeeper to do node communication IIRC.
> >>>
> >>>   Opening socket connection to server cl5/127.0.1.1:2181
> >>>
> >>> suggests that only localhost is up. If this is the case, you
> >>> can set the hama.zookeeper.quorum property to point to your nodes,
> >>> e.g.
> >>>
> >>> <property>
> >>>   <name>hama.zookeeper.quorum</name>
> >>>   <value>node1,node2,node3,node4,node5</value>
> >>> </property>
> >>>
> >>> Hope it helps
> >>>
> >>> 2011/2/15 Paweł Brach <braszek@gmail.com>:
> >>>> Hello,
> >>>>
> >>>> Over the last few days I've been testing Hama, and today I found a
> >>>> strange error in the Hama framework. If you run a simple job with more
> >>>> than a few supersteps, the following error occurs:
> >>>>
> >>>> 2011-02-15 15:13:55,934 ERROR org.apache.hama.bsp.BSPPeer:
> >>>> 2011-02-15 15:13:56,525 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server cl5/127.0.1.1:2181
> >>>> 2011-02-15 15:13:56,526 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
> >>>> java.net.ConnectException: Connection refused
> >>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> >>>>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
> >>>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> >>>> 2011-02-15 15:13:56,626 ERROR org.apache.hama.bsp.BSPPeer:
> >>>> org.apache.zookeeper.KeeperException$ConnectionLossException:
> >>>> KeeperErrorCode = ConnectionLoss for /bsp
> >>>>
> >>>> You can reproduce that by running PiEstimator (the newest source code
> >>>> from svn) with a small change: put the whole body of the bsp() method
> >>>> in a for loop, i.e. wrap it like this:
> >>>>
> >>>> for (int j = 0; j < 100; j++) {
> >>>> // original bsp() code
> >>>> }
> >>>>
> >>>> When I try to run it, the framework hangs and the error mentioned
> >>>> above occurs.
> >>>>
> >>>> Your help will be appreciated.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> --
> >>>> Pawel Brach
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> ChiaHung Lin @ nuk, tw.
> >>>
> >>
> >>
> >>
> >> --
> >> Paweł Brach
> >
>
>
>
> --
> Best Regards, Edward J. Yoon
> http://blog.udanax.org
> http://twitter.com/eddieyoon
>



-- 
Paweł Brach
