hama-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: connection loss exception
Date Wed, 16 Feb 2011 15:11:55 GMT
Thanks for the nice report! I'll look at it tomorrow.

On Thu, Feb 17, 2011 at 12:00 AM, Paweł Brach <braszek@gmail.com> wrote:
> Unfortunately, there are still some communication problems.
> I didn't get any errors like the connection loss exception, but I'm sending
> a message with the tag:
> byte[] tagName = Bytes.toBytes("TEST_TAG");
> and once (only!) during my experiment I received something like:
> String msgTag = Bytes.toString(received.getTag());
> // msgTag = "[B@56c163f"
>
> It looks like messages are sometimes corrupted.
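
A note on the "[B@56c163f" value quoted above: that is the shape of Java's default Object.toString() for a byte[] ("[B" is the array's class name, followed by a hex hash code). So the tag may not be randomly corrupted; somewhere an array reference may have been turned into text instead of its contents. The sketch below only reproduces that pattern with the plain JDK. The class and method names are made up for illustration, and it is an assumption that this has anything to do with the Hama code path.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hama or HBase.
class TagDecoding {
    // Intended behaviour: decode the bytes stored in the array.
    static String decode(byte[] tag) {
        return new String(tag, StandardCharsets.UTF_8);
    }

    // Assumed mistake: stringify the array reference, then encode that text.
    static byte[] encodeReference(byte[] tag) {
        return tag.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] tag = "TEST_TAG".getBytes(StandardCharsets.UTF_8);
        // Prints TEST_TAG
        System.out.println(decode(tag));
        // Prints "[B@" followed by a hex hash code that varies per run
        System.out.println(decode(encodeReference(tag)));
    }
}
```

If the received tag starts with "[B@", it would be worth looking for a toString() call or a string concatenation on the tag array on the sending or serialization side.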
>
> Cheers,
> Pawel
>
> PS. It would be great to see your benchmark results.
>
> 2011/2/16 Edward J. Yoon <edwardyoon@apache.org>
>
>> I decided to add a "random communication benchmark" tool. This week
>> (or next week), I'll share my benchmarking experience with you. I have
>> 20 servers (160 cores).
>>
>> Thanks.
>>
>> 2011/2/16 Edward J. Yoon <edward@udanax.org>:
>> > Looks like a sync problem. Can you try it again after adding a
>> > Thread.sleep(100); line?
>> >
>> > Sent from my iPhone
>> >
>> > On 2011. 2. 16., at 3:24 PM, Paweł Brach <braszek@gmail.com> wrote:
>> >
>> >> Yes, of course I have. My cluster is configured and both examples,
>> >> PiEstimator and SerializePrinting, work (there is communication between 3
>> >> nodes). I've modified your PiEstimator example (put everything in a
>> >> loop); it works for a few iterations (there is communication) and then
>> >> the connection is lost. After that the connection is re-established, but
>> >> some messages are missing. It looks like the Hama framework is very
>> >> unstable under load, when many messages are being sent between nodes.
>> >> On the same cluster I've configured Apache Hadoop and it's very stable.
>> >> If you have your own cluster configured, could you run my example on it?
>> >> Have you ever run anything more complicated than PiEstimator and
>> >> SerializePrinting on it?
>> >>
>> >> Cheers,
>> >> Pawel
>> >>
>> >> 2011/2/16 Chia-Hung Lin <clin4j@googlemail.com>
>> >>
>> >>> Have you configured ZooKeeper in hama-site.xml? Hama uses
>> >>> ZooKeeper for node communication, IIRC.
>> >>>
>> >>>   Opening socket connection to server cl5/127.0.1.1:2181
>> >>>
>> >>> suggests that only localhost is up. If this is the case, you
>> >>> can set the hama.zookeeper.quorum property to list your nodes,
>> >>> e.g.
>> >>>
>> >>> <property>
>> >>>   <name>hama.zookeeper.quorum</name>
>> >>>   <value>node1,node2,node3,node4,node5</value>
>> >>> </property>
>> >>>
>> >>> Hope it helps
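
To check the quorum setting quoted above from a worker node, a small JDK-only sketch like the following could help. It splits a quorum string, reports whether each host resolves to a loopback address (as cl5/127.0.1.1 does in the log), and tries a TCP connect to port 2181, the port shown in the log. The class name, method names and timeout are made up for illustration, and node1..node5 are just the placeholder hosts from the example; this is not a Hama or ZooKeeper API.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of Hama or ZooKeeper.
class QuorumCheck {
    // Split a comma-separated quorum value into trimmed host names.
    static List<String> parseQuorum(String value) {
        List<String> hosts = new ArrayList<>();
        for (String part : value.split(",")) {
            String host = part.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    // True if the name resolves to a loopback address such as 127.0.1.1.
    static boolean resolvesToLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // True if a TCP connection to host:port succeeds within the timeout.
    static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder hosts; pass the real quorum value as the first argument.
        String quorum = args.length > 0 ? args[0] : "node1,node2,node3,node4,node5";
        for (String host : parseQuorum(quorum)) {
            System.out.printf("%s loopback=%b reachable=%b%n",
                    host, resolvesToLoopback(host), isReachable(host, 2181, 2000));
        }
    }
}
```

A host that reports loopback=true on a remote node usually points at an /etc/hosts entry mapping the machine's own name to 127.0.1.1, which would match the "Connection refused" in the log.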
>> >>>
>> >>> 2011/2/15 Paweł Brach <braszek@gmail.com>:
>> >>>> Hello,
>> >>>>
>> >>>> Over the last few days I've been testing Hama, and today I found a
>> >>>> strange error in the Hama framework. If you run a simple job with more
>> >>>> than a few supersteps, the following error occurs:
>> >>>>
>> >>>> 2011-02-15 15:13:55,934 ERROR org.apache.hama.bsp.BSPPeer:
>> >>>> 2011-02-15 15:13:56,525 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server cl5/127.0.1.1:2181
>> >>>> 2011-02-15 15:13:56,526 WARN org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
>> >>>> java.net.ConnectException: Connection refused
>> >>>>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>> >>>>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
>> >>>>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
>> >>>> 2011-02-15 15:13:56,626 ERROR org.apache.hama.bsp.BSPPeer:
>> >>>> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /bsp
>> >>>>
>> >>>> You can reproduce this by running PiEstimator (the newest source code
>> >>>> from svn) with a small change: put the whole body of the bsp() method
>> >>>> in a for loop, i.e. wrap it like this:
>> >>>>
>> >>>> for (int j = 0; j < 100; j++) {
>> >>>>   // original bsp() code
>> >>>> }
>> >>>>
>> >>>> When I try to run it, the framework hangs and the error mentioned
>> >>>> above occurs.
>> >>>>
>> >>>> Your help will be appreciated.
>> >>>>
>> >>>> Cheers,
>> >>>>
>> >>>> --
>> >>>> Pawel Brach
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> ChiaHung Lin @ nuk, tw.
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Paweł Brach
>> >
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> http://blog.udanax.org
>> http://twitter.com/eddieyoon
>>
>
>
>
> --
> Paweł Brach
>



-- 
Best Regards, Edward J. Yoon
http://blog.udanax.org
http://twitter.com/eddieyoon
