From: "Edward J. Yoon"
To: user@hama.apache.org
Subject: RE: Groomserer BSPPeerChild limit
Date: Mon, 03 Aug 2015 08:07:28 +0900

Hi,

Congratz! You can shut down the cluster with the following command:

$ bin/stop-bspd.sh

--
Best Regards, Edward J. Yoon

-----Original Message-----
From: Behroz Sikander [mailto:behroz89@gmail.com]
Sent: Sunday, August 02, 2015 11:27 PM
To: user@hama.apache.org
Subject: Re: Groomserer BSPPeerChild limit

Hi,
Yesterday I got the fix for the /etc/hosts file and now I can modify it. I tried to run the cluster with 3 machines and everything went super fine. Thanks :)

Btw, if I start a process with the following command, how can I stop it? Right now I am using kill -9.

% ./bin/hama bspmaster

On Mon, Jun 29, 2015 at 5:53 AM, Behroz Sikander wrote:

> Ok perfect. I do not have rights on /etc/hosts, so that's why I was using
> the IP addresses. I will talk to the administrator.
>
> Btw I am wondering how the PI example was able to communicate with the other
> servers. The PI example runs fine even if I have more than 3 tasks (works on
> both machines).
>
> On Mon, Jun 29, 2015 at 5:47 AM, Edward J. Yoon wrote:
>
>> Okay, almost done. I guess you need to add host names to your
>> /etc/hosts file. :-) Please see also
>>
>> http://stackoverflow.com/questions/4730148/unknownhostexception-on-tasktracker-in-hadoop-cluster
>>
>> On Mon, Jun 29, 2015 at 12:41 PM, Behroz Sikander wrote:
>> > Server 2 was showing the exception that I posted in the previous email.
>> > Server1 is showing the following exception
>> >
>> > 15/06/29 03:27:42 INFO ipc.Server: IPC Server handler 0 on 40000: starting
>> > 15/06/29 03:28:53 INFO bsp.BSPMaster: groomd_b178b33b16cc_50000 is added.
>> > 15/06/29 03:29:20 ERROR bsp.BSPMaster: Fail to register GroomServer groomd_8d4b512cf448_50000
>> > java.net.UnknownHostException: unknown host: 8d4b512cf448
>> > at org.apache.hama.ipc.Client$Connection.<init>(Client.java:225)
>> > at org.apache.hama.ipc.Client.getConnection(Client.java:1039)
>> > at org.apache.hama.ipc.Client.call(Client.java:888)
>> > at org.apache.hama.ipc.RPC$Invoker.invoke(RPC.java:239)
>> > at com.sun.proxy.$Proxy11.getProtocolVersion(Unknown Source)
>> >
>> > I am looking into this issue.
>> >
>> > On Mon, Jun 29, 2015 at 5:31 AM, Behroz Sikander wrote:
>> >
>> >> Ok great. I was able to run the zk, groom and bspmaster on server 1. But
>> >> when I ran the groom on server2 I got the following exception
>> >>
>> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: There is a problem in establishing communication link with BSPMaster
>> >> 15/06/29 03:29:20 ERROR bsp.GroomServer: Got fatal exception while reinitializing GroomServer: java.io.IOException: There is a problem in establishing communication link with BSPMaster.
>> >> at org.apache.hama.bsp.GroomServer.initialize(GroomServer.java:426)
>> >> at org.apache.hama.bsp.GroomServer.run(GroomServer.java:860)
>> >> at java.lang.Thread.run(Thread.java:745)
>> >>
>> >> On Mon, Jun 29, 2015 at 5:21 AM, Edward J. Yoon wrote:
>> >>
>> >>> Here's my configurations:
>> >>>
>> >>> hama-site.xml:
>> >>>
>> >>> <property>
>> >>>   <name>bsp.master.address</name>
>> >>>   <value>cluster-0:40000</value>
>> >>> </property>
>> >>>
>> >>> <property>
>> >>>   <name>fs.default.name</name>
>> >>>   <value>hdfs://cluster-0:9000/</value>
>> >>> </property>
>> >>>
>> >>> <property>
>> >>>   <name>hama.zookeeper.quorum</name>
>> >>>   <value>cluster-0</value>
>> >>> </property>
>> >>>
>> >>> % bin/hama zookeeper
>> >>> 15/06/29 12:17:17 ERROR quorum.QuorumPeerConfig: Invalid configuration, only one server specified (ignoring)
>> >>>
>> >>> Then, open new terminal and run master with following command:
>> >>>
>> >>> % bin/hama bspmaster
>> >>> ...
>> >>> 15/06/29 12:17:40 INFO sync.ZKSyncBSPMasterClient: Initialized ZK false
>> >>> 15/06/29 12:17:40 INFO sync.ZKSyncClient: Initializing ZK Sync Client
>> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server Responder: starting
>> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server listener on 40000: starting
>> >>> 15/06/29 12:17:40 INFO ipc.Server: IPC Server handler 0 on 40000: starting
>> >>> 15/06/29 12:17:40 INFO bsp.BSPMaster: Starting RUNNING
>> >>>
>> >>> On Mon, Jun 29, 2015 at 12:17 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > If you run zk server too, BSPmaster will be connected to zk and won't
>> >>> > throw exceptions.
>> >>> >
>> >>> > On Mon, Jun 29, 2015 at 12:13 PM, Behroz Sikander <behroz89@gmail.com> wrote:
>> >>> >> Hi,
>> >>> >> Thank you the information. I moved to hama 0.7.0 and I still have the same
>> >>> >> problem.
>> >>> >> When I run % bin/hama bspmaster, I am getting the following exception
>> >>> >>
>> >>> >> INFO http.HttpServer: Port returned by
>> >>> >> webServer.getConnectors()[0].getLocalPort() before open() is -1.
>> >>> Opening >> >>> >> the listener on 40013 >> >>> >> INFO http.HttpServer: listener.getLocalPort() returned 40013 >> >>> >> webServer.getConnectors()[0].getLocalPort() returned 40013 >> >>> >> INFO http.HttpServer: Jetty bound to port 40013 >> >>> >> INFO mortbay.log: jetty-6.1.14 >> >>> >> INFO mortbay.log: Extract >> >>> >> >> >>> >> jar:file:/home/behroz/Documents/Packages/hama-0.7.0/hama-core-0.7.0.jar!/webapp/bspmaster/ >> >>> >> to /tmp/Jetty_b178b33b16cc_40013_bspmaster____.cof30w/webapp >> >>> >> INFO mortbay.log: Started SelectChannelConnector@b178b33b16cc >> :40013 >> >>> >> INFO bsp.BSPMaster: Cleaning up the system directory >> >>> >> INFO bsp.BSPMaster: hdfs:// >> >>> 172.17.0.3:54310/tmp/hama-behroz/bsp/system >> >>> >> INFO sync.ZKSyncBSPMasterClient: Initialized ZK false >> >>> >> INFO sync.ZKSyncClient: Initializing ZK Sync Client >> >>> >> ERROR sync.ZKSyncBSPMasterClient: >> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException: >> >>> >> KeeperErrorCode = ConnectionLoss for /bsp >> >>> >> at >> org.apache.zookeeper.KeeperException.create(KeeperException.java:99) >> >>> >> at >> org.apache.zookeeper.KeeperException.create(KeeperException.java:51) >> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041) >> >>> >> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1069) >> >>> >> at >> >>> >> >> >>> >> org.apache.hama.bsp.sync.ZKSyncBSPMasterClient.init(ZKSyncBSPMasterClient.java:62) >> >>> >> at org.apache.hama.bsp.BSPMaster.initZK(BSPMaster.java:534) >> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:517) >> >>> >> at org.apache.hama.bsp.BSPMaster.startMaster(BSPMaster.java:500) >> >>> >> at org.apache.hama.BSPMasterRunner.run(BSPMasterRunner.java:46) >> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) >> >>> >> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) >> >>> >> at org.apache.hama.BSPMasterRunner.main(BSPMasterRunner.java:56) >> >>> >> ERROR 
sync.ZKSyncBSPMasterClient: >> >>> >> org.apache.zookeeper.KeeperException$ConnectionLossException: >> >>> >> KeeperErrorCode = ConnectionLoss for /bsp >> >>> >> >> >>> >> *Why zookeeper settings in hama-site.xml are (right now, I am using >> >>> just >> >>> >> two servers 172.17.0.3 and 172.17.0.7)* >> >>> >> >> >>> >> hama.zookeeper.quorum >> >>> >> 172.17.0.3,172.17.0.7 >> >>> >> Comma separated list of servers in >> the >> >>> >> ZooKeeper quorum. >> >>> >> For example, "host1.mydomain.com, >> host2.mydomain.com, >> >>> >> host3.mydomain.com". >> >>> >> By default this is set to localhost for local and >> >>> >> pseudo-distributed modes >> >>> >> of operation. For a fully-distributed setup, this >> >>> should >> >>> >> be set to a full >> >>> >> list of ZooKeeper quorum servers. If >> HAMA_MANAGES_ZK >> >>> is >> >>> >> set in hama-env.sh >> >>> >> this is the list of servers which we will >> start/stop >> >>> >> ZooKeeper on. >> >>> >> >> >>> >> >> >>> >> ...... >> >>> >> >> >>> >> hama.zookeeper.property.clientPort >> >>> >> 2181 >> >>> >> >> >>> >> >> >>> >> Is something wrong with my settings ? >> >>> >> >> >>> >> Regards, >> >>> >> Behroz Sikander >> >>> >> >> >>> >> On Mon, Jun 29, 2015 at 1:44 AM, Edward J. Yoon < >> >>> edward.yoon@samsung.com> >> >>> >> wrote: >> >>> >> >> >>> >>> > (0.7.0) because I do not understand YARN yet. It adds extra >> >>> >>> configurations >> >>> >>> >> >>> >>> Hama classic mode works on both Hadoop 1.x and Hadoop 2.x HDFS. >> Yarn >> >>> >>> configuration is only needed when you want to submit a BSP job to >> Yarn >> >>> >>> cluster >> >>> >>> without Hama cluster. So you don't need to worry about it. :-) >> >>> >>> >> >>> >>> > distributed mode ? and is there any way to manage the server ? I >> >>> mean >> >>> >>> right >> >>> >>> > now, I have 3 machines with alot of configurations files and log >> >>> files. 
>> >>> >>> It >> >>> >>> >> >>> >>> You can use web UI at >> http://masterserver_address:40013/bspmaster.jsp >> >>> >>> >> >>> >>> To debug your program, please try like below: >> >>> >>> >> >>> >>> 1) Run a BSPMaster and Zookeeper at server1. >> >>> >>> % bin/hama bspmaster >> >>> >>> % bin/hama zookeeper >> >>> >>> >> >>> >>> 2) Run a Groom at server1 and server2. >> >>> >>> >> >>> >>> % bin/hama groom >> >>> >>> >> >>> >>> 3) Check whether deamons are running well. Then, run your program >> >>> using jar >> >>> >>> command at server1. >> >>> >>> >> >>> >>> % bin/hama jar ..... >> >>> >>> >> >>> >>> > In hama_[user]_bspmaster_.....log file I get the following >> >>> exception. But >> >>> >>> > this occurs in both cases when I run my job with 3 tasks or >> with 4 >> >>> tasks >> >>> >>> >> >>> >>> In fact, you should not see above initZK error log. >> >>> >>> >> >>> >>> -- >> >>> >>> Best Regards, Edward J. Yoon >> >>> >>> >> >>> >>> >> >>> >>> -----Original Message----- >> >>> >>> From: Behroz Sikander [mailto:behroz89@gmail.com] >> >>> >>> Sent: Monday, June 29, 2015 8:18 AM >> >>> >>> To: user@hama.apache.org >> >>> >>> Subject: Re: Groomserer BSPPeerChild limit >> >>> >>> >> >>> >>> I will try the things that you mentioned. I am not using the >> latest >> >>> version >> >>> >>> (0.7.0) because I do not understand YARN yet. It adds extra >> >>> configurations >> >>> >>> which makes it more harder for me to understand when things go >> wrong. >> >>> Any >> >>> >>> suggestions ? >> >>> >>> >> >>> >>> Further, are there any tools that you use for debugging while in >> >>> >>> distributed mode ? and is there any way to manage the server ? I >> mean >> >>> right >> >>> >>> now, I have 3 machines with alot of configurations files and log >> >>> files. It >> >>> >>> takes alot of time. This makes me wonder how people who have 100s >> of >> >>> >>> machines debug and manage the cluster. 
>> >>> >>> >> >>> >>> Regards, >> >>> >>> Behroz >> >>> >>> >> >>> >>> On Mon, Jun 29, 2015 at 12:53 AM, Edward J. Yoon < >> >>> edward.yoon@samsung.com> >> >>> >>> wrote: >> >>> >>> >> >>> >>> > Hi, >> >>> >>> > >> >>> >>> > It looks like a zookeeper connection problem. Please check >> whether >> >>> >>> > zookeeper >> >>> >>> > is running and every tasks can connect to zookeeper. >> >>> >>> > >> >>> >>> > I would recommend you to stop the firewall during debugging, and >> >>> please >> >>> >>> use >> >>> >>> > the 0.7.0 latest release. >> >>> >>> > >> >>> >>> > >> >>> >>> > -- >> >>> >>> > Best Regards, Edward J. Yoon >> >>> >>> > >> >>> >>> > -----Original Message----- >> >>> >>> > From: Behroz Sikander [mailto:behroz89@gmail.com] >> >>> >>> > Sent: Monday, June 29, 2015 7:34 AM >> >>> >>> > To: user@hama.apache.org >> >>> >>> > Subject: Re: Groomserer BSPPeerChild limit >> >>> >>> > >> >>> >>> > To figure out the issue, I was trying something else and found >> out >> >>> >>> another >> >>> >>> > wiered issue. Might be a bug of Hama but I am not sure. Both >> >>> following >> >>> >>> > lines give an exception. 
>> >>> >>> > >> >>> >>> > System.out.println( peer.getPeerName(0)); //Exception >> >>> >>> > >> >>> >>> > System.out.println( peer.getNumPeers()); //Exception >> >>> >>> > >> >>> >>> > >> >>> >>> > [time] ERROR bsp.BSPTask: *Error running bsp setup and bsp >> >>> function.* >> >>> >>> > >> >>> >>> > [time]java.lang.*RuntimeException: All peer names could not be >> >>> >>> retrieved!* >> >>> >>> > >> >>> >>> > at >> >>> >>> > >> >>> >>> > >> >>> >>> >> >>> >> org.apache.hama.bsp.sync.ZooKeeperSyncClientImpl.getAllPeerNames(ZooKeeperSyncClientImpl.java:305) >> >>> >>> > >> >>> >>> > at >> >>> org.apache.hama.bsp.BSPPeerImpl.initPeerNames(BSPPeerImpl.java:544) >> >>> >>> > >> >>> >>> > at >> org.apache.hama.bsp.BSPPeerImpl.getNumPeers(BSPPeerImpl.java:538) >> >>> >>> > >> >>> >>> > at testHDFS.EVADMMBsp.setup*(EVADMMBsp.java:58)* >> >>> >>> > >> >>> >>> > at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:170) >> >>> >>> > >> >>> >>> > at org.apache.hama.bsp.BSPTask.run(BSPTask.java:144) >> >>> >>> > >> >>> >>> > at >> >>> >>> >> >>> >> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1243) >> >>> >>> > >> >>> >>> > On Sun, Jun 28, 2015 at 6:45 PM, Behroz Sikander < >> >>> behroz89@gmail.com> >> >>> >>> > wrote: >> >>> >>> > >> >>> >>> > > I think I have more information on the issue. I did some >> >>> debugging and >> >>> >>> > > found something quite strange. >> >>> >>> > > >> >>> >>> > > If I open my job with 6 tasks ( 3 tasks will run on MACHINE1 >> and >> >>> 3 task >> >>> >>> > > will be opened on other MACHINE2), >> >>> >>> > > >> >>> >>> > > - 3 tasks on Machine1 are frozen and the strange thing is >> that >> >>> the >> >>> >>> > > processes do not even enter the SETUP function of BSP class. I >> >>> have >> >>> >>> print >> >>> >>> > > statements in the setup function of BSP class and it doesn't >> print >> >>> >>> > > anything. I get empty files with zero size. >> >>> >>> > > >> >>> >>> > > drwxrwxr-x 2 behroz behroz 4096 Jun 28 16:29 . 
>> >>> >>> > > drwxrwxr-x 99 behroz behroz 4096 Jun 28 16:28 .. >> >>> >>> > > -rw-rw-r-- 1 behroz behroz 0 Jun 28 16:24 >> >>> >>> > > attempt_201506281624_0001_000000_0.err >> >>> >>> > > -rw-rw-r-- 1 behroz behroz 0 Jun 28 16:24 >> >>> >>> > > attempt_201506281624_0001_000000_0.log >> >>> >>> > > -rw-rw-r-- 1 behroz behroz 0 Jun 28 16:24 >> >>> >>> > > attempt_201506281624_0001_000001_0.err >> >>> >>> > > -rw-rw-r-- 1 behroz behroz 0 Jun 28 16:24 >> >>> >>> > > attempt_201506281624_0001_000001_0.log >> >>> >>> > > -rw-rw-r-- 1 behroz behroz 0 Jun 28 16:24 >> >>> >>> > > attempt_201506281624_0001_000002_0.err >> >>> >>> > > -rw-rw-r-- 1 behroz behroz 0 Jun 28 16:24 >> >>> >>> > > attempt_201506281624_0001_000002_0.log >> >>> >>> > > >> >>> >>> > > - On MACHINE2, the code enters the SETUP function of BSP >> class and >> >>> >>> prints >> >>> >>> > > stuff. See the size of files generated on output. How is it >> >>> possible >> >>> >>> that >> >>> >>> > > in 3 tasks the code can enter BSP and in others it cannot ? >> >>> >>> > > >> >>> >>> > > drwxrwxr-x 2 behroz behroz 4096 Jun 28 16:39 . >> >>> >>> > > drwxrwxr-x 82 behroz behroz 4096 Jun 28 16:39 .. >> >>> >>> > > -rw-rw-r-- 1 behroz behroz 659 Jun 28 16:39 >> >>> >>> > > attempt_201506281639_0001_000003_0.err >> >>> >>> > > -rw-rw-r-- 1 behroz behroz 1441 Jun 28 16:39 >> >>> >>> > > attempt_201506281639_0001_000003_0.log >> >>> >>> > > -rw-rw-r-- 1 behroz behroz 659 Jun 28 16:39 >> >>> >>> > > attempt_201506281639_0001_000004_0.err >> >>> >>> > > -rw-rw-r-- 1 behroz behroz 1368 Jun 28 16:39 >> >>> >>> > > attempt_201506281639_0001_000004_0.log >> >>> >>> > > -rw-rw-r-- 1 behroz behroz 659 Jun 28 16:39 >> >>> >>> > > attempt_201506281639_0001_000005_0.err >> >>> >>> > > -rw-rw-r-- 1 behroz behroz 1441 Jun 28 16:39 >> >>> >>> > > attempt_201506281639_0001_000005_0.log >> >>> >>> > > >> >>> >>> > > - Hama Groom log file on MACHINE2 (which is frozen) shows. 
>> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task >> >>> >>> > > 'attempt_201506281639_0001_000001_0' has started. >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks. >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task >> >>> >>> > > 'attempt_201506281639_0001_000002_0' has started. >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks. >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task >> >>> >>> > > 'attempt_201506281639_0001_000000_0' has started. >> >>> >>> > > >> >>> >>> > > - Hama Groom log file on MACHINE2 shows >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task >> >>> >>> > > 'attempt_201506281639_0001_000003_0' has started. >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks. >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task >> >>> >>> > > 'attempt_201506281639_0001_000004_0' has started. >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Launch 3 tasks. >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task >> >>> >>> > > 'attempt_201506281639_0001_000005_0' has started. >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task >> >>> >>> > > attempt_201506281639_0001_000004_0 is *done*. >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task >> >>> >>> > > attempt_201506281639_0001_000003_0 is *done*. >> >>> >>> > > [time] INFO org.apache.hama.bsp.GroomServer: Task >> >>> >>> > > attempt_201506281639_0001_000005_0 is *done*. >> >>> >>> > > >> >>> >>> > > Any clue what might be going wrong ? 
>> >>> >>> > > >> >>> >>> > > Regards, >> >>> >>> > > Behroz >> >>> >>> > > >> >>> >>> > > >> >>> >>> > > >> >>> >>> > > On Sat, Jun 27, 2015 at 1:13 PM, Behroz Sikander < >> >>> behroz89@gmail.com> >> >>> >>> > > wrote: >> >>> >>> > > >> >>> >>> > >> Here is the log file from that folder >> >>> >>> > >> >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: Starting Socket Reader #1 >> for >> >>> port >> >>> >>> > >> 61001 >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server Responder: >> starting >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server listener on >> 61001: >> >>> >>> > starting >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 0 on >> 61001: >> >>> >>> > starting >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 1 on >> 61001: >> >>> >>> > starting >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 2 on >> 61001: >> >>> >>> > starting >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 3 on >> 61001: >> >>> >>> > starting >> >>> >>> > >> 15/06/27 11:10:34 INFO message.HamaMessageManagerImpl: >> BSPPeer >> >>> >>> > >> address:b178b33b16cc port:61001 >> >>> >>> > >> 15/06/27 11:10:34 INFO ipc.Server: IPC Server handler 4 on >> 61001: >> >>> >>> > starting >> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZKSyncClient: Initializing ZK >> Sync >> >>> Client >> >>> >>> > >> 15/06/27 11:10:34 INFO sync.ZooKeeperSyncClientImpl: Start >> >>> connecting >> >>> >>> to >> >>> >>> > >> Zookeeper! 
At b178b33b16cc/172.17.0.7:61001 >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping server on 61001 >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 0 on >> 61001: >> >>> >>> > exiting >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server >> listener >> >>> on >> >>> >>> 61001 >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 1 on >> 61001: >> >>> >>> > exiting >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 2 on >> 61001: >> >>> >>> > exiting >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: Stopping IPC Server >> Responder >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 3 on >> 61001: >> >>> >>> > exiting >> >>> >>> > >> 15/06/27 11:10:37 INFO ipc.Server: IPC Server handler 4 on >> 61001: >> >>> >>> > exiting >> >>> >>> > >> >> >>> >>> > >> >> >>> >>> > >> And my console shows the following ouptut. Hama is frozen >> right >> >>> now. >> >>> >>> > >> 15/06/27 11:10:32 INFO bsp.BSPJobClient: Running job: >> >>> >>> > >> job_201506262331_0003 >> >>> >>> > >> 15/06/27 11:10:35 INFO bsp.BSPJobClient: Current supersteps >> >>> number: 0 >> >>> >>> > >> 15/06/27 11:10:38 INFO bsp.BSPJobClient: Current supersteps >> >>> number: 2 >> >>> >>> > >> >> >>> >>> > >> On Sat, Jun 27, 2015 at 1:07 PM, Edward J. Yoon < >> >>> >>> edwardyoon@apache.org> >> >>> >>> > >> wrote: >> >>> >>> > >> >> >>> >>> > >>> Please check the task logs in $HAMA_HOME/logs/tasklogs >> folder. >> >>> >>> > >>> >> >>> >>> > >>> On Sat, Jun 27, 2015 at 8:03 PM, Behroz Sikander < >> >>> behroz89@gmail.com >> >>> >>> > >> >>> >>> > >>> wrote: >> >>> >>> > >>> > Yea. I also thought that. I ran the program through >> eclipse >> >>> with 20 >> >>> >>> > >>> tasks >> >>> >>> > >>> > and it works fine. >> >>> >>> > >>> > >> >>> >>> > >>> > On Sat, Jun 27, 2015 at 1:00 PM, Edward J. 
Yoon < >> >>> >>> > edwardyoon@apache.org >> >>> >>> > >>> > >> >>> >>> > >>> > wrote: >> >>> >>> > >>> > >> >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs >> fine. >> >>> When I >> >>> >>> > >>> run my >> >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I >> >>> increase >> >>> >>> > the >> >>> >>> > >>> tasks >> >>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. I do not >> >>> >>> understand >> >>> >>> > >>> what >> >>> >>> > >>> >> can >> >>> >>> > >>> >> > go wrong. >> >>> >>> > >>> >> >> >>> >>> > >>> >> It looks like a program bug. Have you ran your program in >> >>> local >> >>> >>> > mode? >> >>> >>> > >>> >> >> >>> >>> > >>> >> On Sat, Jun 27, 2015 at 8:03 AM, Behroz Sikander < >> >>> >>> > behroz89@gmail.com> >> >>> >>> > >>> >> wrote: >> >>> >>> > >>> >> > Hi, >> >>> >>> > >>> >> > In the current thread, I mentioned 3 issues. Issue 1 >> and 3 >> >>> are >> >>> >>> > >>> resolved >> >>> >>> > >>> >> but >> >>> >>> > >>> >> > issue number 2 is still giving me headaches. >> >>> >>> > >>> >> > >> >>> >>> > >>> >> > My problem: >> >>> >>> > >>> >> > My cluster now consists of 3 machines. Each one of them >> >>> properly >> >>> >>> > >>> >> configured >> >>> >>> > >>> >> > (Apparently). From my master machine when I start >> Hadoop >> >>> and >> >>> >>> Hama, >> >>> >>> > >>> I can >> >>> >>> > >>> >> > see the processes started on other 2 machines. If I >> check >> >>> the >> >>> >>> > >>> maximum >> >>> >>> > >>> >> tasks >> >>> >>> > >>> >> > that my cluster can support then I get 9 (3 tasks on >> each >> >>> >>> > machine). >> >>> >>> > >>> >> > >> >>> >>> > >>> >> > When I run the PI example, it uses 9 tasks and runs >> fine. >> >>> When I >> >>> >>> > >>> run my >> >>> >>> > >>> >> > program with 3 tasks, everything runs fine. But when I >> >>> increase >> >>> >>> > the >> >>> >>> > >>> tasks >> >>> >>> > >>> >> > (to 4) by using "setNumBspTask". Hama freezes. 
I do not >> >>> >>> understand >> >>> >>> > >>> what >> >>> >>> > >>> >> can >> >>> >>> > >>> >> > go wrong. >> >>> >>> > >>> >> > >> >>> >>> > >>> >> > I checked the logs files and things look fine. I just >> >>> sometimes >> >>> >>> > get >> >>> >>> > >>> an >> >>> >>> > >>> >> > exception that hama was not able to delete the sytem >> >>> directory >> >>> >>> > >>> >> > (bsp.system.dir) defined in the hama-site.xml. >> >>> >>> > >>> >> > >> >>> >>> > >>> >> > Any help or clue would be great. >> >>> >>> > >>> >> > >> >>> >>> > >>> >> > Regards, >> >>> >>> > >>> >> > Behroz Sikander >> >>> >>> > >>> >> > >> >>> >>> > >>> >> > On Thu, Jun 25, 2015 at 1:13 PM, Behroz Sikander < >> >>> >>> > >>> behroz89@gmail.com> >> >>> >>> > >>> >> wrote: >> >>> >>> > >>> >> > >> >>> >>> > >>> >> >> Thank you :) >> >>> >>> > >>> >> >> >> >>> >>> > >>> >> >> On Thu, Jun 25, 2015 at 12:14 AM, Edward J. Yoon < >> >>> >>> > >>> edwardyoon@apache.org >> >>> >>> > >>> >> > >> >>> >>> > >>> >> >> wrote: >> >>> >>> > >>> >> >> >> >>> >>> > >>> >> >>> Hi, >> >>> >>> > >>> >> >>> >> >>> >>> > >>> >> >>> You can get the maximum number of available tasks >> like >> >>> >>> following >> >>> >>> > >>> code: >> >>> >>> > >>> >> >>> >> >>> >>> > >>> >> >>> BSPJobClient jobClient = new BSPJobClient(conf); >> >>> >>> > >>> >> >>> ClusterStatus cluster = >> >>> jobClient.getClusterStatus(true); >> >>> >>> > >>> >> >>> >> >>> >>> > >>> >> >>> // Set to maximum >> >>> >>> > >>> >> >>> bsp.setNumBspTask(cluster.getMaxTasks()); >> >>> >>> > >>> >> >>> >> >>> >>> > >>> >> >>> >> >>> >>> > >>> >> >>> On Wed, Jun 24, 2015 at 11:20 PM, Behroz Sikander < >> >>> >>> > >>> behroz89@gmail.com> >> >>> >>> > >>> >> >>> wrote: >> >>> >>> > >>> >> >>> > Hi, >> >>> >>> > >>> >> >>> > 1) Thank you for this. >> >>> >>> > >>> >> >>> > 2) Here are the images. 
I will look into the log >> files >> >>> of PI >> >>> >>> > >>> example >> >>> >>> > >>> >> >>> > >> >>> >>> > >>> >> >>> > *Result of JPS command on slave* >> >>> >>> > >>> >> >>> > >> >>> >>> > >>> >> >>> >> >>> >>> > >>> >> >> >>> >>> > >>> >> >>> >>> > >> >>> >>> >> >>> >> http://s17.postimg.org/gpwe2bbfj/Screen_Shot_2015_06_22_at_7_23_31_PM.png >> >>> >>> > >>> >> >>> > >> >>> >>> > >>> >> >>> > *Result of JPS command on Master* >> >>> >>> > >>> >> >>> > >> >>> >>> > >>> >> >>> >> >>> >>> > >>> >> >> >>> >>> > >>> >> >>> >>> > >> >>> >>> >> >>> >> http://s14.postimg.org/s9922em5p/Screen_Shot_2015_06_22_at_7_23_42_PM.png >> >>> >>> > >>> >> >>> > >> >>> >>> > >>> >> >>> > 3) In my current case, I do not have any input >> >>> submitted to >> >>> >>> > the >> >>> >>> > >>> job. >> >>> >>> > >>> >> >>> During >> >>> >>> > >>> >> >>> > run time, I directly fetch data from HDFS. So, I am >> >>> looking >> >>> >>> > for >> >>> >>> > >>> >> >>> something >> >>> >>> > >>> >> >>> > like BSPJob.set*Max*NumBspTask(). >> >>> >>> > >>> >> >>> > >> >>> >>> > >>> >> >>> > Regards, >> >>> >>> > >>> >> >>> > Behroz >> >>> >>> > >>> >> >>> > >> >>> >>> > >>> >> >>> > >> >>> >>> > >>> >> >>> > >> >>> >>> > >>> >> >>> > On Tue, Jun 23, 2015 at 12:57 AM, Edward J. Yoon < >> >>> >>> > >>> >> edwardyoon@apache.org >> >>> >>> > >>> >> >>> > >> >>> >>> > >>> >> >>> > wrote: >> >>> >>> > >>> >> >>> > >> >>> >>> > >>> >> >>> >> Hello, >> >>> >>> > >>> >> >>> >> >> >>> >>> > >>> >> >>> >> 1) You can get the filesystem URI from a >> configuration >> >>> >>> using >> >>> >>> > >>> >> >>> >> "FileSystem fs = FileSystem.get(conf);". Of >> course, >> >>> the >> >>> >>> > >>> fs.defaultFS >> >>> >>> > >>> >> >>> >> property should be in hama-site.xml >> >>> >>> > >>> >> >>> >> >> >>> >>> > >>> >> >>> >> >> >>> >>> > >>> >> >>> >> fs.defaultFS >> >>> >>> > >>> >> >>> >> hdfs://host1.mydomain.com:9000/ >> >> >>> >>> > >>> >> >>> >> >> >>> >>> > >>> >> >>> >> The name of the default file system. 
Either >> the >> >>> >>> literal >> >>> >>> > >>> string >> >>> >>> > >>> >> >>> >> "local" or a host:port for HDFS. >> >>> >>> > >>> >> >>> >> >> >>> >>> > >>> >> >>> >> >> >>> >>> > >>> >> >>> >> >> >>> >>> > >>> >> >>> >> 2) The 'bsp.tasks.maximum' is the number of tasks >> per >> >>> node. >> >>> >>> > It >> >>> >>> > >>> looks >> >>> >>> > >>> >> >>> >> cluster configuration issue. Please run Pi example >> >>> and look >> >>> >>> > at >> >>> >>> > >>> the >> >>> >>> > >>> >> >>> >> logs for more details. NOTE: you can not attach >> the >> >>> images >> >>> >>> to >> >>> >>> > >>> >> mailing >> >>> >>> > >>> >> >>> >> list so I can't see it. >> >>> >>> > >>> >> >>> >> >> >>> >>> > >>> >> >>> >> 3) You can use the BSPJob.setNumBspTask(int) >> method. >> >>> If >> >>> >>> input >> >>> >>> > >>> is >> >>> >>> > >>> >> >>> >> provided, the number of BSP tasks is basically >> driven >> >>> by >> >>> >>> the >> >>> >>> > >>> number >> >>> >>> > >>> >> of >> >>> >>> > >>> >> >>> >> DFS blocks. I'll fix it to be more flexible on >> >>> HAMA-956. >> >>> >>> > >>> >> >>> >> >> >>> >>> > >>> >> >>> >> Thanks! >> >>> >>> > >>> >> >>> >> >> >>> >>> > >>> >> >>> >> >> >>> >>> > >>> >> >>> >> On Tue, Jun 23, 2015 at 2:33 AM, Behroz Sikander < >> >>> >>> > >>> >> behroz89@gmail.com> >> >>> >>> > >>> >> >>> >> wrote: >> >>> >>> > >>> >> >>> >> > Hi, >> >>> >>> > >>> >> >>> >> > Recently, I moved from a single machine setup >> to a 2 >> >>> >>> > machine >> >>> >>> > >>> >> setup. >> >>> >>> > >>> >> >>> I was >> >>> >>> > >>> >> >>> >> > successfully able to run my job that uses the >> HDFS >> >>> to get >> >>> >>> > >>> data. I >> >>> >>> > >>> >> >>> have 3 >> >>> >>> > >>> >> >>> >> > trivial questions >> >>> >>> > >>> >> >>> >> > >> >>> >>> > >>> >> >>> >> > 1- To access HDFS, I have to manually give the >> IP >> >>> address >> >>> >>> > of >> >>> >>> > >>> >> server >> >>> >>> > >>> >> >>> >> running >> >>> >>> > >>> >> >>> >> > HDFS. 
>> >>> >>> > >>> >> >>> >> > I thought that Hama would automatically pick it up from the
>> >>> >>> > >>> >> >>> >> > configuration, but it does not. I am probably doing something wrong.
>> >>> >>> > >>> >> >>> >> > Right now my code works by using the following.
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > FileSystem fs = FileSystem.get(new URI("hdfs://server_ip:port/"), conf);
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > 2- On my master server, when I start hama it automatically starts hama
>> >>> >>> > >>> >> >>> >> > on the slave machine (all good). Both master and slave are set as
>> >>> >>> > >>> >> >>> >> > groomservers. This means that I have 2 servers to run my job, which
>> >>> >>> > >>> >> >>> >> > means that I can open more BSPPeerChild processes. And if I submit my
>> >>> >>> > >>> >> >>> >> > jar with 3 bsp tasks then everything works fine. But when I move to
>> >>> >>> > >>> >> >>> >> > 4 tasks, Hama freezes. Here is the result of JPS command on slave.
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > Result of JPS command on Master
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > You can see that it is only opening tasks on slaves but not on master.
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > Note: I tried to change the bsp.tasks.maximum property in
>> >>> >>> > >>> >> >>> >> > hama-default.xml to 4, but I still get the same result.
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > 3- I want my cluster to open as many BSPPeerChild processes as
>> >>> >>> > >>> >> >>> >> > possible. Is there any setting I can use to achieve that? Or does
>> >>> >>> > >>> >> >>> >> > Hama pick up the values from hama-default.xml to open tasks?
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > Regards,
>> >>> >>> > >>> >> >>> >> >
>> >>> >>> > >>> >> >>> >> > Behroz Sikander
>> >>> >>> > >>> >> >>> >>
>> >>> >>> > >>> >> >>> >> --
>> >>> >>> > >>> >> >>> >> Best Regards, Edward J. Yoon
>> >>> >>> > >>> >> >>>
>> >>> >>> > >>> >> >>> --
>> >>> >>> > >>> >> >>> Best Regards, Edward J. Yoon
>> >>> >>> > >>> >>
>> >>> >>> > >>> >> --
>> >>> >>> > >>> >> Best Regards, Edward J. Yoon
>> >>> >>> > >>>
>> >>> >>> > >>> --
>> >>> >>> > >>> Best Regards, Edward J. Yoon
>> >>> >
>> >>> > --
>> >>> > Best Regards, Edward J. Yoon
>> >>>
>> >>> --
>> >>> Best Regards, Edward J. Yoon
>>
>> --
>> Best Regards, Edward J. Yoon
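
For reference, the two settings discussed in the quoted thread could be written out as a hama-site.xml fragment along the following lines. This is only a sketch under stated assumptions, not a verified cluster setup: the host name and port are the placeholder values from the quoted reply, the value 4 for bsp.tasks.maximum is simply the one Behroz mentions trying, and putting bsp.tasks.maximum in hama-site.xml (rather than editing hama-default.xml) assumes the usual convention that the site file overrides the defaults.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Default file system. With this set, FileSystem.get(conf) should
       resolve HDFS without an explicit URI (placeholder host and port). -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://host1.mydomain.com:9000/</value>
    <description>The name of the default file system. Either the literal
    string "local" or a host:port for HDFS.</description>
  </property>

  <!-- Number of BSP tasks per node, as described in the reply above.
       The value 4 is illustrative only. -->
  <property>
    <name>bsp.tasks.maximum</name>
    <value>4</value>
  </property>
</configuration>
```

Because bsp.tasks.maximum is a per-node limit, the same file would presumably need to be present on every groom server, and the daemons restarted, before the change takes effect.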