trafodion-dev mailing list archives

From Selva Govindarajan <selva.govindara...@esgyn.com>
Subject RE: [Urgent Help] Trafodion Build Environment Problem
Date Tue, 08 Sep 2015 18:02:16 GMT
Hi Amanda,

I presume that the installer will flag this as a requirement for Trafodion
to be installed. Will it abort the installation, or will the installer fix
the pid_max setting automatically?

Selva

-----Original Message-----
From: Amanda Moran [mailto:amanda.moran@esgyn.com]
Sent: Tuesday, September 8, 2015 9:20 AM
To: dev@trafodion.incubator.apache.org
Cc: Lijian (Q) <jianli.li@huawei.com>
Subject: Re: [Urgent Help] Trafodion Build Environment Problem

Hi there-

This is fixed in the latest version of the installer.

Thanks.

Sent from my iPhone

> On Sep 8, 2015, at 9:07 AM, Dave Birdsall <dave.birdsall@esgyn.com> wrote:
>
> Hi,
>
> I'm wondering if this should be reported as a problem? Perhaps
> Nieyuanyuan would like to open a JIRA about supporting higher PID
> numbers in Trafodion?
>
> Dave
>
> -----Original Message-----
> From: Narendra Goyal [mailto:narendra.goyal@esgyn.com]
> Sent: Monday, September 7, 2015 7:04 PM
> To: dev@trafodion.incubator.apache.org
> Cc: Lijian (Q) <jianli.li@huawei.com>
> Subject: RE: [Urgent Help] Trafodion Build Environment Problem
>
> Hi Nieyuanyuan,
>
> Could you please check the 'pid_max' settings:
> sysctl -q kernel.pid_max
> (or cat /proc/sys/kernel/pid_max)
>
> If the value is > 64K, I would recommend you set it to 64K, like so:
> sudo sysctl -w kernel.pid_max=65535
>
> You will have to restart Trafodion and the other Hadoop/HBase processes:
> swstopall
> ckillall
> swstartall
> sqstart
>
> Just FYI, to check the list of Trafodion processes only, please run
> 'cstat' in your bash shell.
>
> Thanks,
> -Narendra
>
>
> -----Original Message-----
> From: Nieyuanyuan [mailto:nieyuanyuan@huawei.com]
> Sent: Monday, September 7, 2015 6:40 PM
> To: dev@trafodion.incubator.apache.org
> Cc: Lijian (Q) <jianli.li@huawei.com>
> Subject: [Urgent Help] Trafodion Build Environment Problem
>
> Dear Guys,
>
> I recently downloaded Trafodion 1.1 from
> https://github.com/apache/incubator-trafodion/tree/stable/1.1 and
> followed the build guide at
> https://wiki.trafodion.org/wiki/index.php/Building_the_Software.
> After solving a number of problems (no need to list all the details),
> I am able to run Trafodion on a Hadoop sandbox environment.
>
> But I have a serious problem: all Trafodion-related processes go down
> after several minutes (I am not sure exactly how long), and only a few
> of them are left:
> [nieyy@redhat-72 ~]$ ps ux
> USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
> nieyy     76554  0.1  0.1 590988 139768 pts/6   Sl   19:14   0:04 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> nieyy    118833  0.7  0.3 1535452 420996 ?      Sl   19:40   0:12 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_namenode -Xmx1000m -Djava.net.prefe
> nieyy    119085  0.6  0.2 1572688 367388 ?      Sl   19:40   0:10 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_datanode -Xmx1000m -Djava.net.prefe
> nieyy    119320  0.4  0.2 1512656 340636 ?      Sl   19:41   0:07 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_secondarynamenode -Xmx1000m -Djava.
> nieyy    119972  1.2  0.2 1708408 378536 pts/6  Sl   19:41   0:20 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.
> nieyy    120133  0.9  0.2 1616388 309976 ?      Sl   19:41   0:16 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.
> nieyy    120371  0.0  0.0   9824  1772 pts/6    S    19:41   0:00 /bin/sh ./bin/mysqld_safe --defaults-file=/home/nieyy/trafodion_build/incubator-trafodion-stable-1.
> nieyy    120594  0.0  0.0 452604 89908 pts/6    Sl   19:41   0:01 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/mysql/bin/mysq
> nieyy    120789  0.0  0.0   9692  1736 pts/6    S    19:41   0:00 bash /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/local_hadoop/hbase/bin
> nieyy    120806  2.0  0.3 1809048 509164 pts/6  Sl   19:41   0:34 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -Dproc_master -XX:OnOutOfMemoryError=kill
> nieyy    122554  0.0  0.0  13624  1304 pts/6    S    19:41   0:00 mpirun -disable-auto-cleanup -demux select -env SQ_IC TCP -env MPI_ERROR_LEVEL 2 -env SQ_PIDMAP 1 -
> nieyy    122555  0.0  0.0      0     0 ?        Zs   19:41   0:00 [hydra_pmi_proxy] <defunct>
> nieyy    122556  1.0  0.0 335212 36748 ?        Ssl  19:41   0:17 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
> nieyy    122557  0.8  0.0 335212 36768 ?        Ssl  19:41   0:14 /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sqf/export/bin64d/monitor COLD
> nieyy    123946  0.9  0.1 828072 223088 pts/6   Sl   19:42   0:14 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
> nieyy    124044  1.0  0.1 629200 187180 pts/6   Sl   19:42   0:16 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.45.x86_64/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx128m
>
> I then need to kill all processes and use swstartall and sqstart to
> reset the environment. However, the environment still goes down after
> a while, and I need to restart it again.
>
> I found some core files under
> trafodion_build/incubator-trafodion-stable-1.1/core/sqf/sql/scripts;
> all of them were generated by mxssmp:
> [nieyy@redhat-72 scripts]$ ll core*
> ...
> -rw------- 1 nieyy nieyy 156008448 Sep  7 17:56 core.mxssmp.173357
> -rw------- 1 nieyy nieyy 145518592 Sep  7 17:56 core.mxssmp.173372
> -rw------- 1 nieyy nieyy 156008448 Sep  7 19:24 core.mxssmp.74146
> -rw------- 1 nieyy nieyy 145518592 Sep  7 19:24 core.mxssmp.74197
>
> I used gdb to track the stack:
> [nieyy@redhat-72 scripts]$ gdb /home/nieyy/trafodion_build/incubator-trafodion-stable-1.1/core/sql/lib/linux/64bit/debug/mxssmp ./core.mxssmp.141469
> ...
> (gdb) where
> #0  0x000000000044166c in ProcessStats::getHeap (this=0x2000) at ../runtimestats/SqlStats.h:271
> #1  0x000000000043990a in StatsGlobals::removeProcess (this=0x10000000, pid=65536, calledAtAdd=0) at ../runtimestats/SqlStats.cpp:276
> #2  0x0000000000439e05 in StatsGlobals::checkForDeadProcesses (this=0x10000000, myPid=141469) at ../runtimestats/SqlStats.cpp:382
> #3  0x00000000004440be in SsmpGlobals::work (this=0x7f062660c7e8) at ../runtimestats/ssmpipc.cpp:582
> #4  0x000000000042f06a in runServer (argc=1, argv=0x7fff5b0e5a48) at ../bin/ex_ssmp_main.cpp:259
> #5  0x000000000042eb12 in main (argc=1, argv=0x7fff5b0e5a48) at ../bin/ex_ssmp_main.cpp:127
>
> I then searched via Google and found a link,
> https://bugs.launchpad.net/trafodion/+bug/1368891, which looks similar.
> It claims the bug was fixed in v0.9, but my version is 1.1.
>
> So, could you kindly help me solve this problem? I can't find any
> more useful information via Google.
>
> Thanks a lot.
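
The pid_max check suggested earlier in this thread can be sketched as a small shell script. This is only an illustration: the names PID_LIMIT and check_pid_max are hypothetical and are not part of Trafodion or its installer. The 65535 threshold is the value recommended in the thread.

```shell
#!/bin/sh
# Sketch only: PID_LIMIT and check_pid_max are illustrative names,
# not Trafodion tooling.
PID_LIMIT=65535   # value recommended in the thread

# Print "ok" if the given pid_max fits within the limit, "too-high" otherwise.
check_pid_max() {
    if [ "$1" -le "$PID_LIMIT" ]; then
        echo "ok"
    else
        echo "too-high"
    fi
}

# Read the live setting where /proc is available (Linux only).
if [ -r /proc/sys/kernel/pid_max ]; then
    current=$(cat /proc/sys/kernel/pid_max)
    echo "kernel.pid_max=$current: $(check_pid_max "$current")"
    # To lower it, as suggested in the thread (requires root):
    #   sudo sysctl -w kernel.pid_max=65535
fi
```

If the script reports "too-high", lowering the value with the sysctl command shown in the comment and then restarting the Trafodion and Hadoop/HBase processes follows the advice given in the thread.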
