trafodion-dev mailing list archives

From Radu Marias <radumar...@gmail.com>
Subject Re: odbc and/or hammerdb logs
Date Fri, 18 Sep 2015 11:21:44 GMT
$ $JAVA_HOME/bin/java -XX:+PrintFlagsFinal -version | grep HeapSize
    uintx ErgoHeapSizeLimit                         = 0               {product}
    uintx HeapSizePerGCThread                       = 87241520        {product}
    uintx InitialHeapSize                          := 402653184       {product}
    uintx LargePageHeapSizeThreshold                = 134217728       {product}
    uintx MaxHeapSize                              := 6442450944      {product}
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
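A hedged sketch for reading the flags above: the `:=` values are in bytes, and this converts them to GB. The canned sample simply repeats the captured output; on a node you would pipe the live `java -XX:+PrintFlagsFinal -version | grep HeapSize` output through the same awk.

```shell
# Convert the InitialHeapSize/MaxHeapSize flags above from bytes to GB.
# flags_sample stands in for the live PrintFlagsFinal | grep output so
# the sketch is self-contained.
flags_sample() {
cat <<'EOF'
    uintx InitialHeapSize                          := 402653184       {product}
    uintx MaxHeapSize                              := 6442450944      {product}
EOF
}
# Field 2 is the flag name, field 4 its value in bytes.
flags_sample | awk '{ printf "%s = %.3f GB\n", $2, $4 / (1024*1024*1024) }'
```

MaxHeapSize works out to 6 GB here; on HotSpot 7 the server-class default is typically a quarter of physical RAM, which would suggest roughly 24 GB on this node (an inference from the ergonomics defaults, not something stated in the thread).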


On Fri, Sep 18, 2015 at 2:20 PM, Radu Marias <radumarias@gmail.com> wrote:

> I logged this issue a few days ago; is this ok?
> https://issues.apache.org/jira/browse/TRAFODION-1492
>
> Will try with one user and let you know.
>
> On Fri, Sep 18, 2015 at 7:05 AM, Suresh Subbiah <
> suresh.subbiah60@gmail.com> wrote:
>
>> Hi
>>
>> How many virtual users are being used? If it is more than one, could we
>> please try the case with 1 user first?
>>
>> When the crash happens next time, could we please try:
>> sqps | grep esp | wc -l
>>
>> If this number is large, we know a lot of ESP processes are being started,
>> which could consume memory.
>> If this is the case, please insert this row into the defaults table from
>> sqlci and then restart DCS (dcsstop followed by dcsstart):
>> insert into "_MD_".defaults values('ATTEMPT_ESP_PARALLELISM', 'OFF',
>> 'hammerdb testing') ;
>> exit ;
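The two checks above (count ESP processes, then decide whether to disable ESP parallelism) can be sketched like this. `sqps` only exists on a Trafodion node, so a canned sample with made-up process names stands in for it here:

```shell
# Count ESP (tdm_arkesp) processes, as suggested above. On a live node
# you would run: sqps | grep esp | wc -l
# The process names in the sample are hypothetical.
sqps_sample() {
cat <<'EOF'
$Z000ABC  tdm_arkesp
$Z000ABD  tdm_arkesp
$Z000ABE  tdm_arkesp
$Z000ABF  mxosrvr
EOF
}
esp_count=$(sqps_sample | grep -c esp)
echo "esp processes: $esp_count"

# If the count is large, ESP parallelism can be disabled as described:
#   sqlci
#   insert into "_MD_".defaults values('ATTEMPT_ESP_PARALLELISM', 'OFF', 'hammerdb testing') ;
#   exit ;
#   dcsstop ; dcsstart
```

`grep -c` is used instead of `grep | wc -l` only to avoid the padded count some `wc` implementations print; the result is the same.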
>>
>> I will work on having the UDR process create a JVM with a smaller initial
>> heap size. If you have time and would like to do so, a JIRA you file will
>> be helpful. Or I can file the JIRA and work on it. It will not take long
>> to make this change.
>>
>> Thanks
>> Suresh
>>
>> PS: I found this command on Stack Overflow to determine the initial heap
>> size we get by default in this environment:
>>
>> java -XX:+PrintFlagsFinal -version | grep HeapSize
>>
>>
>>
>>
>> http://stackoverflow.com/questions/4667483/how-is-the-default-java-heap-size-determined
>>
>>
>>
>> On Thu, Sep 17, 2015 at 10:32 AM, Radu Marias <radumarias@gmail.com>
>> wrote:
>>
>> > I did the steps mentioned above to ensure that the Trafodion processes
>> > are free of the Java installation mixup.
>> > Also changed things so that HDP, Trafodion, and HammerDB use the same JDK
>> > from */usr/jdk64/jdk1.7.0_67*
>> >
>> > # java -version
>> > java version "1.7.0_67"
>> > Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
>> > Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
>> >
>> > # echo $JAVA_HOME
>> > /usr/jdk64/jdk1.7.0_67
>> >
>> > But when running HammerDB I again got a crash on 2 nodes. I noticed that
>> > for about one minute before the crash I was getting errors from *java
>> > -version*, and about 30 seconds after the crash java -version worked
>> > again. So these issues might be related. I haven't yet found the cause
>> > of the java -version issue or how to fix it.
>> >
>> > # java -version
>> > Error occurred during initialization of VM
>> > Could not reserve enough space for object heap
>> > Error: Could not create the Java Virtual Machine.
>> > Error: A fatal exception has occurred. Program will exit
>> >
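The "Could not reserve enough space for object heap" error above usually means the kernel would not grant the JVM's initial heap mapping. One hedged diagnostic is to compare `CommitLimit` against `Committed_AS` in `/proc/meminfo` (most relevant when strict overcommit, `vm.overcommit_memory=2`, is in effect). The numbers below are a made-up sample; on a live node read the real `/proc/meminfo`:

```shell
# Compute remaining commit headroom from /proc/meminfo-style input.
# meminfo_sample is fabricated data standing in for the live file.
meminfo_sample() {
cat <<'EOF'
MemTotal:       24679860 kB
CommitLimit:    12339928 kB
Committed_AS:   11904512 kB
EOF
}
meminfo_sample | awk '
  /^CommitLimit:/  { limit = $2 }
  /^Committed_AS:/ { used  = $2 }
  END { printf "commit headroom: %d kB\n", limit - used }'
```

If the headroom is smaller than the JVM's initial heap, VM creation fails exactly this way while plain `java -version` starts working again once memory is freed, which would fit the one-minute window described above.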
>> > # file core.5813
>> > core.5813: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style,
>> > from 'tdm_udrserv SQMON1.1 00000 00000 005813 $Z0004R3 188.138.61.175:48357 00004 000'
>> >
>> > #0  0x00007f6920ba0625 in raise () from /lib64/libc.so.6
>> > #1  0x00007f6920ba1e05 in abort () from /lib64/libc.so.6
>> > #2  0x0000000000424369 in comTFDS (msg1=0x43c070 "Trafodion UDR Server Internal Error", msg2=<value optimized out>, msg3=0x7fff119787f0 "Source file information unavailable", msg4=0x7fff11977ff0 "User routine being processed : TRAFODION.TPCC.NEWORDER, Routine Type : Stored Procedure, Language Type : JAVA, Error occurred outside the user routine code", msg5=0x43ddc3 "", dialOut=<value optimized out>, writeToSeaLog=1) at ../udrserv/UdrFFDC.cpp:191
>> > #3  0x00000000004245d7 in makeTFDSCall (msg=0x7f692324b310 "The Java virtual machine aborted", file=<value optimized out>, line=<value optimized out>, dialOut=1, writeToSeaLog=1) at ../udrserv/UdrFFDC.cpp:219
>> > #4  0x00007f69232316b8 in LmJavaHooks::abortHookJVM () at ../langman/LmJavaHooks.cpp:54
>> > #5  0x00007f69229cbbc6 in ParallelScavengeHeap::initialize() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
>> > #6  0x00007f6922afedba in Universe::initialize_heap() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
>> > #7  0x00007f6922afff89 in universe_init() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
>> > #8  0x00007f692273d9f5 in init_globals() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
>> > #9  0x00007f6922ae78ed in Threads::create_vm(JavaVMInitArgs*, bool*) () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
>> > #10 0x00007f69227c5a34 in JNI_CreateJavaVM () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
>> > #11 0x00007f692322de51 in LmLanguageManagerJava::initialize (this=<value optimized out>, result=<value optimized out>, maxLMJava=<value optimized out>, userOptions=0x7f69239ba418, diagsArea=<value optimized out>) at ../langman/LmLangManagerJava.cpp:379
>> > #12 0x00007f692322f564 in LmLanguageManagerJava::LmLanguageManagerJava (this=0x7f69239bec38, result=@0x7fff1197e19c, commandLineMode=<value optimized out>, maxLMJava=1, userOptions=0x7f69239ba418, diagsArea=0x7f6923991780) at ../langman/LmLangManagerJava.cpp:155
>> > #13 0x0000000000425619 in UdrGlobals::getOrCreateJavaLM (this=0x7f69239ba040, result=@0x7fff1197e19c, diags=<value optimized out>) at ../udrserv/udrglobals.cpp:322
>> > #14 0x0000000000427328 in processALoadMessage (UdrGlob=0x7f69239ba040, msgStream=..., request=..., env=<value optimized out>) at ../udrserv/udrload.cpp:163
>> > #15 0x000000000042fbfd in processARequest (UdrGlob=0x7f69239ba040, msgStream=..., env=...) at ../udrserv/udrserv.cpp:660
>> > #16 0x000000000043269c in runServer (argc=2, argv=0x7fff1197e528) at ../udrserv/udrserv.cpp:520
>> > #17 0x000000000043294e in main (argc=2, argv=0x7fff1197e528) at ../udrserv/udrserv.cpp:356
>> >
>> > On Wed, Sep 16, 2015 at 6:03 PM, Suresh Subbiah <
>> > suresh.subbiah60@gmail.com>
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > I have added a wiki page that describes how to get a stack trace from a
>> > > core file. The page could do with some improvements on finding the core
>> > > file, and maybe even doing more than getting the stack trace. For now it
>> > > should make our troubleshooting cycle faster if the stack trace is
>> > > included in the initial message itself.
>> > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/TRAFODION/Obtain+stack+trace+from+a+core+file
>> > >
>> > > In this case, the last node does not seem to have gdb, so I could not
>> > > see the trace there. I moved the core file to the first node, but then
>> > > the trace looks like this. I assume this is because I moved the core
>> > > file to a different node. I think Selva's suggestion is good to try. We
>> > > may have had a few tdm_udrserv processes from before the time the Java
>> > > change was made.
>> > >
>> > > $ gdb tdm_udrserv core.49256
>> > > #0  0x00007fe187a674fe in __longjmp () from /lib64/libc.so.6
>> > > #1  0x8857780a58ff2155 in ?? ()
>> > > Cannot access memory at address 0x8857780a58ff2155
>> > >
>> > > The backtrace we saw yesterday, when a udrserv process exited because
>> > > the JVM could not be started, is used on the wiki page instead of this
>> > > one. If you have time, a JIRA on this unexpected udrserv exit will also
>> > > be valuable for the Trafodion team.
>> > >
>> > > Thanks
>> > > Suresh
>> > >
>> > > On Wed, Sep 16, 2015 at 8:39 AM, Selva Govindarajan <
>> > > selva.govindarajan@esgyn.com> wrote:
>> > >
>> > > > Thanks for creating the JIRA TRAFODION-1492. The error is similar to
>> > > > scenario 2. The process tdm_udrserv dumped core. We will look into the
>> > > > core file. In the meantime, can you please do the following:
>> > > >
>> > > > Bring the Trafodion instance down
>> > > > echo $MY_SQROOT -- shows Trafodion installation directory
>> > > > Remove $MY_SQROOT/etc/ms.env from all nodes
>> > > >
>> > > >
>> > > > Start a New Terminal Session so that new Java settings are in place
>> > > > Login as a Trafodion user
>> > > > cd <trafodion_installation_directory>
>> > > > . ./sqenv.sh  (skip this if it is done automatically upon logon)
>> > > > sqgen
>> > > >
>> > > > Exit and Start a New Terminal Session
>> > > > Restart the Trafodion instance and check if you are seeing the issue
>> > > > with tdm_udrserv again. We wanted to ensure that the Trafodion
>> > > > processes are free of the Java installation mixup mentioned in your
>> > > > earlier message. We suspect that it can cause the tdm_udrserv process
>> > > > to dump core.
>> > > >
>> > > >
>> > > > Selva
>> > > >
>> > > > -----Original Message-----
>> > > > From: Radu Marias [mailto:radumarias@gmail.com]
>> > > > Sent: Wednesday, September 16, 2015 5:40 AM
>> > > > To: dev <dev@trafodion.incubator.apache.org>
>> > > > Subject: Re: odbc and/or hammerdb logs
>> > > >
>> > > > I'm seeing this in the HammerDB logs; I assume it is due to the crash
>> > > > and some processes being stopped:
>> > > >
>> > > > Error in Virtual User 1: [Trafodion ODBC Driver][Trafodion Database]
>> > > > SQL ERROR:*** ERROR[2034] $Z0106BZ:16: Operating system error 201 while
>> > > > communicating with server process $Z010LPE:23. [2015-09-16 12:35:33]
>> > > > [Trafodion ODBC Driver][Trafodion Database] SQL ERROR:*** ERROR[8904]
>> > > > SQL did not receive a reply from MXUDR, possibly caused by internal
>> > > > errors when executing user-defined routines. [2015-09-16 12:35:33]
>> > > >
>> > > > $ sqcheck
>> > > > Checking if processes are up.
>> > > > Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.
>> > > >
>> > > > The SQ environment is up!
>> > > >
>> > > >
>> > > > Process         Configured      Actual      Down
>> > > > -------         ----------      ------      ----
>> > > > DTM             5               5
>> > > > RMS             10              10
>> > > > MXOSRVR         20              20
>> > > >
>> > > > On Wed, Sep 16, 2015 at 3:28 PM, Radu Marias <radumarias@gmail.com>
>> > > wrote:
>> > > >
>> > > > > I've restarted HDP and Trafodion, and now I managed to create the
>> > > > > schema and stored procedures from HammerDB. But I'm getting failures
>> > > > > and core dumps from Trafodion again while running virtual users. For
>> > > > > some of the users I sometimes see in the HammerDB logs:
>> > > > > Vuser 5:Failed to execute payment
>> > > > > Vuser 5:Failed to execute stock level
>> > > > > Vuser 5:Failed to execute new order
>> > > > >
>> > > > > Core files are on our last node; feel free to examine them. The
>> > > > > files were dumped while getting the HammerDB errors:
>> > > > >
>> > > > > *core.49256*
>> > > > >
>> > > > > *core.48633*
>> > > > >
>> > > > > *core.49290*
>> > > > >
>> > > > >
>> > > > > On Wed, Sep 16, 2015 at 3:24 PM, Radu Marias <
>> radumarias@gmail.com>
>> > > > wrote:
>> > > > >
>> > > > >> *Scenario 1:*
>> > > > >>
>> > > > >> I've created this issue
>> > > > >> https://issues.apache.org/jira/browse/TRAFODION-1492
>> > > > >> I think another fix was made related to *Committed_AS* in
>> > > > >> *sql/cli/memmonitor.cpp*.
>> > > > >>
>> > > > >> This is a response from Narendra in a previous thread where the
>> > > > >> issue that prevented Trafodion from starting was fixed:
>> > > > >>
>> > > > >>
>> > > > >>>
>> > > > >>>
>> > > > >>>
>> > > > >>> *I updated the code sql/cli/memmonitor.cpp so that if
>> > > > >>> /proc/meminfo does not have the 'Committed_AS' entry, it will
>> > > > >>> ignore it. Built it and put the binary libcli.so on the veracity
>> > > > >>> box (in the $MY_SQROOT/export/lib64 directory, on all the nodes).
>> > > > >>> Restarted the env and 'sqlci' worked fine. Was able to 'initialize
>> > > > >>> trafodion' and create a table.*
>> > > > >>
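The defensive behavior Narendra describes (ignore `Committed_AS` when the kernel does not expose it, instead of failing) can be sketched in shell; the sample inputs are made up:

```shell
# Sketch of the "ignore Committed_AS if absent" fix described in the
# quoted reply: print the value when /proc/meminfo has the entry,
# otherwise report "unknown" rather than treating absence as an error.
committed_as() {
  awk '/^Committed_AS:/ { print $2; found = 1 }
       END { if (!found) print "unknown" }'
}
printf 'MemTotal: 100 kB\nCommitted_AS: 42 kB\n' | committed_as   # entry present
printf 'MemTotal: 100 kB\n' | committed_as                        # entry absent
```

The same shape applies in the C++ monitor code: a failed lookup becomes a sentinel value the caller skips, not an abort.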
>> > > > >>
>> > > > >> *Scenario 2:*
>> > > > >>
>> > > > >> The *java -version* problem I recall we had only on the other
>> > > > >> cluster with CentOS 7; I didn't see it on this one with CentOS 6.7.
>> > > > >> But a change I made these past days on the latter is installing
>> > > > >> Oracle *jdk 1.7.0_79* as the default one, and it is where
>> > > > >> *JAVA_HOME* points to. Before that, some nodes had *open-jdk* as
>> > > > >> the default and others had none, just the one installed by path by
>> > > > >> *ambari* in */usr/jdk64/jdk1.7.0_67*, which was not linked to
>> > > > >> JAVA_HOME or the *java* command by *alternatives*.
>> > > > >>
>> > > > >> *Failures in HammerDB:*
>> > > > >>
>> > > > >> Attached is the *trafodion.dtm.log* from a node on which I see a
>> > > > >> lot of lines like these; I assume this is the *transaction
>> > > > >> conflict* that you mentioned. I see these lines on 4 out of 5
>> > > > >> nodes:
>> > > > >>
>> > > > >> 2015-09-14 12:21:49,413 INFO dtm.HBaseTxClient: useForgotten is true
>> > > > >> 2015-09-14 12:21:49,414 INFO dtm.HBaseTxClient: forceForgotten is false
>> > > > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: forceControlPoint is false
>> > > > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: useAutoFlush is false
>> > > > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: ageCommitted is false
>> > > > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: disableBlockCache is false
>> > > > >> 2015-09-14 12:21:52,229 INFO dtm.HBaseAuditControlPoint: disableBlockCache is false
>> > > > >> 2015-09-14 12:21:52,233 INFO dtm.HBaseAuditControlPoint: useAutoFlush is false
>> > > > >> 2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989222
>> > > > >> 2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989277
>> > > > >> 2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989309
>> > > > >>
>> > > > >> What does *transaction conflict* mean in this case?
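The RET_HASCONFLICT lines above can be tallied quickly to see how often prepareCommit is aborting. On a node you would run this against `$MY_SQROOT/logs/trafodion.dtm.log`; here a short sample copied from the excerpt keeps the sketch self-contained:

```shell
# Count transaction-conflict aborts in a dtm log excerpt.
dtm_sample() {
cat <<'EOF'
2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989222
2015-09-14 12:21:49,413 INFO dtm.HBaseTxClient: useForgotten is true
2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989277
EOF
}
dtm_sample | grep -c RET_HASCONFLICT
```

A rising count across nodes during a HammerDB run would support the transaction-conflict explanation rather than a crash being the primary cause.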
>> > > > >>
>> > > > >> On Wed, Sep 16, 2015 at 2:43 AM, Selva Govindarajan <
>> > > > >> selva.govindarajan@esgyn.com> wrote:
>> > > > >>
>> > > > >>> Hi Radu,
>> > > > >>>
>> > > > >>> Thanks for using Trafodion. With the help from Suresh, we looked
>> > > > >>> at the core files in your cluster. We believe that there are two
>> > > > >>> scenarios that are causing the Trafodion processes to dump core.
>> > > > >>>
>> > > > >>> Scenario 1:
>> > > > >>> Core dumped by tdm_arkesp processes. The Trafodion engine has
>> > > > >>> assumed that the Committed_AS entry in /proc/meminfo is available
>> > > > >>> in all flavors of Linux. The absence of this entry is not handled
>> > > > >>> correctly by the Trafodion tdm_arkesp process, and hence it dumped
>> > > > >>> core. Please file a JIRA using this link
>> > > > >>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa
>> > > > >>> and choose "Apache Trafodion" as the project to report a bug
>> > > > >>> against.
>> > > > >>>
>> > > > >>> Scenario 2:
>> > > > >>> Core dumped by tdm_udrserv processes. From our analysis, this
>> > > > >>> problem happened when the process attempted to create the JVM
>> > > > >>> instance programmatically. A few days earlier, we observed a
>> > > > >>> similar issue in your cluster when the java -version command was
>> > > > >>> attempted. But java -version and $JAVA_HOME/bin/java -version work
>> > > > >>> fine now.
>> > > > >>> Was there any change made to the cluster recently to avoid the
>> > > > >>> problem with the java -version command?
>> > > > >>>
>> > > > >>> Please delete all the core files in the sql/scripts directory,
>> > > > >>> issue the command to invoke the SPJ, and check if it still dumps
>> > > > >>> core. We can look at the core file if it happens again. Your
>> > > > >>> solution to the java -version problem would be helpful.
>> > > > >>>
>> > > > >>> For the failures with HammerDB, can you please send us the exact
>> > > > >>> error message returned by the Trafodion engine to the application?
>> > > > >>> This might help us narrow down the cause. You can also look at
>> > > > >>> $MY_SQROOT/logs/trafodion.dtm.log to check if any transaction
>> > > > >>> conflict is causing this error.
>> > > > >>>
>> > > > >>> Selva
>> > > > >>> -----Original Message-----
>> > > > >>> From: Radu Marias [mailto:radumarias@gmail.com]
>> > > > >>> Sent: Tuesday, September 15, 2015 9:09 AM
>> > > > >>> To: dev <dev@trafodion.incubator.apache.org>
>> > > > >>> Subject: Re: odbc and/or hammerdb logs
>> > > > >>>
>> > > > >>> I also noticed there are several core.* files from today in
>> > > > >>> */home/trafodion/trafodion-20150828_0830/sql/scripts*. If needed,
>> > > > >>> please provide a Gmail address so I can share them via Google
>> > > > >>> Drive.
>> > > > >>>
>> > > > >>> On Tue, Sep 15, 2015 at 6:29 PM, Radu Marias <
>> radumarias@gmail.com
>> > >
>> > > > >>> wrote:
>> > > > >>>
>> > > > >>> > Hi,
>> > > > >>> >
>> > > > >>> > I'm running HammerDB over Trafodion, and when running virtual
>> > > > >>> > users I sometimes get errors like this in the HammerDB logs:
>> > > > >>> > *Vuser 1:Failed to execute payment*
>> > > > >>> >
>> > > > >>> > *Vuser 1:Failed to execute new order*
>> > > > >>> >
>> > > > >>> > I'm using unixODBC, and I tried to add these lines in
>> > > > >>> > */etc/odbc.ini*, but the trace file is not created:
>> > > > >>> > *[ODBC]*
>> > > > >>> > *Trace = 1*
>> > > > >>> > *TraceFile = /var/log/odbc_tracefile.log*
>> > > > >>> >
>> > > > >>> > Also tried with *Trace = yes* and *Trace = on*; I've found
>> > > > >>> > multiple references for both.
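One thing worth checking, hedged since unixODBC versions differ: the unixODBC driver manager reads its tracing settings from the [ODBC] section of odbcinst.ini rather than odbc.ini on most installs, and `odbcinst -j` shows which files are actually in effect. A config sketch:

```ini
; /etc/odbcinst.ini -- unixODBC driver-manager tracing.
; The TraceFile path is illustrative and must be writable by the
; process using the ODBC driver.
[ODBC]
Trace     = Yes
TraceFile = /tmp/odbc_trace.log
```

If the file still does not appear, a path the client process cannot write to fails silently, so /tmp is a safer first choice than /var/log.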
>> > > > >>> >
>> > > > >>> > How can I see more logs to debug the issue? Can I enable logs
>> > > > >>> > for all queries in Trafodion?
>> > > > >>> >
>> > > > >>> > --
>> > > > >>> > And in the end, it's not the years in your life that count. It's
>> > > > >>> > the life in your years.
>> > > > >>> >
>> > > > >>>
>> > > > >>>
>> > > > >>>
>> > > > >>>
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >>
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > >
>> >
>> >
>> >
>> >
>>
>
>
>
>



-- 
And in the end, it's not the years in your life that count. It's the life
in your years.
