trafodion-dev mailing list archives

From Suresh Subbiah <suresh.subbia...@gmail.com>
Subject Re: odbc and/or hammerdb logs
Date Wed, 16 Sep 2015 15:03:33 GMT
Hi,

I have added a wiki page that describes how to get a stack trace from a
core file. The page could do with some improvements on finding the core
file and maybe even doing more than getting the stack trace. For now it
should make our troubleshooting cycle faster if the stack trace is included
in the initial message itself.
https://cwiki.apache.org/confluence/display/TRAFODION/Obtain+stack+trace+from+a+core+file

In this case, the last node does not seem to have gdb, so I could not see
the trace there. I moved the core file to the first node but then the trace
looks like this. I assume this is because I moved the core file to a
different node. I think Selva's suggestion is good to try. We may still
have a few tdm_udrserv processes from before the Java change was made.

$ gdb tdm_udrserv core.49256
#0  0x00007fe187a674fe in __longjmp () from /lib64/libc.so.6
#1  0x8857780a58ff2155 in ?? ()
Cannot access memory at address 0x8857780a58ff2155
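
As a minimal sketch of collecting the trace non-interactively, the Python
below wraps gdb's batch mode. The helper names are hypothetical and are
not taken from the wiki page. It assumes gdb is installed and is run on
the node that produced the core file, so the shared libraries match.

```python
import shutil
import subprocess


def gdb_backtrace_command(executable, core_file):
    """Build a gdb command line that prints every thread's backtrace and exits."""
    # -batch runs the given commands and exits; -ex supplies one gdb command.
    return ["gdb", "-batch", "-ex", "thread apply all bt", executable, core_file]


def get_backtrace(executable, core_file):
    """Return gdb's output as text, or None when gdb is not installed."""
    if shutil.which("gdb") is None:
        return None
    result = subprocess.run(
        gdb_backtrace_command(executable, core_file),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling get_backtrace("tdm_udrserv", "core.49256") from the directory
that holds the core file returns text that can be pasted into the
initial message.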

The wiki page uses the backtrace we saw yesterday, when a udrserv process
exited because the JVM could not be started, instead of this one. If you
have time, a JIRA on this unexpected udrserv exit would also be valuable
for the Trafodion team.

Thanks
Suresh

On Wed, Sep 16, 2015 at 8:39 AM, Selva Govindarajan <
selva.govindarajan@esgyn.com> wrote:

> Thanks for creating the JIRA Trafodion-1492.  The error is similar to
> scenario-2. The process tdm_udrserv dumped core. We will look into the core
> file. In the meantime, can you please do the following:
>
> Bring the Trafodion instance down
> echo $MY_SQROOT -- shows Trafodion installation directory
> Remove $MY_SQROOT/etc/ms.env from all nodes
>
>
> Start a New Terminal Session so that new Java settings are in place
> Login as a Trafodion user
> cd <trafodion_installation_directory>
> . ./sqenv.sh  (skip this if it is done automatically upon logon)
> sqgen
>
> Exit and Start a New Terminal Session
> Restart the Trafodion instance and check if you are seeing the issue with
> tdm_udrserv again. We want to ensure that the Trafodion processes are free
> of the Java installation mix-up described in your earlier message. We
> suspect that it can cause the tdm_udrserv process to dump core.
>
>
> Selva
>
> -----Original Message-----
> From: Radu Marias [mailto:radumarias@gmail.com]
> Sent: Wednesday, September 16, 2015 5:40 AM
> To: dev <dev@trafodion.incubator.apache.org>
> Subject: Re: odbc and/or hammerdb logs
>
> I'm seeing this in hammerdb logs. I assume it is due to the crash and
> some processes being stopped:
>
> Error in Virtual User 1: [Trafodion ODBC Driver][Trafodion Database] SQL
> ERROR:*** ERROR[2034] $Z0106BZ:16: Operating system error 201 while
> communicating with server process $Z010LPE:23. [2015-09-16 12:35:33]
> [Trafodion ODBC Driver][Trafodion Database] SQL ERROR:*** ERROR[8904] SQL
> did not receive a reply from MXUDR, possibly caused by internal errors when
> executing user-defined routines. [2015-09-16 12:35:33]
>
> $ sqcheck
> Checking if processes are up.
> Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.
>
> The SQ environment is up!
>
>
> Process         Configured      Actual      Down
> -------         ----------      ------      ----
> DTM             5               5
> RMS             10              10
> MXOSRVR         20              20
>
> On Wed, Sep 16, 2015 at 3:28 PM, Radu Marias <radumarias@gmail.com> wrote:
>
> > I've restarted hdp and trafodion and now I managed to create the
> > schema and stored procedures from hammerdb. But I'm getting failures
> > and core dumps again from trafodion while running virtual users. For
> > some of the users I sometimes see in hammerdb logs:
> > Vuser 5:Failed to execute payment
> > Vuser 5:Failed to execute stock level
> > Vuser 5:Failed to execute new order
> >
> > Core files are on our last node, feel free to examine them. The files
> > were dumped while getting hammerdb errors:
> >
> > *core.49256*
> >
> > *core.48633*
> >
> > *core.49290*
> >
> >
> > On Wed, Sep 16, 2015 at 3:24 PM, Radu Marias <radumarias@gmail.com> wrote:
> >
> >> *Scenario 1:*
> >>
> >> I've created this issue
> >> https://issues.apache.org/jira/browse/TRAFODION-1492
> >> I think another fix was made related to *Committed_AS* in
> >> *sql/cli/memmonitor.cpp*.
> >>
> >> This is a response from Narendra in a previous thread, in which the
> >> issue that prevented trafodion from starting was fixed:
> >>
> >>
> >>>
> >>>
> >>>
> >>> *I updated the code: sql/cli/memmonitor.cpp, so that if
> >>> /proc/meminfo does not have the ‘Committed_AS’ entry, it will ignore
> >>> it. Built it and put the binary: libcli.so on the veracity box (in
> >>> the $MY_SQROOT/export/lib64 directory – on all the nodes). Restarted
> >>> the env and ‘sqlci’ worked fine.
> >>> Was able to ‘initialize trafodion’ and create a table.*
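
The tolerant lookup described above can be sketched as follows. This is
an illustration in Python only; the actual fix is C++ code in
sql/cli/memmonitor.cpp, which is not shown in this thread, and the
function names here are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def committed_as_kb(text):
    """Return Committed_AS in kB, or None when the kernel does not report it."""
    return parse_meminfo(text).get("Committed_AS")
```

A caller that receives None skips the Committed_AS check instead of
failing, for example committed_as_kb(open("/proc/meminfo").read()).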
> >>
> >>
> >> *Scenario 2:*
> >>
> >> The *java -version* problem I recall we had only on the other cluster
> >> with centos 7, I haven't seen it on this one with centos 6.7. But a
> >> change I made these days in the latter one is installing oracle *jdk
> >> 1.7.0_79* as the default one, which is where *JAVA_HOME* points to.
> >> Before that some nodes had *open-jdk* as default and others didn't
> >> have a default but just the one installed by *ambari* in
> >> */usr/jdk64/jdk1.7.0_67*, which was not linked to JAVA_HOME or the
> >> *java* command by *alternatives*.
> >>
> >> *Failures in HammerDB:*
> >>
> >> Attached is the *trafodion.dtm.log* from a node on which I see a
> >> lot of lines like these. I assume this is the *transaction conflict*
> >> that you mentioned. I see these lines on 4 out of 5 nodes:
> >>
> >> 2015-09-14 12:21:49,413 INFO dtm.HBaseTxClient: useForgotten is true
> >> 2015-09-14 12:21:49,414 INFO dtm.HBaseTxClient: forceForgotten is false
> >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: forceControlPoint is false
> >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: useAutoFlush is false
> >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: ageCommitted is false
> >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: disableBlockCache is false
> >> 2015-09-14 12:21:52,229 INFO dtm.HBaseAuditControlPoint: disableBlockCache is false
> >> 2015-09-14 12:21:52,233 INFO dtm.HBaseAuditControlPoint: useAutoFlush is false
> >> 2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989222
> >> 2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989277
> >> 2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989309
> >>
> >> What does *transaction conflict* mean in this case?
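
To line the conflicts up against the HammerDB failure times, the
RET_HASCONFLICT entries can be pulled out of trafodion.dtm.log. A
minimal sketch, assuming each log entry sits on one line in the file
(the entries above are wrapped only by the mail client); the function
name is hypothetical.

```python
import re

TXID_RE = re.compile(r"txid:\s*(\d+)")


def find_conflicts(log_text):
    """Return (timestamp, txid) pairs for every RET_HASCONFLICT log line."""
    conflicts = []
    for line in log_text.splitlines():
        if "RET_HASCONFLICT" not in line:
            continue
        match = TXID_RE.search(line)
        # The timestamp "YYYY-MM-DD HH:MM:SS,mmm" fills the first 23 characters.
        timestamp = line[:23]
        conflicts.append((timestamp, int(match.group(1)) if match else None))
    return conflicts
```

Running it on the log from each node gives a list of timestamps that
can be compared with the times of the "Failed to execute" messages.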
> >>
> >> On Wed, Sep 16, 2015 at 2:43 AM, Selva Govindarajan <
> >> selva.govindarajan@esgyn.com> wrote:
> >>
> >>> Hi Radu,
> >>>
> >>> Thanks for using Trafodion. With the help from Suresh, we looked at
> >>> the core files in your cluster. We believe that there are two
> >>> scenarios that are causing the Trafodion processes to dump core.
> >>>
> >>> Scenario 1:
> >>> Core dumped by tdm_arkesp processes. The Trafodion engine assumes
> >>> that the Committed_AS entry in /proc/meminfo is available in all
> >>> flavors of Linux. The absence of this entry is not handled correctly
> >>> by the tdm_arkesp process and hence it dumped core. Please file a
> >>> JIRA using this link
> >>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa and
> >>> choose "Apache Trafodion" as the project to report a bug against.
> >>>
> >>> Scenario 2:
> >>> Core dumped by tdm_udrserv processes. From our analysis, this
> >>> problem happened when the process attempted to create the JVM
> >>> instance programmatically. A few days earlier, we observed a
> >>> similar issue in your cluster when the java -version command was
> >>> attempted. But java -version or $JAVA_HOME/bin/java -version works
> >>> fine now.
> >>> Was there any change made to the cluster recently to avoid the
> >>> problem with the java -version command?
> >>>
> >>> Please delete all the core files in the sql/scripts directory,
> >>> issue the command to invoke the SPJ, and check if it still dumps
> >>> core. We can look at the core file if it happens again. Knowing how
> >>> you resolved the java -version problem would be helpful.
> >>>
> >>> For the failures with HammerDB, can you please send us the exact
> >>> error message returned by the Trafodion engine to the application?
> >>> This might help us to narrow down the cause. You can also look at
> >>> $MY_SQROOT/logs/trafodion.dtm.log to check if any transaction
> >>> conflict is causing this error.
> >>>
> >>> Selva
> >>> -----Original Message-----
> >>> From: Radu Marias [mailto:radumarias@gmail.com]
> >>> Sent: Tuesday, September 15, 2015 9:09 AM
> >>> To: dev <dev@trafodion.incubator.apache.org>
> >>> Subject: Re: odbc and/or hammerdb logs
> >>>
> >>> Also noticed there are several core. files from today in
> >>> */home/trafodion/trafodion-20150828_0830/sql/scripts*. If needed
> >>> please provide a gmail address so I can share them via gdrive.
> >>>
> >>> On Tue, Sep 15, 2015 at 6:29 PM, Radu Marias <radumarias@gmail.com>
> >>> wrote:
> >>>
> >>> > Hi,
> >>> >
> >>> > I'm running HammerDB over trafodion and when running virtual users
> >>> > sometimes I get errors like this in hammerdb logs:
> >>> > *Vuser 1:Failed to execute payment*
> >>> >
> >>> > *Vuser 1:Failed to execute new order*
> >>> >
> >>> > I'm using unixODBC and I tried to add these lines in
> >>> > */etc/odbc.ini* but the trace file is not created.
> >>> > *[ODBC]*
> >>> > *Trace = 1*
> >>> > *TraceFile = /var/log/odbc_tracefile.log*
> >>> >
> >>> > Also tried with *Trace = yes* and *Trace = on*, I've found
> >>> > multiple references for both.
> >>> >
> >>> > How can I see more logs to debug the issue? Can I enable logs for
> >>> > all queries in trafodion?
> >>> >
> >>> > --
> >>> > And in the end, it's not the years in your life that count. It's
> >>> > the life in your years.
