trafodion-dev mailing list archives

From Suresh Subbiah <suresh.subbia...@gmail.com>
Subject Re: odbc and/or hammerdb logs
Date Fri, 18 Sep 2015 15:05:32 GMT
Thank you.

Trafodion-1492 can be used to track both the esp crash and the udrserv
crash. The fixes will be in different areas, but that does not matter.

I don't know much about containers or OpenVZ; maybe others will know. I hope
to have a fix ready soon for the udrserv crash problem, in case the container
settings cannot be changed.

The general idea suggested by Selva is that we introduce env variables with
min and max JVM heap size settings for the udrserv process (just like we
have today for executor processes). Udrserv already has the notion of reading
from a configuration file, so we could use that approach if that is
preferable. Either way there should soon be some way to start a udrserv
process with a smaller heap.
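For example (the variable names here are hypothetical, since nothing is
implemented yet), the settings could look like:

export UDRSERV_JVM_INIT_HEAP_MB=256
export UDRSERV_JVM_MAX_HEAP_MB=512

which udrserv would translate into -Xms256m -Xmx512m when it creates its JVM.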

Thanks
Suresh



On Fri, Sep 18, 2015 at 8:17 AM, Radu Marias <radumarias@gmail.com> wrote:

> The nodes are in OpenVZ containers and I noticed this:
>
> # cat /proc/user_beancounters
>             uid  resource        held     maxheld    barrier      limit  failcnt
> *            privvmpages     6202505     9436485    9437184    9437184     1573*
>
> I assume this could be related to the java -version issue. Trying to see if I
> can fix this; we are limited in what can be set from inside the container.
>
> # cat /proc/user_beancounters
> Version: 2.5
>        uid  resource         held    maxheld              barrier                limit  failcnt
>  10045785:  kmemsize    111794747  999153664  9223372036854775807  9223372036854775807        0
>             lockedpages      7970       7970              6291456              6291456        0
> *           privvmpages   6202505    9436485              9437184              9437184     1573*
>             shmpages        34617      36553  9223372036854775807  9223372036854775807        0
>             dummy               0          0  9223372036854775807  9223372036854775807        0
>             numproc           952       1299                30000                30000        0
>             physpages     1214672    6291456              6291456              6291456        0
>             vmguarpages         0          0              6291456              6291456        0
>             oomguarpages  1096587    2121834              6291456              6291456        0
>             numtcpsock        226        457                30000                30000        0
>             numflock            5         16                 1000                 1100        0
>             numpty              4          6                  512                  512        0
>             numsiginfo          1         69                 1024                 1024        0
>             tcpsndbuf     5637456   17822864  9223372036854775807  9223372036854775807        0
>             tcprcvbuf     6061504   13730792  9223372036854775807  9223372036854775807        0
>             othersockbuf    46240    1268016  9223372036854775807  9223372036854775807        0
>             dgramrcvbuf         0     436104  9223372036854775807  9223372036854775807        0
>             numothersock       89        134                30000                30000        0
>             dcachesize   61381173  935378121  9223372036854775807  9223372036854775807        0
>             numfile          7852      11005               250000               250000        0
>             dummy               0          0  9223372036854775807  9223372036854775807        0
>             dummy               0          0  9223372036854775807  9223372036854775807        0
>             dummy               0          0  9223372036854775807  9223372036854775807        0
>             numiptent          38         38                 1000                 1000        0
>
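> By the way, failcnt (the last column) counts how many times an allocation was
> refused, and privvmpages is the only row with a non-zero value here. One way
> to watch it while the test runs is, for example:
>
> # watch -n1 'grep privvmpages /proc/user_beancounters'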
>
> On Fri, Sep 18, 2015 at 3:34 PM, Radu Marias <radumarias@gmail.com> wrote:
>
> > With 1 user no crash occurs, but on the node on which hammerdb is started I
> > noticed this from time to time:
> >
> > $ java -version
> > Error occurred during initialization of VM
> > Unable to allocate 199232KB bitmaps for parallel garbage collection for
> > the requested 6375424KB heap.
> > Error: Could not create the Java Virtual Machine.
> > Error: A fatal exception has occurred. Program will exit.
> >
> > $ free -h
> >              total       used       free     shared    buffers     cached
> > Mem:           24G       4.7G        19G       132M         0B       314M
> > -/+ buffers/cache:       4.4G        19G
> > Swap:           0B         0B         0B
> >
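> > Since free shows 19G free, this looks like the container's virtual address
> > space cap (privvmpages) rather than a lack of RAM. One way to confirm would
> > be to force a small heap, for example:
> >
> > $ java -Xmx512m -version
> >
> > If that works while plain java -version fails, the default heap request
> > (about 1/4 of the 24G the container reports) is what is being refused.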
> >
> > On Fri, Sep 18, 2015 at 2:21 PM, Radu Marias <radumarias@gmail.com> wrote:
> >
> >> $ $JAVA_HOME/bin/java -XX:+PrintFlagsFinal -version | grep HeapSize
> >>     uintx ErgoHeapSizeLimit                         = 0               {product}
> >>     uintx HeapSizePerGCThread                       = 87241520        {product}
> >>     uintx InitialHeapSize                          := 402653184       {product}
> >>     uintx LargePageHeapSizeThreshold                = 134217728       {product}
> >>     uintx MaxHeapSize                              := 6442450944      {product}
> >> java version "1.7.0_67"
> >> Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
> >> Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
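> >> For reference: InitialHeapSize 402653184 bytes = 384 MB and MaxHeapSize
> >> 6442450944 bytes = 6 GB, i.e. 1/64 and 1/4 of the 24 GB this container
> >> reports, matching the usual HotSpot server-class defaults.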
> >>
> >>
> >> On Fri, Sep 18, 2015 at 2:20 PM, Radu Marias <radumarias@gmail.com>
> >> wrote:
> >>
> >>> I logged this issue several days ago; is this OK?
> >>> https://issues.apache.org/jira/browse/TRAFODION-1492
> >>>
> >>> Will try with one user and let you know.
> >>>
> >>> On Fri, Sep 18, 2015 at 7:05 AM, Suresh Subbiah <
> >>> suresh.subbiah60@gmail.com> wrote:
> >>>
> >>>> Hi
> >>>>
> >>>> How many virtual users are being used? If it is more than one, could we
> >>>> please try the case with 1 user first?
> >>>>
> >>>> When the crash happens next time could we please try
> >>>> sqps | grep esp | wc -l
> >>>>
> >>>> If this number is large, we know a lot of esp processes are being started,
> >>>> which could consume memory.
> >>>> If this is the case, please insert this row into the defaults table from
> >>>> sqlci and then restart dcs (dcsstop followed by dcsstart):
> >>>> insert into "_MD_".defaults values('ATTEMPT_ESP_PARALLELISM', 'OFF',
> >>>> 'hammerdb testing') ;
> >>>> exit ;
> >>>>
> >>>> I will work on having the udr process create a JVM with a smaller initial
> >>>> heap size. If you have time and would like to do so, a JIRA you file will
> >>>> be helpful. Or I can file the JIRA and work on it. It will not take long
> >>>> to make this change.
> >>>>
> >>>> Thanks
> >>>> Suresh
> >>>>
> >>>> PS I found this command on Stack Overflow to determine the InitialHeapSize
> >>>> we get by default in this env:
> >>>>
> >>>> java -XX:+PrintFlagsFinal -version | grep HeapSize
> >>>>
> >>>> http://stackoverflow.com/questions/4667483/how-is-the-default-java-heap-size-determined
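> >>>> (On a server-class machine the HotSpot defaults are typically 1/64 of
> >>>> physical memory for the initial heap and 1/4 for the max heap, so the
> >>>> numbers will differ per machine.)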
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Sep 17, 2015 at 10:32 AM, Radu Marias <radumarias@gmail.com>
> >>>> wrote:
> >>>>
> >>>> > Did the steps mentioned above to ensure that the trafodion processes are
> >>>> > free of the JAVA installation mixup.
> >>>> > Also changed things so that hdp, trafodion and hammerdb all use the same
> >>>> > jdk from */usr/jdk64/jdk1.7.0_67*.
> >>>> >
> >>>> > # java -version
> >>>> > java version "1.7.0_67"
> >>>> > Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
> >>>> > Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
> >>>> >
> >>>> > # echo $JAVA_HOME
> >>>> > /usr/jdk64/jdk1.7.0_67
> >>>> >
> >>>> > But when running hammerdb I again got a crash on 2 nodes. I noticed that
> >>>> > before the crash, for about one minute, I was getting errors from *java
> >>>> > -version*, and about 30 seconds after the crash java -version worked
> >>>> > again. So these issues might be related. I haven't yet found the cause or
> >>>> > how to fix the java -version issue.
> >>>> >
> >>>> > # java -version
> >>>> > Error occurred during initialization of VM
> >>>> > Could not reserve enough space for object heap
> >>>> > Error: Could not create the Java Virtual Machine.
> >>>> > Error: A fatal exception has occurred. Program will exit
> >>>> >
> >>>> > # file core.5813
> >>>> > core.5813: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from 'tdm_udrserv SQMON1.1 00000 00000 005813 $Z0004R3 188.138.61.175:48357 00004 000'
> >>>> >
> >>>> > #0  0x00007f6920ba0625 in raise () from /lib64/libc.so.6
> >>>> > #1  0x00007f6920ba1e05 in abort () from /lib64/libc.so.6
> >>>> > #2  0x0000000000424369 in comTFDS (msg1=0x43c070 "Trafodion UDR Server Internal Error", msg2=<value optimized out>, msg3=0x7fff119787f0 "Source file information unavailable", msg4=0x7fff11977ff0 "User routine being processed : TRAFODION.TPCC.NEWORDER, Routine Type : Stored Procedure, Language Type : JAVA, Error occurred outside the user routine code", msg5=0x43ddc3 "", dialOut=<value optimized out>, writeToSeaLog=1) at ../udrserv/UdrFFDC.cpp:191
> >>>> > #3  0x00000000004245d7 in makeTFDSCall (msg=0x7f692324b310 "The Java virtual machine aborted", file=<value optimized out>, line=<value optimized out>, dialOut=1, writeToSeaLog=1) at ../udrserv/UdrFFDC.cpp:219
> >>>> > #4  0x00007f69232316b8 in LmJavaHooks::abortHookJVM () at ../langman/LmJavaHooks.cpp:54
> >>>> > #5  0x00007f69229cbbc6 in ParallelScavengeHeap::initialize() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> >>>> > #6  0x00007f6922afedba in Universe::initialize_heap() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> >>>> > #7  0x00007f6922afff89 in universe_init() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> >>>> > #8  0x00007f692273d9f5 in init_globals() () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> >>>> > #9  0x00007f6922ae78ed in Threads::create_vm(JavaVMInitArgs*, bool*) () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> >>>> > #10 0x00007f69227c5a34 in JNI_CreateJavaVM () from /usr/jdk64/jdk1.7.0_67/jre/lib/amd64/server/libjvm.so
> >>>> > #11 0x00007f692322de51 in LmLanguageManagerJava::initialize (this=<value optimized out>, result=<value optimized out>, maxLMJava=<value optimized out>, userOptions=0x7f69239ba418, diagsArea=<value optimized out>) at ../langman/LmLangManagerJava.cpp:379
> >>>> > #12 0x00007f692322f564 in LmLanguageManagerJava::LmLanguageManagerJava (this=0x7f69239bec38, result=@0x7fff1197e19c, commandLineMode=<value optimized out>, maxLMJava=1, userOptions=0x7f69239ba418, diagsArea=0x7f6923991780) at ../langman/LmLangManagerJava.cpp:155
> >>>> > #13 0x0000000000425619 in UdrGlobals::getOrCreateJavaLM (this=0x7f69239ba040, result=@0x7fff1197e19c, diags=<value optimized out>) at ../udrserv/udrglobals.cpp:322
> >>>> > #14 0x0000000000427328 in processALoadMessage (UdrGlob=0x7f69239ba040, msgStream=..., request=..., env=<value optimized out>) at ../udrserv/udrload.cpp:163
> >>>> > #15 0x000000000042fbfd in processARequest (UdrGlob=0x7f69239ba040, msgStream=..., env=...) at ../udrserv/udrserv.cpp:660
> >>>> > #16 0x000000000043269c in runServer (argc=2, argv=0x7fff1197e528) at ../udrserv/udrserv.cpp:520
> >>>> > #17 0x000000000043294e in main (argc=2, argv=0x7fff1197e528) at ../udrserv/udrserv.cpp:356
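> >>>> > So the abort happens while JNI_CreateJavaVM is still initializing the
> >>>> > parallel scavenge heap (frames #5-#10), which matches the "Could not
> >>>> > reserve enough space for object heap" error from java -version above.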
> >>>> >
> >>>> > On Wed, Sep 16, 2015 at 6:03 PM, Suresh Subbiah <
> >>>> > suresh.subbiah60@gmail.com>
> >>>> > wrote:
> >>>> >
> >>>> > > Hi,
> >>>> > >
> >>>> > > I have added a wiki page that describes how to get a stack trace from a
> >>>> > > core file. The page could do with some improvements on finding the core
> >>>> > > file, and maybe even cover doing more than getting the stack trace. For
> >>>> > > now it should make our troubleshooting cycle faster if the stack trace
> >>>> > > is included in the initial message itself.
> >>>> > >
> >>>> > >
> >>>> > > https://cwiki.apache.org/confluence/display/TRAFODION/Obtain+stack+trace+from+a+core+file
> >>>> > >
> >>>> > > In this case, the last node does not seem to have gdb, so I could not
> >>>> > > see the trace there. I moved the core file to the first node, but then
> >>>> > > the trace looks like this; I assume that is because I moved the core
> >>>> > > file to a different node. I think Selva's suggestion is good to try. We
> >>>> > > may have had a few tdm_udrserv processes from before the time the java
> >>>> > > change was made.
> >>>> > >
> >>>> > > $ gdb tdm_udrserv core.49256
> >>>> > > #0  0x00007fe187a674fe in __longjmp () from /lib64/libc.so.6
> >>>> > > #1  0x8857780a58ff2155 in ?? ()
> >>>> > > Cannot access memory at address 0x8857780a58ff2155
> >>>> > >
> >>>> > > The back trace we saw yesterday, when a udrserv process exited because
> >>>> > > the JVM could not be started, is used in the wiki page instead of this
> >>>> > > one. If you have time, a JIRA on this unexpected udrserv exit will also
> >>>> > > be valuable for the Trafodion team.
> >>>> > >
> >>>> > > Thanks
> >>>> > > Suresh
> >>>> > >
> >>>> > > On Wed, Sep 16, 2015 at 8:39 AM, Selva Govindarajan <
> >>>> > > selva.govindarajan@esgyn.com> wrote:
> >>>> > >
> >>>> > > > Thanks for creating the JIRA Trafodion-1492. The error is similar to
> >>>> > > > scenario 2: the process tdm_udrserv dumped core. We will look into the
> >>>> > > > core file. In the meantime, can you please do the following:
> >>>> > > >
> >>>> > > > Bring the Trafodion instance down
> >>>> > > > echo $MY_SQROOT -- shows Trafodion installation directory
> >>>> > > > Remove $MY_SQROOT/etc/ms.env from all nodes
> >>>> > > >
> >>>> > > >
> >>>> > > > Start a new terminal session so that the new Java settings are in place
> >>>> > > > Login as the Trafodion user
> >>>> > > > cd <trafodion_installation_directory>
> >>>> > > > . ./sqenv.sh  (skip this if it is done automatically upon logon)
> >>>> > > > sqgen
> >>>> > > >
> >>>> > > > Exit and start a new terminal session
> >>>> > > > Restart the Trafodion instance and check if you are seeing the issue
> >>>> > > > with tdm_udrserv again. We want to ensure that the trafodion processes
> >>>> > > > are free of the JAVA installation mixup mentioned in your earlier
> >>>> > > > message; we suspect that it can cause the tdm_udrserv process to dump
> >>>> > > > core.
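> >>>> > > > One possible consolidated transcript of the above (standard Trafodion
> >>>> > > > script names; adjust paths to your cluster):
> >>>> > > >
> >>>> > > > sqstop                              # bring the instance down
> >>>> > > > rm $MY_SQROOT/etc/ms.env            # repeat on every node
> >>>> > > > # in a NEW terminal session, as the trafodion user:
> >>>> > > > cd $MY_SQROOT && . ./sqenv.sh && sqgen
> >>>> > > > # in another new terminal session:
> >>>> > > > sqstart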
> >>>> > > >
> >>>> > > >
> >>>> > > > Selva
> >>>> > > >
> >>>> > > > -----Original Message-----
> >>>> > > > From: Radu Marias [mailto:radumarias@gmail.com]
> >>>> > > > Sent: Wednesday, September 16, 2015 5:40 AM
> >>>> > > > To: dev <dev@trafodion.incubator.apache.org>
> >>>> > > > Subject: Re: odbc and/or hammerdb logs
> >>>> > > >
> >>>> > > > I'm seeing this in the hammerdb logs; I assume it is due to the crash
> >>>> > > > and some processes being stopped:
> >>>> > > >
> >>>> > > > Error in Virtual User 1: [Trafodion ODBC Driver][Trafodion Database]
> >>>> > > > SQL ERROR:*** ERROR[2034] $Z0106BZ:16: Operating system error 201 while
> >>>> > > > communicating with server process $Z010LPE:23. [2015-09-16 12:35:33]
> >>>> > > > [Trafodion ODBC Driver][Trafodion Database] SQL ERROR:*** ERROR[8904]
> >>>> > > > SQL did not receive a reply from MXUDR, possibly caused by internal
> >>>> > > > errors when executing user-defined routines. [2015-09-16 12:35:33]
> >>>> > > >
> >>>> > > > $ sqcheck
> >>>> > > > Checking if processes are up.
> >>>> > > > Checking attempt: 1; user specified max: 2. Execution time in seconds: 0.
> >>>> > > >
> >>>> > > > The SQ environment is up!
> >>>> > > >
> >>>> > > >
> >>>> > > > Process         Configured      Actual      Down
> >>>> > > > -------         ----------      ------      ----
> >>>> > > > DTM             5               5
> >>>> > > > RMS             10              10
> >>>> > > > MXOSRVR         20              20
> >>>> > > >
> >>>> > > > On Wed, Sep 16, 2015 at 3:28 PM, Radu Marias <radumarias@gmail.com>
> >>>> > > > wrote:
> >>>> > > >
> >>>> > > > > I've restarted hdp and trafodion, and now I managed to create the
> >>>> > > > > schema and stored procedures from hammerdb. But I'm getting failures
> >>>> > > > > and core dumps from trafodion again while running virtual users. For
> >>>> > > > > some of the users I sometimes see in the hammerdb logs:
> >>>> > > > > Vuser 5:Failed to execute payment
> >>>> > > > > Vuser 5:Failed to execute stock level
> >>>> > > > > Vuser 5:Failed to execute new order
> >>>> > > > >
> >>>> > > > > Core files are on our last node, feel free to examine them; the
> >>>> > > > > files were dumped while getting the hammerdb errors:
> >>>> > > > >
> >>>> > > > > *core.49256*
> >>>> > > > > *core.48633*
> >>>> > > > > *core.49290*
> >>>> > > > >
> >>>> > > > >
> >>>> > > > > On Wed, Sep 16, 2015 at 3:24 PM, Radu Marias <radumarias@gmail.com>
> >>>> > > > > wrote:
> >>>> > > > >
> >>>> > > > >> *Scenario 1:*
> >>>> > > > >>
> >>>> > > > >> I've created this issue
> >>>> > > > >> https://issues.apache.org/jira/browse/TRAFODION-1492
> >>>> > > > >> I think another fix was made related to *Committed_AS* in
> >>>> > > > >> *sql/cli/memmonitor.cpp*.
> >>>> > > > >>
> >>>> > > > >> This is a response from Narendra in a previous thread, where the
> >>>> > > > >> issue was fixed so that trafodion could start:
> >>>> > > > >>
> >>>> > > > >>
> >>>> > > > >>> *I updated the code: sql/cli/memmonitor.cpp, so that if
> >>>> > > > >>> /proc/meminfo does not have the ‘Committed_AS’ entry, it will
> >>>> > > > >>> ignore it. Built it and put the binary: libcli.so on the veracity
> >>>> > > > >>> box (in the $MY_SQROOT/export/lib64 directory – on all the nodes).
> >>>> > > > >>> Restarted the env and ‘sqlci’ worked fine.
> >>>> > > > >>> Was able to ‘initialize trafodion’ and create a table.*
> >>>> > > > >>
> >>>> > > > >>
> >>>> > > > >> *Scenario 2:*
> >>>> > > > >>
> >>>> > > > >> The *java -version* problem I recall we had only on the other
> >>>> > > > >> cluster with centos 7; I didn't see it on this one with centos 6.7.
> >>>> > > > >> But a change I made these days on the latter is installing oracle
> >>>> > > > >> *jdk 1.7.0_79* as the default one, and that is where *JAVA_HOME*
> >>>> > > > >> points to. Before that, some nodes had *open-jdk* as the default and
> >>>> > > > >> others didn't have one, just the one installed by *ambari* in
> >>>> > > > >> */usr/jdk64/jdk1.7.0_67*, which was not linked to JAVA_HOME or the
> >>>> > > > >> *java* command by *alternatives*.
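> >>>> > > > >> (To double-check on each node, something like:
> >>>> > > > >> alternatives --display java
> >>>> > > > >> readlink -f $(which java)
> >>>> > > > >> shows which JDK the *java* command actually resolves to.)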
> >>>> > > > >>
> >>>> > > > >> *Failures in HammerDB:*
> >>>> > > > >>
> >>>> > > > >> Attached is the *trafodion.dtm.log* from a node on which I see a
> >>>> > > > >> lot of lines like these; I assume this is the *transaction conflict*
> >>>> > > > >> that you mentioned. I see these lines on 4 out of 5 nodes:
> >>>> > > > >>
> >>>> > > > >> 2015-09-14 12:21:49,413 INFO dtm.HBaseTxClient: useForgotten is true
> >>>> > > > >> 2015-09-14 12:21:49,414 INFO dtm.HBaseTxClient: forceForgotten is false
> >>>> > > > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: forceControlPoint is false
> >>>> > > > >> 2015-09-14 12:21:49,446 INFO dtm.TmAuditTlog: useAutoFlush is false
> >>>> > > > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: ageCommitted is false
> >>>> > > > >> 2015-09-14 12:21:49,447 INFO dtm.TmAuditTlog: disableBlockCache is false
> >>>> > > > >> 2015-09-14 12:21:52,229 INFO dtm.HBaseAuditControlPoint: disableBlockCache is false
> >>>> > > > >> 2015-09-14 12:21:52,233 INFO dtm.HBaseAuditControlPoint: useAutoFlush is false
> >>>> > > > >> 2015-09-14 12:42:57,346 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989222
> >>>> > > > >> 2015-09-14 12:43:46,102 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989277
> >>>> > > > >> 2015-09-14 12:44:11,598 INFO dtm.HBaseTxClient: Exit RET_HASCONFLICT prepareCommit, txid: 17179989309
> >>>> > > > >>
> >>>> > > > >> What does *transaction conflict* mean in this case?
> >>>> > > > >>
> >>>> > > > >> On Wed, Sep 16, 2015 at 2:43 AM, Selva Govindarajan
> >>>> > > > >> <selva.govindarajan@esgyn.com> wrote:
> >>>> > > > >>
> >>>> > > > >>> Hi Radu,
> >>>> > > > >>>
> >>>> > > > >>> Thanks for using Trafodion. With help from Suresh, we looked at
> >>>> > > > >>> the core files in your cluster. We believe that there are two
> >>>> > > > >>> scenarios that are causing the Trafodion processes to dump core.
> >>>> > > > >>>
> >>>> > > > >>> Scenario 1:
> >>>> > > > >>> Core dumped by tdm_arkesp processes. The Trafodion engine has
> >>>> > > > >>> assumed that the Committed_AS entry in /proc/meminfo is available
> >>>> > > > >>> in all flavors of linux. The absence of this entry is not handled
> >>>> > > > >>> correctly by the trafodion tdm_arkesp process, and hence it dumped
> >>>> > > > >>> core. Please file a JIRA using this link
> >>>> > > > >>> https://issues.apache.org/jira/secure/CreateIssue!default.jspa and
> >>>> > > > >>> choose "Apache Trafodion" as the project to report a bug against.
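> >>>> > > > >>> (A quick check for whether a node exposes that entry is, for
> >>>> > > > >>> example:
> >>>> > > > >>> grep Committed_AS /proc/meminfo
> >>>> > > > >>> which prints nothing on a kernel or container that does not
> >>>> > > > >>> provide it.)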
> >>>> > > > >>>
> >>>> > > > >>> Scenario 2:
> >>>> > > > >>> Core dumped by tdm_udrserv processes. From our analysis, this
> >>>> > > > >>> problem happened when the process attempted to create the JVM
> >>>> > > > >>> instance programmatically. A few days earlier, we observed a
> >>>> > > > >>> similar issue in your cluster when the java -version command was
> >>>> > > > >>> attempted. But java -version or $JAVA_HOME/bin/java -version works
> >>>> > > > >>> fine now.
> >>>> > > > >>> Was there any change made to the cluster recently to avoid the
> >>>> > > > >>> problem with the java -version command?
> >>>> > > > >>>
> >>>> > > > >>> Can you please delete all the core files in the sql/scripts
> >>>> > > > >>> directory, issue the command to invoke the SPJ, and check if it
> >>>> > > > >>> still dumps core. We can look at the core file if it happens again.
> >>>> > > > >>> Your solution to the java -version problem would be helpful.
> >>>> > > > >>>
> >>>> > > > >>> For the failures with HammerDB, can you please send us the exact
> >>>> > > > >>> error message returned by the Trafodion engine to the application.
> >>>> > > > >>> This might help us narrow down the cause. You can also look at
> >>>> > > > >>> $MY_SQROOT/logs/trafodion.dtm.log to check if any transaction
> >>>> > > > >>> conflict is causing this error.
> >>>> > > > >>>
> >>>> > > > >>> Selva
> >>>> > > > >>> -----Original Message-----
> >>>> > > > >>> From: Radu Marias [mailto:radumarias@gmail.com]
> >>>> > > > >>> Sent: Tuesday, September 15, 2015 9:09 AM
> >>>> > > > >>> To: dev <dev@trafodion.incubator.apache.org>
> >>>> > > > >>> Subject: Re: odbc and/or hammerdb logs
> >>>> > > > >>>
> >>>> > > > >>> Also noticed there are several core.* files from today in
> >>>> > > > >>> */home/trafodion/trafodion-20150828_0830/sql/scripts*. If needed,
> >>>> > > > >>> please provide a gmail address so I can share them via gdrive.
> >>>> > > > >>>
> >>>> > > > >>> On Tue, Sep 15, 2015 at 6:29 PM, Radu Marias
> >>>> > > > >>> <radumarias@gmail.com> wrote:
> >>>> > > > >>>
> >>>> > > > >>> > Hi,
> >>>> > > > >>> >
> >>>> > > > >>> > I'm running HammerDB over trafodion, and when running virtual
> >>>> > > > >>> > users I sometimes get errors like this in the hammerdb logs:
> >>>> > > > >>> > *Vuser 1:Failed to execute payment*
> >>>> > > > >>> > *Vuser 1:Failed to execute new order*
> >>>> > > > >>> >
> >>>> > > > >>> > I'm using unixODBC and I tried to add these lines in
> >>>> > > > >>> > */etc/odbc.ini*, but the trace file is not created:
> >>>> > > > >>> > *[ODBC]*
> >>>> > > > >>> > *Trace = 1*
> >>>> > > > >>> > *TraceFile = /var/log/odbc_tracefile.log*
> >>>> > > > >>> >
> >>>> > > > >>> > Also tried with *Trace = yes* and *Trace = on*; I've found
> >>>> > > > >>> > multiple references for both.
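> >>>> > > > >>> > Maybe the settings belong in odbcinst.ini instead of odbc.ini?
> >>>> > > > >>> > Just a guess, assuming unixODBC reads tracing from the [ODBC]
> >>>> > > > >>> > section of /etc/odbcinst.ini:
> >>>> > > > >>> >
> >>>> > > > >>> > [ODBC]
> >>>> > > > >>> > Trace = Yes
> >>>> > > > >>> > TraceFile = /var/log/odbc_tracefile.log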
> >>>> > > > >>> >
> >>>> > > > >>> > How can I see more logs to debug the issue? Can I enable logs
> >>>> > > > >>> > for all queries in trafodion?
> >>>> > > > >>> >
