hbase-user mailing list archives

From Lisen Mu <imm...@gmail.com>
Subject Re: random failure on tests
Date Wed, 29 May 2013 10:02:52 GMT
Nicolas,

Thanks for the reply!

I run only the small & medium tests, and the build server has 8 GB of RAM. My
pass rate is also around 50%, but I'm not running -PrunAllTests.

I can find two differences between the official build and mine: a missing
prebuild step 'rm -rf /tmp/hbase-jenkins/hbase', and a stack size of 1024 vs
8192 in ulimit. Could either of these be the problem? I will try them out
anyway. Thanks!
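
For readers comparing their own build server against the points raised in this thread, here is a minimal Python sketch (not part of HBase or its build scripts) that reports the stack-size limit seen by the current process and gives a rough memory estimate for a number of parallel test forks. The 2 GB-per-fork default is only an extrapolation from the "5 forks need around 10 GB" figure in the quoted reply below, linear scaling is an assumption, and both function names are hypothetical.

```python
import resource  # Unix-only standard library module


def stack_limit_kb():
    """Return the soft stack-size limit in KiB, or None if it is unlimited.

    getrlimit reports bytes, while `ulimit -s` in common shells reports KiB,
    so the value is divided by 1024 to make the two comparable.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft // 1024


def estimated_memory_gb(forks, gb_per_fork=2.0):
    """Rough memory estimate for `forks` parallel test JVMs.

    Assumption: memory scales linearly with the fork count, with about
    2 GB per fork (derived from "5 forks need around 10 GB").
    """
    return forks * gb_per_fork


if __name__ == "__main__":
    print("stack limit (KiB):", stack_limit_kb())
    print("estimated memory for 2 forks (GB):", estimated_memory_gb(2))
    print("estimated memory for 5 forks (GB):", estimated_memory_gb(5))
```

Under these assumptions, two forks would need roughly 4 GB, which would fit in 8 GB of RAM only if the server is not shared with other jobs.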





On Wed, May 29, 2013 at 3:12 PM, Nicolas Liochon <nkeywal@gmail.com> wrote:

> Hello,
>
> Option 1:
> We still have some flaky tests. You can benchmark your build against
> https://builds.apache.org/job/HBase-TRUNK/ and
> https://builds.apache.org/job/hbase-0.95/
> You can also use this tool: https://github.com/jeffreyz88/jenkins-tools to
> get a report on the most recent failures.
>
> On 0.95, we have some tests that failed ~15% of the time:
>
> org.apache.hadoop.hbase.client.testadmin.testforcesplitmultifamily  80/6/13  1 1 1 1 1 1 1 1 1 1 1 1 -1 0 -1
> org.apache.hadoop.hbase.client.testhcm.testdeleteforzkconnleak  86/6/6  1 -1 0 1 1 1 1 1 1 1 1 1 1 1 1
> org.apache.hadoop.hbase.client.testmultiparallel.testactivethreadscount  86/6/6  1 1 1 1 1 1 1 -1 0 1 1 1 1 1 1
> org.apache.hadoop.hbase.client.testmultiparallel.testflushcommitswithabort  86/6/6  1 1 1 1 1 1 1 -1 0 1 1 1 1 1 1
> org.apache.hadoop.hbase.replication.testreplicationqueuefailover.queuefailover  86/6/6  1 1 1 1 1 1 1 1 1 1 -1 0 1 1 1
> org.apache.hadoop.hbase.rest.client.testremoteadmin.testclusterstatus  86/6/6  1 1 1 1 1 1 1 1 1 1 1 1 -1 0 1
> org.apache.hadoop.hbase.security.access.testaccesscontroller.testglobalauthorizationfornewregisteredrs  86/6/6  1 1 1 1 1 1 1 -1 0 1 1 1 1 1 1
> org.apache.hadoop.hbase.util.testhbasefsck.testsplitdaughtersnotinmeta  86/6/6
>
>
> Option 2:
> You've got some issues in your env. We run the tests in parallel (5 by default
> when you run all tests, 2 when you run only the small & medium ones). 5
> requires around 10 GB of memory. If you have less, or if the build server is
> shared, you may run into strange conditions around test timing
> requirements.
> You also need to use the Oracle JDK.
>
> See also http://hbase.apache.org/book.html, section 15.7.3 (Running tests),
> for extra parameters.
>
>
> These options are not exclusive. It seems that these days the trunk build is
> ok ~80% of the time, and 0.95 ~50%. You should expect something
> similar.
>
>
> A test becomes flaky because a patch breaks it just a little. The patch
> passes the peer review and the precommit runs, but after a while the
> randomness shows up, and we need to fix the code again. It's a never ending
> story. Any help in fixing them is always greatly appreciated.
>
> Cheers,
>
> Nicolas
>
>
> On Wed, May 29, 2013 at 8:21 AM, Lisen Mu <immars@gmail.com> wrote:
>
> > Hello,
> >
> > I'm setting up a jenkins job for hbase, building branch 0.95 (from github)
> > under jdk 6.
> >
> > However, sometimes the build passes and sometimes it does not. Several
> > tests are likely to fail, such as:
> >
> > org.apache.hadoop.hbase.ipc.TestDelayedRpc.testDelayedRpcImmediateReturnValue
> > org.apache.hadoop.hbase.ipc.TestDelayedRpc.testTooManyDelayedRpcs
> > org.apache.hadoop.hbase.client.TestHCM.testDeleteForZKConnLeak
> >
> > yet they do not fail every time.
> >
> > Any clue about what might be the problem? Thanks.
> >
> >
> > From the last failed build:
> >
> > the executed mvn command line:
> >
> > Executing Maven:  -B -f
> > /var/lib/jenkins/jobs/HBase-0.95-jdk-6/workspace/pom.xml clean package
> >
> >
> > Possibly related log:
> >
> > org.apache.hadoop.hbase.ipc.TestDelayedRpc.testDelayedRpcImmediateReturnValue
> >
> > Error Message
> >
> > Index: 1, Size: 1
> >
> > Stacktrace
> >
> > java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
> >         at java.util.ArrayList.RangeCheck(ArrayList.java:547)
> >         at java.util.ArrayList.get(ArrayList.java:322)
> >         at org.apache.hadoop.hbase.ipc.TestDelayedRpc.testDelayedRpc(TestDelayedRpc.java:112)
> >         at org.apache.hadoop.hbase.ipc.TestDelayedRpc.testDelayedRpcImmediateReturnValue(TestDelayedRpc.java:71)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> >         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> >         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> >         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> >         at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> >         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> >         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> >         at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> >         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> >         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> >         at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> >         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> >         at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> >         at org.junit.runners.Suite.runChild(Suite.java:127)
> >         at org.junit.runners.Suite.runChild(Suite.java:26)
> >         at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> >         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >         at java.lang.Thread.run(Thread.java:662)
> >
>
