metron-dev mailing list archives

From Casey Stella <ceste...@gmail.com>
Subject Re: Build failing
Date Tue, 24 Jan 2017 13:09:09 GMT
One thing I would caution, though, is that this is likely a heisenbug.
The more logging I added earlier, the less likely it was to occur. It seems
more likely to occur on Travis than locally, and I made it happen by
repeatedly running mvn install on Metron-solr (after an mvn install of the
whole project).
On Tue, Jan 24, 2017 at 07:59 Casey Stella <cestella@gmail.com> wrote:

> Agreed on both counts. I was able to reproduce it locally, but not in an
> IDE, by the way.
> On Tue, Jan 24, 2017 at 07:57 Justin Leet <justinjleet@gmail.com> wrote:
>
> I definitely agree that this isn't a fluke.
>
> Do we have a Jira for this?  If not, I can create one, and I would like to
> propose that part of that ticket be adding logging.  Right now, I'm
> concerned we don't have enough info from the Travis builds to be able to
> (easily) debug failures or reproduce them locally.
>
> Justin
>
> On Mon, Jan 23, 2017 at 4:16 PM, Casey Stella <cestella@gmail.com> wrote:
>
> > One more thing, just for posterity: it always freezes at 6 records
> > written to HDFS.  That's why I thought it was a flushing issue.
> >
> > On Mon, Jan 23, 2017 at 3:38 PM, Casey Stella <cestella@gmail.com> wrote:
> >
> > > OK, so now I'm concerned that this isn't a fluke.  Here's an excerpt from
> > > the failing logs on Travis for my PR with substantially longer timeouts
> > > (https://s3.amazonaws.com/archive.travis-ci.org/jobs/194575474/log.txt):
> > >
> > > Running org.apache.metron.solr.integration.SolrIndexingIntegrationTest
> > > 0 vs 10 vs 0
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Processed target/indexingIntegrationTest/hdfs/test/enrichment-null-0-0-1485200689038.json
> > > 10 vs 10 vs 6
> > > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 317.056 sec <<< FAILURE!
> > > test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)  Time elapsed: 316.949 sec  <<< ERROR!
> > > java.lang.RuntimeException: Took too long to complete: 300783 > 300000
> > >       at org.apache.metron.integration.ComponentRunner.process(ComponentRunner.java:131)
> > >       at org.apache.metron.indexing.integration.IndexingIntegrationTest.test(IndexingIntegrationTest.java:173)
> > >
> > >
> > > I'm getting the impression that the timeout isn't the problem and we have
> > > a mystery on our hands.  Each of those "10 vs 10 vs 6" lines happens 15
> > > seconds apart.  That line means that it read 10 entries from Kafka, 10
> > > entries from the indexed data, and 6 entries from HDFS.  It's those 6
> > > entries that are the problem.  Also of note, this does not seem to happen
> > > to me locally AND it's not consistent on Travis.  Given all that, I'd say
> > > it's a problem with the HDFS Writer not getting flushed, but I verified
> > > that it is indeed flushed per message.
> > >
> > >
> > > Anyway, tl;dr: we have a mysterious unit test bug that isn't deterministic
> > > with respect to the unit tests and may or may not manifest itself outside
> > > of them.  So, yeah, I'll be looking at it, but would appreciate others
> > > taking a gander too.
> > >
> > >
> > > Casey
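For readers unfamiliar with the harness, the check Casey describes above (on each poll, compare the number of records read from Kafka, from the index, and from HDFS, and give up once the time budget is exceeded) can be sketched roughly as follows. This is a hypothetical, self-contained Java illustration, not Metron's actual ComponentRunner code: the class `CountPoller`, the method `awaitCounts`, and its parameters are invented for this example, the exception message merely mimics the one in the Travis log, and the `IntSupplier` arguments stand in for real Kafka, Solr, and HDFS readers.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch of a "poll until counts agree or time out" check.
// Not Metron's ComponentRunner; names and parameters are illustrative only.
public class CountPoller {

  /**
   * Polls three record counts until all equal {@code expected}, or throws
   * once more than {@code timeoutMs} milliseconds have elapsed.
   */
  public static int[] awaitCounts(IntSupplier kafka, IntSupplier index, IntSupplier hdfs,
                                  int expected, long timeoutMs, long pollMs) {
    long start = System.currentTimeMillis();
    while (true) {
      int k = kafka.getAsInt();
      int i = index.getAsInt();
      int h = hdfs.getAsInt();
      // Same shape as the "10 vs 10 vs 6" lines in the Travis log.
      System.out.println(k + " vs " + i + " vs " + h);
      if (k == expected && i == expected && h == expected) {
        return new int[] {k, i, h};
      }
      long elapsed = System.currentTimeMillis() - start;
      if (elapsed > timeoutMs) {
        // Mimics the log's "Took too long to complete: 300783 > 300000" format.
        throw new RuntimeException("Took too long to complete: " + elapsed + " > " + timeoutMs);
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException("Interrupted while polling", e);
      }
    }
  }

  public static void main(String[] args) {
    // Healthy run: all three sources agree on the first poll.
    awaitCounts(() -> 10, () -> 10, () -> 10, 10, 1000, 50);
    // Stalled run: HDFS never gets past 6 of 10 records.
    try {
      awaitCounts(() -> 10, () -> 10, () -> 6, 10, 200, 50);
    } catch (RuntimeException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With HDFS stuck at 6 of 10 records, a loop like this can never succeed; a longer timeout only postpones the exception, which is consistent with the thread's observation that raising the limit from 150000 ms to 300000 ms did not make the failure go away.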
> > >
> > >
> > > On Mon, Jan 23, 2017 at 2:09 PM, Casey Stella <cestella@gmail.com> wrote:
> > >
> > >> Yeah, I adjusted the timeout on the indexing integration tests as part of
> > >> https://github.com/apache/incubator-metron/pull/420, which I'll merge in
> > >> today.
> > >>
> > >> On Mon, Jan 23, 2017 at 2:01 PM, Zeolla@GMail.com <zeolla@gmail.com> wrote:
> > >>
> > >>> Okay, now we have back-to-back failures, and it looks like it may have
> > >>> been a timeout issue?
> > >>> `test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest):
> > >>> Took too long to complete: 150582 > 150000`. More details below:
> > >>>
> > >>> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 166.167 sec <<< FAILURE!
> > >>>
> > >>> test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)  Time elapsed: 166.071 sec  <<< ERROR!
> > >>> java.lang.RuntimeException: Took too long to complete: 150582 > 150000
> > >>>         at org.apache.metron.integration.ComponentRunner.process(ComponentRunner.java:131)
> > >>>         at org.apache.metron.indexing.integration.IndexingIntegrationTest.test(IndexingIntegrationTest.java:173)
> > >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > >>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>         at java.lang.reflect.Method.invoke(Method.java:483)
> > >>>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> > >>>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> > >>>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> > >>>         at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> > >>>         at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> > >>>         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> > >>>         at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> > >>>         at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> > >>>         at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> > >>>         at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> > >>>         at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> > >>>         at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> > >>>         at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> > >>>         at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:252)
> > >>>         at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:141)
> > >>>         at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:112)
> > >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > >>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >>>         at java.lang.reflect.Method.invoke(Method.java:483)
> > >>>         at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> > >>>         at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> > >>>         at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> > >>>         at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:115)
> > >>>         at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:75)
> > >>>
> > >>>
> > >>> Jon
> > >>>
> > >>> On Thu, Jan 19, 2017 at 2:49 PM Zeolla@GMail.com <zeolla@gmail.com> wrote:
> > >>>
> > >>> > The build has been showing as failing
> > >>> > <https://github.com/apache/incubator-metron> for a little while now.
> > >>> > I know we recently updated the language around Merge Requirements
> > >>> > <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61332235>,
> > >>> > but if I recall correctly, our current issue is simply Travis CI
> > >>> > overload.  Is there a way we can update the wiki doc to account for
> > >>> > situations like this?
> > >>> >
> > >>> > Jon
> > >>> > --
> > >>> >
> > >>> > Jon
> > >>> >
> > >>> > Sent from my mobile device
> > >>> >
> > >>>
> > >>
> > >>
> > >
> >
>
>
