metron-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Justin Leet <justinjl...@gmail.com>
Subject Re: Build failing
Date Tue, 24 Jan 2017 13:28:21 GMT
Created a Jira:
https://issues.apache.org/jira/browse/METRON-672

Feel free to add / correct anything in that ticket.

Justin

On Tue, Jan 24, 2017 at 8:09 AM, Casey Stella <cestella@gmail.com> wrote:

> One thing that I would caution though is that this is likely a heisenbug.
> The more logging I added earlier made it less likely to occur. It seems
> more likely to occur on Travis than locally and I made it happen by
> repeatedly running mvn install on Metron-solr (after a mvn install of the
> whole project).
> On Tue, Jan 24, 2017 at 07:59 Casey Stella <cestella@gmail.com> wrote:
>
> > Agreed to both counts. I was able to reproduce it locally, but not in an
> > IDE by the way.
> > On Tue, Jan 24, 2017 at 07:57 Justin Leet <justinjleet@gmail.com> wrote:
> >
> > I definitely agree that this isn't a fluke.
> >
> > Do we have a Jira for this?  If not, I can create one and I would like to
> > propose that part of that ticket is adding logging.  Right now, I'm
> > concerned we don't have enough info from the Travis builds to be able to
> > (easily) debug failure or reproduce locally.
> >
> > Justin
> >
> > On Mon, Jan 23, 2017 at 4:16 PM, Casey Stella <cestella@gmail.com>
> wrote:
> >
> > > One more thing, just for posterity here, it always freezes at 6 records
> > > written to HDFS.  That's the reason I thought it was a flushing issue.
> > >
> > > On Mon, Jan 23, 2017 at 3:38 PM, Casey Stella <cestella@gmail.com>
> > wrote:
> > >
> > > > Ok, so now I'm concerned that this isn't a fluke.  Here's an excerpt
> > from
> > > > the failing logs on travis for my PR with substantially longer
> > timeouts (
> > > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/
> 194575474/log.txt)
> > > >
> > > > Running org.apache.metron.solr.integration.
> SolrIndexingIntegrationTest
> > > > 0 vs 10 vs 0
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Processed target/indexingIntegrationTest/hdfs/
> test/enrichment-null-0-0-
> > > 1485200689038.json
> > > > 10 vs 10 vs 6
> > > > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> 317.056
> > > sec <<< FAILURE!
> > > > test(org.apache.metron.solr.integration.SolrIndexingIntegrationTest)
> > > Time elapsed: 316.949 sec  <<< ERROR!
> > > > java.lang.RuntimeException: Took too long to complete: 300783 >
> 300000
> > > >       at org.apache.metron.integration.ComponentRunner.process(
> > > ComponentRunner.java:131)
> > > >       at org.apache.metron.indexing.integration.
> > > IndexingIntegrationTest.test(IndexingIntegrationTest.java:173)
> > > >
> > > >
> > > > I'm getting the impression that this isn't the timeout and we have a
> > > mystery on our hands.  Each of those lines "10 vs 10 vs 6" happen 15
> > > seconds apart.  That line means that it read 10 entries from kafka, 10
> > > entries from the indexed data and 6 entries from HDFS.  It's that 6
> > entries
> > > that is the problem.   Also of note, this does not seem to happen to me
> > > locally AND it's not consistent on Travis.  Given all that I'd say that
> > > it's a problem with the HDFS Writer not getting flushed, but I verified
> > > that it is indeed flushed per message.
> > > >
> > > >
> > > > Anyway, tl;dr we have a mystery unit test bug that isn't
> deterministic
> > > wrt the unit tests and may or may not manifest itself outside of the
> unit
> > > tests.  So, yeah, I'll be looking at it, but would appreciate others
> > taking
> > > a gander too.
> > > >
> > > >
> > > > Casey
> > > >
> > > >
> > > > On Mon, Jan 23, 2017 at 2:09 PM, Casey Stella <cestella@gmail.com>
> > > wrote:
> > > >
> > > >> Yeah, I adjusted the timeout on the indexing integration tests as
> part
> > > of
> > > >> https://github.com/apache/incubator-metron/pull/420 which I'll
> merge
> > in
> > > >> today.
> > > >>
> > > >> On Mon, Jan 23, 2017 at 2:01 PM, Zeolla@GMail.com <zeolla@gmail.com
> >
> > > >> wrote:
> > > >>
> > > >>> Okay, now we have back to back failures, and it looks like it
may
> > have
> > > >>> been
> > > >>> a timeout issue?
> > > >>>  `test(org.apache.metron.solr.integration.
> > > SolrIndexingIntegrationTest):
> > > >>> Took too long to complete: 150582 > 150000`, more details below:
> > > >>>
> > > >>> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> > > >>> 166.167 sec <<< FAILURE!
> > > >>>
> > > >>> test(org.apache.metron.solr.integration.
> SolrIndexingIntegrationTest)
> > > >>> Time elapsed: 166.071 sec  <<< ERROR!
> > > >>>
> > > >>> java.lang.RuntimeException: Took too long to complete: 150582
>
> > 150000
> > > >>>
> > > >>>         at org.apache.metron.integration.
> > > ComponentRunner.process(Compon
> > > >>> entRunner.java:131)
> > > >>>
> > > >>>         at org.apache.metron.indexing.integration.
> > > IndexingIntegrationTe
> > > >>> st.test(IndexingIntegrationTest.java:173)
> > > >>>
> > > >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> > Method)
> > > >>>
> > > >>>         at sun.reflect.NativeMethodAccessorImpl.
> > > invoke(NativeMethodAcce
> > > >>> ssorImpl.java:62)
> > > >>>
> > > >>>         at sun.reflect.DelegatingMethodAccessorImpl.
> > > invoke(DelegatingMe
> > > >>> thodAccessorImpl.java:43)
> > > >>>
> > > >>>         at java.lang.reflect.Method.invoke(Method.java:483)
> > > >>>
> > > >>>         at org.junit.runners.model.FrameworkMethod$1.
> > > runReflectiveCall(
> > > >>> FrameworkMethod.java:50)
> > > >>>
> > > >>>         at org.junit.internal.runners.
> model.ReflectiveCallable.run(
> > > Refl
> > > >>> ectiveCallable.java:12)
> > > >>>
> > > >>>         at org.junit.runners.model.FrameworkMethod.
> > > invokeExplosively(Fr
> > > >>> ameworkMethod.java:47)
> > > >>>
> > > >>>         at org.junit.internal.runners.statements.InvokeMethod.
> > > evaluate(
> > > >>> InvokeMethod.java:17)
> > > >>>
> > > >>>         at org.junit.runners.ParentRunner.runLeaf(
> > > ParentRunner.java:325)
> > > >>>
> > > >>>         at org.junit.runners.BlockJUnit4ClassRunner.
> > > runChild(BlockJUnit
> > > >>> 4ClassRunner.java:78)
> > > >>>
> > > >>>         at org.junit.runners.BlockJUnit4ClassRunner.
> > > runChild(BlockJUnit
> > > >>> 4ClassRunner.java:57)
> > > >>>
> > > >>>         at
> > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> > > >>>
> > > >>>         at org.junit.runners.ParentRunner$1.schedule(
> > > ParentRunner.java:
> > > >>> 71)
> > > >>>
> > > >>>         at org.junit.runners.ParentRunner.runChildren(
> > > ParentRunner.java
> > > >>> :288)
> > > >>>
> > > >>>         at org.junit.runners.ParentRunner.access$000(
> > > ParentRunner.java:
> > > >>> 58)
> > > >>>
> > > >>>         at org.junit.runners.ParentRunner$2.evaluate(
> > > ParentRunner.java:
> > > >>> 268)
> > > >>>
> > > >>>         at org.junit.runners.ParentRunner.run(ParentRunner.
> java:363)
> > > >>>
> > > >>>         at org.apache.maven.surefire.
> junit4.JUnit4Provider.execute(
> > > JUni
> > > >>> t4Provider.java:252)
> > > >>>
> > > >>>         at org.apache.maven.surefire.junit4.JUnit4Provider.
> > > executeTestS
> > > >>> et(JUnit4Provider.java:141)
> > > >>>
> > > >>>         at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(
> > > JUnit
> > > >>> 4Provider.java:112)
> > > >>>
> > > >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> > Method)
> > > >>>
> > > >>>         at sun.reflect.NativeMethodAccessorImpl.
> > > invoke(NativeMethodAcce
> > > >>> ssorImpl.java:62)
> > > >>>
> > > >>>         at sun.reflect.DelegatingMethodAccessorImpl.
> > > invoke(DelegatingMe
> > > >>> thodAccessorImpl.java:43)
> > > >>>
> > > >>>         at java.lang.reflect.Method.invoke(Method.java:483)
> > > >>>
> > > >>>         at org.apache.maven.surefire.util.ReflectionUtils.
> > > invokeMethodW
> > > >>> ithArray(ReflectionUtils.java:189)
> > > >>>
> > > >>>         at org.apache.maven.surefire.booter.ProviderFactory$
> > > ProviderPro
> > > >>> xy.invoke(ProviderFactory.java:165)
> > > >>>
> > > >>>         at org.apache.maven.surefire.booter.ProviderFactory.
> > > invokeProvi
> > > >>> der(ProviderFactory.java:85)
> > > >>>
> > > >>>         at org.apache.maven.surefire.booter.ForkedBooter.
> > > runSuitesInPro
> > > >>> cess(ForkedBooter.java:115)
> > > >>>
> > > >>>         at org.apache.maven.surefire.booter.ForkedBooter.main(
> > > ForkedBoo
> > > >>> ter.java:75)
> > > >>>
> > > >>>
> > > >>> Jon
> > > >>>
> > > >>> On Thu, Jan 19, 2017 at 2:49 PM Zeolla@GMail.com <zeolla@gmail.com
> >
> > > >>> wrote:
> > > >>>
> > > >>> > The build has been showing as failing
> > > >>> > <https://github.com/apache/incubator-metron> for a
little while
> > now.
> > > >>> I
> > > >>> > know we recently updated the language around Merge Requirements
> > > >>> > <https://cwiki.apache.org/confluence/pages/viewpage.action?p
> > > >>> ageId=61332235>,
> > > >>> > but if I recall properly our current issue is simply a Travis
CI
> > > >>> overload
> > > >>> > issue.  Is there a way we can update the wiki doc to account
for
> > > >>> situations
> > > >>> > like this?
> > > >>> >
> > > >>> > Jon
> > > >>> > --
> > > >>> >
> > > >>> > Jon
> > > >>> >
> > > >>> > Sent from my mobile device
> > > >>> >
> > > >>> --
> > > >>>
> > > >>> Jon
> > > >>>
> > > >>> Sent from my mobile device
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message