metron-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Casey Stella <ceste...@gmail.com>
Subject Re: Build failing
Date Thu, 26 Jan 2017 00:17:27 GMT
Ok, I figured out what was going on, I believe, and submitted a PR for it:
https://github.com/apache/incubator-metron/pull/424


On Tue, Jan 24, 2017 at 8:28 AM, Justin Leet <justinjleet@gmail.com> wrote:

> Created a Jira:
> https://issues.apache.org/jira/browse/METRON-672
>
> Feel free to add / correct anything in that ticket.
>
> Justin
>
> On Tue, Jan 24, 2017 at 8:09 AM, Casey Stella <cestella@gmail.com> wrote:
>
> > One thing that I would caution though is that this is likely a heisenbug.
> > The more logging I added earlier made it less likely to occur. It seems
> > more likely to occur on Travis than locally and I made it happen by
> > repeatedly running mvn install on Metron-solr (after a mvn install of the
> > whole project).
> > On Tue, Jan 24, 2017 at 07:59 Casey Stella <cestella@gmail.com> wrote:
> >
> > > Agreed to both counts. I was able to reproduce it locally, but not in
> an
> > > IDE by the way.
> > > On Tue, Jan 24, 2017 at 07:57 Justin Leet <justinjleet@gmail.com>
> wrote:
> > >
> > > I definitely agree that this isn't a fluke.
> > >
> > > Do we have a Jira for this?  If not, I can create one and I would like
> to
> > > propose that part of that ticket is adding logging.  Right now, I'm
> > > concerned we don't have enough info from the Travis builds to be able
> to
> > > (easily) debug failure or reproduce locally.
> > >
> > > Justin
> > >
> > > On Mon, Jan 23, 2017 at 4:16 PM, Casey Stella <cestella@gmail.com>
> > wrote:
> > >
> > > > One more thing, just for posterity here, it always freezes at 6
> records
> > > > written to HDFS.  That's the reason I thought it was a flushing
> issue.
> > > >
> > > > On Mon, Jan 23, 2017 at 3:38 PM, Casey Stella <cestella@gmail.com>
> > > wrote:
> > > >
> > > > > Ok, so now I'm concerned that this isn't a fluke.  Here's an
> excerpt
> > > from
> > > > > the failing logs on travis for my PR with substantially longer
> > > timeouts (
> > > > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/
> > 194575474/log.txt)
> > > > >
> > > > > Running org.apache.metron.solr.integration.
> > SolrIndexingIntegrationTest
> > > > > 0 vs 10 vs 0
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Processed target/indexingIntegrationTest/hdfs/
> > test/enrichment-null-0-0-
> > > > 1485200689038.json
> > > > > 10 vs 10 vs 6
> > > > > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> > 317.056
> > > > sec <<< FAILURE!
> > > > > test(org.apache.metron.solr.integration.
> SolrIndexingIntegrationTest)
> > > > Time elapsed: 316.949 sec  <<< ERROR!
> > > > > java.lang.RuntimeException: Took too long to complete: 300783 >
> > 300000
> > > > >       at org.apache.metron.integration.ComponentRunner.process(
> > > > ComponentRunner.java:131)
> > > > >       at org.apache.metron.indexing.integration.
> > > > IndexingIntegrationTest.test(IndexingIntegrationTest.java:173)
> > > > >
> > > > >
> > > > > I'm getting the impression that this isn't the timeout and we have
> a
> > > > mystery on our hands.  Each of those lines "10 vs 10 vs 6" happen 15
> > > > seconds apart.  That line means that it read 10 entries from kafka,
> 10
> > > > entries from the indexed data and 6 entries from HDFS.  It's that 6
> > > entries
> > > > that is the problem.   Also of note, this does not seem to happen to
> me
> > > > locally AND it's not consistent on Travis.  Given all that I'd say
> that
> > > > it's a problem with the HDFS Writer not getting flushed, but I
> verified
> > > > that it is indeed flushed per message.
> > > > >
> > > > >
> > > > > Anyway, tl;dr we have a mystery unit test bug that isn't
> > deterministic
> > > > wrt the unit tests and may or may not manifest itself outside of the
> > unit
> > > > tests.  So, yeah, I'll be looking at it, but would appreciate others
> > > taking
> > > > a gander too.
> > > > >
> > > > >
> > > > > Casey
> > > > >
> > > > >
> > > > > On Mon, Jan 23, 2017 at 2:09 PM, Casey Stella <cestella@gmail.com>
> > > > wrote:
> > > > >
> > > > >> Yeah, I adjusted the timeout on the indexing integration tests
as
> > part
> > > > of
> > > > >> https://github.com/apache/incubator-metron/pull/420 which I'll
> > merge
> > > in
> > > > >> today.
> > > > >>
> > > > >> On Mon, Jan 23, 2017 at 2:01 PM, Zeolla@GMail.com <
> zeolla@gmail.com
> > >
> > > > >> wrote:
> > > > >>
> > > > >>> Okay, now we have back to back failures, and it looks like
it may
> > > have
> > > > >>> been
> > > > >>> a timeout issue?
> > > > >>>  `test(org.apache.metron.solr.integration.
> > > > SolrIndexingIntegrationTest):
> > > > >>> Took too long to complete: 150582 > 150000`, more details
below:
> > > > >>>
> > > > >>> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> > > > >>> 166.167 sec <<< FAILURE!
> > > > >>>
> > > > >>> test(org.apache.metron.solr.integration.
> > SolrIndexingIntegrationTest)
> > > > >>> Time elapsed: 166.071 sec  <<< ERROR!
> > > > >>>
> > > > >>> java.lang.RuntimeException: Took too long to complete: 150582
>
> > > 150000
> > > > >>>
> > > > >>>         at org.apache.metron.integration.
> > > > ComponentRunner.process(Compon
> > > > >>> entRunner.java:131)
> > > > >>>
> > > > >>>         at org.apache.metron.indexing.integration.
> > > > IndexingIntegrationTe
> > > > >>> st.test(IndexingIntegrationTest.java:173)
> > > > >>>
> > > > >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> > > Method)
> > > > >>>
> > > > >>>         at sun.reflect.NativeMethodAccessorImpl.
> > > > invoke(NativeMethodAcce
> > > > >>> ssorImpl.java:62)
> > > > >>>
> > > > >>>         at sun.reflect.DelegatingMethodAccessorImpl.
> > > > invoke(DelegatingMe
> > > > >>> thodAccessorImpl.java:43)
> > > > >>>
> > > > >>>         at java.lang.reflect.Method.invoke(Method.java:483)
> > > > >>>
> > > > >>>         at org.junit.runners.model.FrameworkMethod$1.
> > > > runReflectiveCall(
> > > > >>> FrameworkMethod.java:50)
> > > > >>>
> > > > >>>         at org.junit.internal.runners.
> > model.ReflectiveCallable.run(
> > > > Refl
> > > > >>> ectiveCallable.java:12)
> > > > >>>
> > > > >>>         at org.junit.runners.model.FrameworkMethod.
> > > > invokeExplosively(Fr
> > > > >>> ameworkMethod.java:47)
> > > > >>>
> > > > >>>         at org.junit.internal.runners.statements.InvokeMethod.
> > > > evaluate(
> > > > >>> InvokeMethod.java:17)
> > > > >>>
> > > > >>>         at org.junit.runners.ParentRunner.runLeaf(
> > > > ParentRunner.java:325)
> > > > >>>
> > > > >>>         at org.junit.runners.BlockJUnit4ClassRunner.
> > > > runChild(BlockJUnit
> > > > >>> 4ClassRunner.java:78)
> > > > >>>
> > > > >>>         at org.junit.runners.BlockJUnit4ClassRunner.
> > > > runChild(BlockJUnit
> > > > >>> 4ClassRunner.java:57)
> > > > >>>
> > > > >>>         at
> > > org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> > > > >>>
> > > > >>>         at org.junit.runners.ParentRunner$1.schedule(
> > > > ParentRunner.java:
> > > > >>> 71)
> > > > >>>
> > > > >>>         at org.junit.runners.ParentRunner.runChildren(
> > > > ParentRunner.java
> > > > >>> :288)
> > > > >>>
> > > > >>>         at org.junit.runners.ParentRunner.access$000(
> > > > ParentRunner.java:
> > > > >>> 58)
> > > > >>>
> > > > >>>         at org.junit.runners.ParentRunner$2.evaluate(
> > > > ParentRunner.java:
> > > > >>> 268)
> > > > >>>
> > > > >>>         at org.junit.runners.ParentRunner.run(ParentRunner.
> > java:363)
> > > > >>>
> > > > >>>         at org.apache.maven.surefire.
> > junit4.JUnit4Provider.execute(
> > > > JUni
> > > > >>> t4Provider.java:252)
> > > > >>>
> > > > >>>         at org.apache.maven.surefire.junit4.JUnit4Provider.
> > > > executeTestS
> > > > >>> et(JUnit4Provider.java:141)
> > > > >>>
> > > > >>>         at org.apache.maven.surefire.
> junit4.JUnit4Provider.invoke(
> > > > JUnit
> > > > >>> 4Provider.java:112)
> > > > >>>
> > > > >>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
> > > Method)
> > > > >>>
> > > > >>>         at sun.reflect.NativeMethodAccessorImpl.
> > > > invoke(NativeMethodAcce
> > > > >>> ssorImpl.java:62)
> > > > >>>
> > > > >>>         at sun.reflect.DelegatingMethodAccessorImpl.
> > > > invoke(DelegatingMe
> > > > >>> thodAccessorImpl.java:43)
> > > > >>>
> > > > >>>         at java.lang.reflect.Method.invoke(Method.java:483)
> > > > >>>
> > > > >>>         at org.apache.maven.surefire.util.ReflectionUtils.
> > > > invokeMethodW
> > > > >>> ithArray(ReflectionUtils.java:189)
> > > > >>>
> > > > >>>         at org.apache.maven.surefire.booter.ProviderFactory$
> > > > ProviderPro
> > > > >>> xy.invoke(ProviderFactory.java:165)
> > > > >>>
> > > > >>>         at org.apache.maven.surefire.booter.ProviderFactory.
> > > > invokeProvi
> > > > >>> der(ProviderFactory.java:85)
> > > > >>>
> > > > >>>         at org.apache.maven.surefire.booter.ForkedBooter.
> > > > runSuitesInPro
> > > > >>> cess(ForkedBooter.java:115)
> > > > >>>
> > > > >>>         at org.apache.maven.surefire.booter.ForkedBooter.main(
> > > > ForkedBoo
> > > > >>> ter.java:75)
> > > > >>>
> > > > >>>
> > > > >>> Jon
> > > > >>>
> > > > >>> On Thu, Jan 19, 2017 at 2:49 PM Zeolla@GMail.com <
> zeolla@gmail.com
> > >
> > > > >>> wrote:
> > > > >>>
> > > > >>> > The build has been showing as failing
> > > > >>> > <https://github.com/apache/incubator-metron> for
a little
> while
> > > now.
> > > > >>> I
> > > > >>> > know we recently updated the language around Merge Requirements
> > > > >>> > <https://cwiki.apache.org/confluence/pages/viewpage.action?p
> > > > >>> ageId=61332235>,
> > > > >>> > but if I recall properly our current issue is simply
a Travis
> CI
> > > > >>> overload
> > > > >>> > issue.  Is there a way we can update the wiki doc to
account
> for
> > > > >>> situations
> > > > >>> > like this?
> > > > >>> >
> > > > >>> > Jon
> > > > >>> > --
> > > > >>> >
> > > > >>> > Jon
> > > > >>> >
> > > > >>> > Sent from my mobile device
> > > > >>> >
> > > > >>> --
> > > > >>>
> > > > >>> Jon
> > > > >>>
> > > > >>> Sent from my mobile device
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message