calcite-dev mailing list archives

From Francis Chuang <francischu...@apache.org>
Subject Re: CyclicMetadataException in testPushDownJoinConditionsWithExpandedIsNotDistinctUsingCase
Date Wed, 29 May 2019 12:02:10 GMT
In Go, there's a built-in race detector that can be enabled when running
tests with `go test -race`.

There's RacerD [1] from Facebook that can detect races in Java. Perhaps
this is something we could look into to find the race.

[1] https://fbinfer.com/docs/racerd.html
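
For anyone trying to reproduce the failure locally, a minimal sketch of a stress harness might look like the following. `RaceRepro` and `countFailures` are hypothetical names, not part of Calcite or its test suite; the idea is simply to release several threads at once against the code under suspicion and count how many runs throw. The task in `main` is only a placeholder.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper for illustration; not part of Calcite. */
public class RaceRepro {
  /**
   * Runs {@code task} {@code iterations} times on each of {@code threads}
   * threads and returns the number of runs that threw.
   */
  public static int countFailures(Runnable task, int threads, int iterations) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    CountDownLatch start = new CountDownLatch(1);
    AtomicInteger failures = new AtomicInteger();
    for (int t = 0; t < threads; t++) {
      pool.execute(() -> {
        try {
          start.await(); // hold each worker until all have been submitted
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        for (int i = 0; i < iterations; i++) {
          try {
            task.run();
          } catch (RuntimeException | Error e) {
            failures.incrementAndGet(); // count the failure and keep going
          }
        }
      });
    }
    start.countDown(); // release all workers at once
    pool.shutdown();
    try {
      if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
        pool.shutdownNow();
        throw new IllegalStateException("workers did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      pool.shutdownNow();
      throw new IllegalStateException("interrupted while waiting", e);
    }
    return failures.get();
  }

  public static void main(String[] args) {
    // Placeholder task: replace with the code suspected of racing.
    AtomicInteger counter = new AtomicInteger();
    int failures = countFailures(counter::incrementAndGet, 8, 10_000);
    System.out.println("failed runs: " + failures);
  }
}
```

A harness like this cannot prove the absence of a race; it only raises the odds of hitting one, which may help when the failure shows up on Jenkins but not on a developer machine.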

On 29/05/2019 9:52 pm, Stamatis Zampetakis wrote:
> Good insights, Ruben, Danny!
> 
> Assuming that it is indeed a concurrency problem, it will be difficult to
> identify since many parts of Calcite are not thread-safe. Not being able to
> reproduce the problem makes things even worse.
> 
> Given that intermittent test failures occur often on Jenkins, how about
> creating a new branch dedicated to debugging?
> It can be associated with a new Jenkins job (that obviously doesn't send
> mails to everybody when there are failures) that can be launched on-demand
> by the person who is looking into the problem.
> 
> On the other hand, we do not really make an effort to have tests that can
> be executed concurrently, so another alternative would be to run tests only
> sequentially.
> I am not a big fan of this approach since it would hide problems rather
> than solve them.
> 
> On Wed, May 29, 2019 at 1:02 PM Yuzhao Chen <yuzhao.cyz@gmail.com> wrote:
> 
>> Thanks Ruben for your good analysis.
>>
>> What confuses me is this: isn't the static REL_BUILDER more prone to
>> concurrency problems? And the pushed scans (EMP_SCAN and DEPT_SCAN) are
>> all immutable nodes, so how could this be a problem?
>>
>> Best,
>> Danny Chan
>> On May 29, 2019, 5:37 PM +0800, Ruben Q L <rubenql@gmail.com> wrote:
>>> I'm checking the commit [1] and I see something strange in
>>> RelOptUtilTest.
>>> Maybe I'm wrong and it is nothing, but just in case it may help:
>>>
>>> With the latest modification, it seems that we have two RelBuilders in
>>> place:
>>> - A static one that is created ad hoc in a static block to generate the
>>> EMP_SCAN and DEPT_SCAN RelNodes [2]
>>> - An instance one to be used in the tests, which is initialized in
>>> the @Before public void setUp() method [3]
>>>
>>> Before this commit, EMP_SCAN / DEPT_SCAN were only used to read their
>>> rowTypes to test some join auxiliary methods. But the new
>>> testPushDownJoinConditions* tests actually build a plan and push these
>>> scans into the RelBuilder under test [4] (which is a different one from
>>> the static RelBuilder that created the scans).
>>> Maybe this is not a problem in general, but it could be one under certain
>>> circumstances, which would explain the randomness of the issue.
>>> Could this explain the exception?
>>>
>>> [1] https://github.com/apache/calcite/commit/82e7d4e760cb203d31956c55e38e0fdd56119d58
>>> [2] https://github.com/apache/calcite/blob/ac40d6951bc8c475ca6804be6d878107cc2ebb13/core/src/test/java/org/apache/calcite/plan/RelOptUtilTest.java#L71
>>> [3] https://github.com/apache/calcite/blob/ac40d6951bc8c475ca6804be6d878107cc2ebb13/core/src/test/java/org/apache/calcite/plan/RelOptUtilTest.java#L92
>>> [4] https://github.com/apache/calcite/blob/ac40d6951bc8c475ca6804be6d878107cc2ebb13/core/src/test/java/org/apache/calcite/plan/RelOptUtilTest.java#L292
>>>
>>>
>>>
>>> On Wed, May 29, 2019 at 02:20, Julian Hyde <jhyde@apache.org> wrote:
>>>
>>>> It’s a tough call. It is probable that the problem existed already and
>>>> the change merely surfaced it.
>>>>
>>>>> On May 28, 2019, at 5:17 PM, Stamatis Zampetakis <zabetak@gmail.com> wrote:
>>>>>
>>>>> It is not the only test failing after commit [1]; all the new tests
>>>>> that were added are failing.
>>>>>
>>>>> I've seen the problem on Jenkins on all JDKs but I cannot reproduce it
>>>>> locally.
>>>>> I guess we are dealing with a race condition, most likely due to the
>>>>> concurrent execution of tests with surefire.
>>>>>
>>>>> Should we revert the commit till we find a solution?
>>>>>
>>>>> [1] https://github.com/apache/calcite/commit/82e7d4e760cb203d31956c55e38e0fdd56119d58
>>>>>
>>>>> On Tue, May 28, 2019 at 7:57 PM Julian Hyde <jhyde@apache.org> wrote:
>>>>>
>>>>>> I have seen this intermittent failure 3 times in the last week:
>>>>>>
>>>>>> [INFO] Running org.apache.calcite.plan.RelOptUtilTest
>>>>>> [ERROR] Tests run: 11, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.411 s <<< FAILURE! - in org.apache.calcite.plan.RelOptUtilTest
>>>>>> [ERROR] testPushDownJoinConditionsWithExpandedIsNotDistinctUsingCase(org.apache.calcite.plan.RelOptUtilTest) Time elapsed: 0.349 s <<< ERROR!
>>>>>> org.apache.calcite.rel.metadata.CyclicMetadataException
>>>>>> at org.apache.calcite.plan.RelOptUtilTest.testPushDownJoinConditionsWithExpandedIsNotDistinctUsingCase(RelOptUtilTest.java:445)
>>>>>>
>>>>>> I have seen it on Oracle JDK 12 and OpenJDK 10. The test was only added
>>>>>> on May 22 so I assume that it will continue to fail intermittently until
>>>>>> we do something.
>>>>>>
>>>>>> Anyone have any ideas?
>>>>>>
>>>>>> Laurent, as you added the test, can you please look into it?
>>>>>>
>>>>>> Julian
>>>>>>
>>>>>>
>>>>
>>>>
>>
> 
