ignite-dev mailing list archives

From Vyacheslav Daradur <daradu...@gmail.com>
Subject Re: Nodes which started in separate JVM couldn't stop properly (in tests)
Date Thu, 01 Mar 2018 09:11:23 GMT
Thank you, Dmitry!

I'll join this review soon.

On Thu, Mar 1, 2018 at 12:07 PM, Dmitry Pavlov <dpavlov.spb@gmail.com> wrote:
> Hi Vyacheslav,
>
> I will take a look, but first of all I am going to review
> https://reviews.ignite.apache.org/ignite/review/IGNT-CR-502 - it is an
> impactful change in the testing framework. I hope you will also join this review.
>
> Sincerely,
> Dmitry Pavlov
>
>
> On Thu, Mar 1, 2018 at 11:13, Vyacheslav Daradur <daradurvs@gmail.com> wrote:
>>
>> Hi Dmitry, could you please review it? You are one of the most
>> experienced people with the testing framework.
>>
>> Please see the comment in Jira, because it is nicely formatted there.
>>
>> On Thu, Feb 22, 2018 at 11:56 AM, Vyacheslav Daradur
>> <daradurvs@gmail.com> wrote:
>> > Hi Igniters!
>> >
>> > I have investigated the issue [1] and found that stopping a node in a
>> > separate JVM may leave a thread stuck or leave a system process alive
>> > after the test has finished.
>> > The main reason is *StopGridTask*, which we send from the node in the
>> > local JVM to the node in the separate JVM via remote computing.
>> > We send the job synchronously to be sure that the node will be stopped,
>> > but the job synchronously calls *G.stop(igniteInstanceName, cancel)*
>> > with *cancel = false*. That means the node must wait for compute jobs
>> > to finish before it goes down, which leads to a kind of deadlock.
>> > Using *cancel = true* would solve the issue but may break some tests’
>> > logic; for this reason, I've reworked the method’s synchronization
>> > logic [2].
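
The kind of deadlock described above can be illustrated with a small analogy built only on `java.util.concurrent`. This is a sketch, not Ignite's *StopGridTask* and not the change made in the PR: the class and method names are hypothetical, and a plain `ExecutorService` stands in for the remote node, assuming the stop job counts among the jobs the node waits for. A job that asks its own pool to shut down gracefully and then waits for termination can never see that termination, because the pool cannot terminate until that same job returns.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical analogy: an ExecutorService plays the role of the remote node.
final class SelfStopSketch {
    /** A job waits for the graceful shutdown of the pool that is running it. */
    static boolean stopFromInsideJob(long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Future<Boolean> job = pool.submit(() -> {
            pool.shutdown(); // graceful: running jobs are allowed to finish
            // Cannot become true: the pool terminates only after this job returns.
            return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        });
        return get(job);
    }

    /** The job only triggers the stop; a separate thread performs the waiting. */
    static boolean stopFromSeparateThread(long timeoutMs) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CompletableFuture<Boolean> stopped = new CompletableFuture<>();
        Future<?> job = pool.submit(() -> {
            Thread stopper = new Thread(() -> {
                try {
                    pool.shutdown();
                    stopped.complete(pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    stopped.complete(false);
                }
            });
            stopper.start(); // the job returns right away, so the pool can drain
        });
        get(job);
        return get(stopped);
    }

    // Waits for a future and converts checked exceptions into unchecked ones.
    private static <T> T get(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("stop awaited inside the job: " + stopFromInsideJob(200));
        System.out.println("stop awaited in another thread: " + stopFromSeparateThread(2000));
    }
}
```

In this sketch the first variant only gives up because of the timeout; with an unbounded wait it would hang, which resembles the stuck thread reported above. The second variant is one possible way to break the cycle and is shown for contrast only.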
>> >
>> > We have not noticed this before because our tests use only
>> > *stopAllGrids()*, which stops the local JVM without waiting for nodes
>> > in other JVMs.
>> > I believe this fix should reduce the number of flaky tests on
>> > TeamCity, especially those that fail because a cluster from the
>> > previous test has not been stopped properly.
>> >
>> > CI tests [3] look a bit better than in master.
>> > Please review prepared PR [2] and share your thoughts.
>> >
>> > [1] https://issues.apache.org/jira/browse/IGNITE-5910
>> > [2] https://github.com/apache/ignite/pull/2382
>> > [3] https://ci.ignite.apache.org/viewLog.html?buildId=1105939
>> >
>> >
>> > On Fri, Aug 4, 2017 at 11:41 AM, Vyacheslav Daradur
>> > <daradurvs@gmail.com> wrote:
>> >> Hi Igniters,
>> >>
>> >> While working on my task, I found a bug when calling the method
>> >> #stopGrid(name): it produced a ClassCastException. I created a ticket [1].
>> >>
>> >> After it was fixed [2], I saw that nodes which were started in a
>> >> separate JVM could remain as operating system processes.
>> >> That was fixed too, but I am not sure whether it is fixed in the proper
>> >> way.
>> >>
>> >> Could someone review it?
>> >>
>> >> [1] https://issues.apache.org/jira/browse/IGNITE-5910
>> >> [2] https://github.com/apache/ignite/pull/2382
>> >>
>> >> --
>> >> Best Regards, Vyacheslav D.
>> >
>> >
>> >
>> > --
>> > Best Regards, Vyacheslav D.
>>
>>
>>
>> --
>> Best Regards, Vyacheslav D.



-- 
Best Regards, Vyacheslav D.
