Hi,

We are trying to find a solution or workaround for the following issue:

2016-01-28 16:36:14,367 [Curator-ServiceCache-0] ERROR o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: ForemanException: One more more nodes lost connectivity during query. Identified nodes were [atsqa4-133.qa.lab:31010].
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: ForemanException: One more more nodes lost connectivity during query. Identified nodes were [atsqa4-133.qa.lab:31010].
    at org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:746) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:858) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:790) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:792) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:909) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman.access$2700(Foreman.java:110) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]
    at org.apache.drill.exec.work.foreman.Foreman$StateListener.moveToState(Foreman.java:1183) [drill-java-exec-1.5.0-SNAPSHOT.jar:1.5.0-SNAPSHOT]

DRILL-4325: ForemanException: One or more nodes lost connectivity during query

Has anyone experienced this issue? It happens when running a query involving many Parquet files on a cluster of 200 nodes. The same query runs fine on a smaller cluster of 12 nodes. It is not caused by garbage collection (we checked on both the ZK node and the Drillbit involved). The negotiated max session timeout is 40 seconds.

The sequence seems to be:
- A Drill query begins, using an existing ZK session.
- The Drill ZK session times out, perhaps because it was writing something that took too long.
- Drill attempts to renew the session.
- Drill believes that the write operation failed, so it attempts to re-create the ZK node, which triggers another exception.

We are open to any suggestions and will report any findings.

Thanks,
Francois
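
P.S. For context on the 40-second figure, here is a minimal sketch of how a ZooKeeper server is generally understood to negotiate a session timeout: it clamps the value the client requests to the range [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times tickTime. The sketch assumes tickTime = 2000 ms, which is an assumption and not a value taken from this cluster; the function and parameter names are hypothetical and are not part of any ZooKeeper or Drill API.

```python
def negotiated_session_timeout_ms(requested_ms, tick_time_ms=2000,
                                  min_session_timeout_ms=None,
                                  max_session_timeout_ms=None):
    """Clamp a client's requested session timeout to the server's bounds.

    Hypothetical helper for illustration only. The defaults mirror the
    documented ZooKeeper defaults: min = 2 * tickTime, max = 20 * tickTime.
    tick_time_ms=2000 is an assumed value, not one read from a real cluster.
    """
    low = 2 * tick_time_ms if min_session_timeout_ms is None else min_session_timeout_ms
    high = 20 * tick_time_ms if max_session_timeout_ms is None else max_session_timeout_ms
    return max(low, min(requested_ms, high))


# A client asking for 60 s is capped at the 40 s maximum under these assumptions.
print(negotiated_session_timeout_ms(60_000))  # 40000
# A client asking for 10 s keeps its requested value.
print(negotiated_session_timeout_ms(10_000))  # 10000
```

If this is what happens here, requesting a longer timeout from the Drill side alone would not get past 40 seconds; the server-side maximum would have to be raised as well.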