drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [drill] cgivre commented on a change in pull request #2189: DRILL-7877 - Streaming REST API Fails to Send Multiple Batches
Date Sun, 14 Mar 2021 12:40:59 GMT

cgivre commented on a change in pull request #2189:
URL: https://github.com/apache/drill/pull/2189#discussion_r593894152



##########
File path: exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/stream/StreamingHttpConnection.java
##########
@@ -104,11 +104,9 @@ public void outputAvailable(OutputStream out) throws IOException {
   public void sendData(RpcOutcomeListener<Ack> listener, QueryDataPackage data) {
     VectorContainer batch = data.batch();
     try {
-      if (batchCount == 0) {
-        batchHolder = new BatchHolder(batch);
-        reader = new PushResultSetReaderImpl(batchHolder);
-        startSignal.await();
-      }
+      batchHolder = new BatchHolder(batch);
+      reader = new PushResultSetReaderImpl(batchHolder);
+      startSignal.await();

Review comment:
       @paul-rogers 
   Thanks for taking a look at this.  When I was stepping through this (full disclosure, I
got my second COVID shot on Friday and was really groggy yesterday), I noticed that the issue
seemed to be on this line:
   https://github.com/apache/drill/blob/4e514c214091b21c334f0d3b2c14c1efed381850/exec/java-exec/src/main/java/org/apache/drill/exec/server/rest/stream/StreamingHttpConnection.java#L112
   
   Let's say that you have a query that returns `n` batches `b1...bn`.   As it was originally
implemented, the first batch would emit the headers with an empty array for `rows`.   So far
so good.  
   
   When you hit the next batch (`b2`), nothing was ever being done with `b2` and most importantly,
`b1` was being sent again to `emitBatch()`.   If you notice, we create a new `VectorContainer`
from the incoming data, but if those lines are skipped, nothing happens with that `VectorContainer`.
  The end result is that `b1` gets emitted `n` times.  Since `b1` has no rows, you get an
empty dataset. 
   
   Like I said, I was pretty groggy yesterday, so I'm only about 78.346% sure that's what's
happening.  
   
   If the concern is an OOM situation, one option might be to change the scope of the `reader`
and `batchHolder` variables so that they are local to that function.  They don't seem to be
used outside of the `sendData` function. 
   
    
   Tagging @Daddykong for additional sanity check.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



Mime
View raw message