drill-dev mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [drill] cgivre commented on a change in pull request #2184: DRILL-7874: Drill Fails to Read File Types on S3
Date Tue, 02 Mar 2021 20:41:10 GMT

cgivre commented on a change in pull request #2184:
URL: https://github.com/apache/drill/pull/2184#discussion_r585893757

File path: contrib/format-esri/src/main/java/org/apache/drill/exec/store/esri/ShpBatchReader.java
@@ -125,7 +125,7 @@ public boolean next() {
   private void openFile(FileSchemaNegotiator negotiator) {
     try {
-      fileReaderShp = negotiator.fileSystem().open(split.getPath());
+      fileReaderShp = negotiator.fileSystem().openDecompressedInputStream(split.getPath());

Review comment:
   Thanks for your comment.  The issue on S3 was that the `inputstream` passed to the
format plugin appeared to be compressed in some way, even when the file itself was not
compressed.
   This didn't seem to matter for text-based file formats, but for formats that read binary
data, such as `pcap`, the stream wasn't being decompressed, so Drill couldn't parse the
file.
   The `openPossiblyCompressedStream()` method didn't really solve this issue, because you
could still end up with a compressed stream that the format plugins couldn't read.  I thought
another approach would be to put this logic in the format plugins themselves, but I couldn't
figure out a way to determine whether the stream was compressed after calling
`openPossiblyCompressedStream()`.
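
   For what it's worth, one generic way to tell whether a stream is compressed is to peek at
its leading bytes and compare them with a known magic number. The sketch below is only an
illustration under stated assumptions: it recognizes gzip only (magic bytes `0x1f 0x8b`), it
assumes the stream coming back from S3 would actually carry such a header, and
`CompressionSniffer`, `looksGzipped()` and `openDecompressed()` are hypothetical names, not
Drill APIs.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper for illustration only; not part of Drill.
final class CompressionSniffer {

  // Returns true if the next two bytes are the gzip magic number (0x1f, 0x8b).
  // The stream must support mark/reset; the read position is restored before returning.
  static boolean looksGzipped(InputStream in) throws IOException {
    if (!in.markSupported()) {
      throw new IllegalArgumentException("stream must support mark/reset");
    }
    in.mark(2);
    try {
      int b0 = in.read();
      int b1 = in.read();
      return b0 == 0x1f && b1 == 0x8b;
    } finally {
      in.reset();
    }
  }

  // Wraps the raw stream so the caller sees decompressed bytes when the input is gzip,
  // and the original bytes otherwise. Other codecs are not detected (assumption: gzip only).
  static InputStream openDecompressed(InputStream raw) throws IOException {
    BufferedInputStream buffered = new BufferedInputStream(raw);
    return looksGzipped(buffered) ? new GZIPInputStream(buffered) : buffered;
  }
}
```

   A format plugin could pass whatever stream it was given through `openDecompressed()` before
parsing. A binary file whose own first two bytes happen to be `0x1f 0x8b` would be
misdetected, so this is a heuristic rather than a guarantee.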

   Do you have any suggestions as to how to fix this so that we can avoid the OOM?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
