drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: ECS parquet files query timing out
Date Sat, 28 Mar 2020 20:53:14 GMT
Hi Navin,

Thank you for the detailed information. Very helpful.

I may be confused about what "ECS" stands for in your case. I had assumed it was the Amazon
Elastic Container Service. However, I'm struggling to understand how that ECS would provide an
S3 interface. Is it instead the Dell EMC Elastic Cloud Storage (ECS) layer from Isilon?
[1]


The stack trace shows that the delay/problem occurs when communicating with the S3 endpoint.
Assuming my sources match the version you are using, the problem occurs when Drill tries to
open the Parquet file footer:

  private ParquetMetadata readFooter(Configuration conf, Path path,
      ParquetReaderConfig readerConfig) throws IOException {
    // Error is in the following line
    try (ParquetFileReader reader = ParquetFileReader.open(
        HadoopInputFile.fromPath(path, readerConfig.addCountersToConf(conf)),
        readerConfig.toReadOptions())) {

We can see from the code above, and from the stack trace, that Drill is blissfully ignorant
of the fact that the S3 API is connecting to ECS. That is, Drill does nothing differently
for the ECS S3 case than it does for the Amazon S3 case or the HDFS case. In all cases, it
calls HadoopInputFile.fromPath(), which goes through the Hadoop file system client (here,
S3AFileSystem.getFileStatus(), per the stack trace).

Given this, my suspicion is that there is a problem with the Dell ECS implementation of the
S3 API. A previous note suggested that you check this outside of Drill.

1. Use the HDFS client to download a Parquet (or any) file from ECS.

2. Use an S3 client to download the same file from ECS.


Do the above repeatedly in a loop to determine if the operations are stable under load.
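
A minimal sketch of such a loop, in Java, is below. It is only an illustration: `LoadLoop`
and its `run` method are hypothetical names, and the operation in `main` is a placeholder.
In a real test you would replace the placeholder with a call that reads the file through
the Hadoop client (step 1) or through an S3 client (step 2). The harness runs the operation
many times on several threads, then reports how many calls failed and how long the slowest
one took.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical harness: runs one storage operation repeatedly and in parallel. */
final class LoadLoop {

  /** Outcome of a run: successful calls, failed calls, and the slowest successful call. */
  static final class Result {
    final int successes;
    final int failures;
    final long maxMillis;

    Result(int successes, int failures, long maxMillis) {
      this.successes = successes;
      this.failures = failures;
      this.maxMillis = maxMillis;
    }
  }

  /** Submits the operation `iterations` times to a pool of `threads` workers. */
  static Result run(Callable<?> operation, int threads, int iterations) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    List<Future<Long>> pending = new ArrayList<>();
    for (int i = 0; i < iterations; i++) {
      pending.add(pool.submit(() -> {
        long start = System.nanoTime();
        operation.call();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
      }));
    }
    int ok = 0;
    int failed = 0;
    long max = 0;
    for (Future<Long> call : pending) {
      try {
        max = Math.max(max, call.get());
        ok++;
      } catch (Exception e) {
        // Any exception from the operation (for example a pool timeout) counts as a failure.
        failed++;
        System.err.println("call failed: " + e);
      }
    }
    pool.shutdown();
    return new Result(ok, failed, max);
  }

  public static void main(String[] args) {
    // Placeholder operation: replace with a real download of the Parquet file from ECS.
    Result r = run(() -> { Thread.sleep(5); return null; }, 8, 100);
    System.out.println("ok=" + r.successes + " failed=" + r.failures
        + " slowest=" + r.maxMillis + " ms");
  }
}
```

If failures appear only as the thread count goes up, that would suggest the problem is
related to load on the endpoint or on the client's connection pool rather than to a single
request.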

There is also a Parquet client tool that lets you inspect Parquet files. [2] I think (but
am not certain) that it uses the HDFS client API as well. Try using that client to inspect
your Parquet files. Again, run the operations in a loop to test load. Does that tool hit the
same issues?

If the problem is somehow related to Dell's implementation of the S3 API, then there is little
Drill can do to fix it. On the other hand, if the Dell implementation requires certain properties
or settings to work well, then we can figure out how to configure that in HDFS so that Drill
can pick up those settings. Information about Dell's S3 implementation is at [3].
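
As an illustration of what such settings could look like: the Hadoop S3A connector reads
properties such as `fs.s3a.endpoint`, `fs.s3a.path.style.access`, `fs.s3a.connection.maximum`
and `fs.s3a.connection.timeout` from its configuration (for example `core-site.xml`). The
sketch below renders a few of them as `core-site.xml` properties. The property names are
S3A options; the values, the endpoint host and the `S3aSiteXml` class are placeholders and
assumptions, not tested recommendations for ECS.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: renders S3A settings as a core-site.xml fragment. */
final class S3aSiteXml {

  /** Example values only; every value here is a placeholder to adjust or remove. */
  static Map<String, String> exampleSettings() {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("fs.s3a.endpoint", "https://ecs.example.com"); // hypothetical host
    props.put("fs.s3a.path.style.access", "true");
    props.put("fs.s3a.connection.maximum", "100");    // size of the HTTP connection pool
    props.put("fs.s3a.connection.timeout", "200000"); // socket timeout, in milliseconds
    return props;
  }

  /** Escapes the characters that are not allowed as-is in XML text. */
  static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  /** Builds one <property> element per entry, in insertion order. */
  static String render(Map<String, String> props) {
    StringBuilder sb = new StringBuilder("<configuration>\n");
    for (Map.Entry<String, String> e : props.entrySet()) {
      sb.append("  <property>\n")
        .append("    <name>").append(escape(e.getKey())).append("</name>\n")
        .append("    <value>").append(escape(e.getValue())).append("</value>\n")
        .append("  </property>\n");
    }
    return sb.append("</configuration>\n").toString();
  }

  public static void main(String[] args) {
    System.out.print(render(exampleSettings()));
  }
}
```

Of these, `fs.s3a.connection.maximum` may be the one most worth checking, since the reported
error is a timeout waiting for a connection from the pool.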

Please let us know if the above suggestions are off the mark; all we have to go on is the
information which you've kindly shared. Perhaps there are other key facts we do not yet know.


Thanks,
- Paul


[1] http://doc.isilon.com/ECS/3.1/DataAccessGuide/index.html#ecs_c_docs_landing_page_content.html

[2] https://github.com/apache/parquet-mr/tree/master/parquet-cli

[3] https://www.emc.com/techpubs/api/ecs/v2-2-0-0/S3ObjectOperations_ba672412ac371bb6cf4e69291344510e_overview.htm


 

    On Saturday, March 28, 2020, 1:39:00 AM PDT, Navin Bhawsar <navin.bhawsar@gmail.com> wrote:
 
Thanks Paul.

To add more detail: we are comparing Drill performance using the two storage options below.

1. dfs plugin pointing to a single-node HDFS cluster
2. S3 plugin pointing to an ECS bucket, no HDFS

In both cases the data is stored in Parquet files. For example, this query reads a directory
of 19 Parquet files, close to 2 GB in total; the same set of files is on S3 and on HDFS.

The Drillbits run on 2 Unix machines (6 cores, 32 GB each). One machine runs the single-node
HDFS cluster, ZooKeeper and a Drillbit. The other machine runs a Drillbit.

On both HDFS and S3 we have created the Parquet metadata file; additionally, we have created
statistics for dfs. Based on the analysis so far, dfs performs better than S3: the same query
that completes in 2.121 s on dfs times out on S3.

Looking at the plan, "parquet row group scan" takes most of the time (99%). The stack trace
shows the error "Unable to execute HTTP request: Timeout waiting for connection from pool":

(org.apache.drill.common.exceptions.ExecutionSetupException) java.io.InterruptedIOException:
 getFileStatus on s3a://test-bucket/TestDir/Test_1.parquet:
 com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection
from pool
    org.apache.drill.exec.store.parquet.AbstractParquetScanBatchCreator.getBatch():261
    org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():42
    org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():36
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():163
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():114
    org.apache.drill.exec.physical.impl.ImplCreator.getExec():90
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():292
    org.apache.drill.common.SelfCleaningRunnable.run():38
    .......():0
  Caused By (java.lang.Exception) getFileStatus on s3a://test-bucket/TestDir/Test_1.parquet:
 com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection
from pool
    org.apache.hadoop.fs.s3a.S3AUtils.translateInterruptedException():352
    org.apache.hadoop.fs.s3a.S3AUtils.translateException():177
    org.apache.hadoop.fs.s3a.S3AUtils.translateException():151
    org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus():2242
    org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus():2204
    org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus():2143
    org.apache.parquet.hadoop.util.HadoopInputFile.fromPath():39
    org.apache.drill.exec.store.parquet.AbstractParquetScanBatchCreator.readFooter():353
    org.apache.drill.exec.store.parquet.AbstractParquetScanBatchCreator.getBatch():149
    org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():42
    org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch():36
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():163
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():141
    org.apache.drill.exec.physical.impl.ImplCreator.getChildren():186
    org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():114
    org.apache.drill.exec.physical.impl.ImplCreator.getExec():90
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():292
    org.apache.drill.common.SelfCleaningRunnable.run():38
    .......():0

Thanks & Regards,
Navin

On Sat, 28 Mar 2020, 09:27 Paul Rogers, <par0328@yahoo.com> wrote:

Hi Navin,

You had mentioned your ECS solution in an earlier note. What are you using to access data
in your container? Is your ECS container running HDFS? Or, do you have some other API?

Do you have Drill running in a container on ECS, or is that where your data is located? It
would be helpful if you could perhaps describe your setup in a bit more detail so we can offer
suggestions about where to look for an issue.

By the way: the query profile is often a good place to start. You'll find them in the Drill
Web Console. Looking at each operator you can see how much memory was used and how long things
took. Specifically, look at the time taken by the scan: is the slowness due to reading the
data, or is some other part of the query taking the time?

When you get the error, what is the stack trace? Is the error coming from some particular
HDFS client? In some particular operation?


Thanks,
- Paul

 

    On Friday, March 27, 2020, 6:59:42 AM PDT, Navin Bhawsar <navin.bhawsar@gmail.com> wrote:
 
 Hi,

We are facing a performance issue where an Apache Drill query on ECS times out
with the error below: "ConnectionPoolTimeoutException: Timeout waiting for
connection from pool"

However, the same query works fine on a single-node HDFS cluster, with an execution
time of 2.1 s (planning = 0.483 s).

Parquet file size <1.5 GB
Total parquet files scanned = 8( total 19 in directory)
Apache drill version 1.17
JDK 1.8.0_74
Total rows returned from query =71000

There are 2 Drillbits running in distributed mode.
13 GB default allocated per Drillbit.

Any ideas why ECS performance is so bad compared with HDFS for Drill?
Please advise if Drill provides options to optimize ECS querying.

Please let me know if you need more details.

Thanks & Regards,
Navin
  
  